Preface


Multivariate statistical analysis often proves to be a challenging subject for students. The 
difficulty arises in part from the reliance on several types of symbols such as subscripts, 
superscripts, bars, tildes, bold-face characters, lower- and uppercase Roman and Greek 
letters, and so on. However, resorting to such notations is necessary in order to refer to 
the various quantities involved such as scalars and matrices either in the real or complex 
domain. When the first author began to teach courses in advanced mathematical statistics 
and multivariate analysis at McGill University, Canada, and other academic institutions 
around the world, he was seeking means of making the study of multivariate analysis 
more accessible and enjoyable. He determined that the subject could be made simpler 
by treating mathematical and random variables alike, thus avoiding the distinct notation 
that is generally utilized to represent random and non-random quantities. Accordingly, all 
scalar variables, whether mathematical or random, are denoted by lowercase letters and 
all vector/matrix variables are denoted by capital letters, with vectors and matrices being 
identically denoted since vectors can be viewed as matrices having a single row or column. 
As well, variables belonging to the complex domain are readily identified as such by plac- 
ing a tilde over the corresponding lowercase and capital letters. Moreover, he noticed that 
numerous formulas expressed in terms of summations, subscripts, and superscripts could 
be more efficiently represented by appealing to matrix methods. He further observed that 
the study of multivariate analysis could be simplified by initially delivering a few lec- 
tures on Jacobians of matrix transformations and elementary special functions of matrix 
argument, and by subsequently deriving the statistical density functions as special cases of 
these elementary functions as is done for instance in the present book for the real and com- 
plex matrix-variate gamma and beta density functions. Basic notes in these directions were 
prepared and utilized by the first author for his lectures over the past decades. The second 
and third authors then joined him and added their contributions to flesh out this material 
to full-fledged book form. Many of the notable features that distinguish this monograph 
from other books on the subject are listed next. 


vi Arak M. Mathai, Serge B. Provost, Hans J. Haubold 


Special Features 


1. As the title of the book suggests, its most distinctive feature is its development of a 
parallel theory of multivariate analysis in the complex domain side by side with the cor- 
responding treatment of the real cases. Various quantities involving complex random vari- 
ables such as Hermitian forms are widely used in many areas of applications such as light 
scattering, quantum physics, and communication theory, to name a few. A wide reader- 
ship is expected as, to our knowledge, this is the first book in the area that systematically 
combines in a single source the real results and their complex counterparts. Students will
be able to better grasp the results that hold in the complex domain by relating them to
those that hold in the real domain.


2. In order to avoid resorting to an excessive number of symbols to denote scalar, vector, 
and matrix variables in the real and complex domains, the following consistent notations 
are employed throughout the book: All real scalar variables, whether mathematical or ran- 
dom, are denoted by lowercase letters and all real vector/matrix variables are denoted by 
capital letters, a tilde being placed on the corresponding variables in the complex domain. 


3. Mathematical variables and random variables are treated the same way and denoted 
by the same type of letters in order to avoid the double notation often utilized to rep- 
resent random and mathematical variables as well as the potentially resulting confusion. 
If probabilities are to be attached to every value that a variable takes, then mathematical 
variables can be construed as degenerate random variables. This simplified notation will 
enable students from mathematics, physics, and other disciplines to easily understand the 
subject matter without being perplexed. Although statistics students may initially find this 
notation somewhat unsettling, the adjustment ought to prove rapid. 


4. Matrix methods are utilized throughout the book so as to limit the number of summations,
subscripts, superscripts, and so on. This makes the representations of the various
results simpler and more elegant.


5. A connection is established between statistical distribution theory of scalar, vector, and 
matrix variables in the real and complex domains and fractional calculus. This should 
foster further growth in both of these fields, which may borrow results and techniques 
from each other. 


6. Connections of concepts encountered in multivariate analysis to concepts occurring in 
geometrical probabilities are pointed out so that each area can be enriched by further work 
in the other one. Geometrical probability problems of random lengths, random areas, and 
random volumes in the complex domain may not have been developed yet. They may now 
be tackled by making use of the results presented in this book. 
7. Classroom lecture style is employed as this book's writing style so that the reader has
the impression of listening to a lecture upon reading the material.


8. The central concepts and major results are followed by illustrative worked examples so 
that students may easily comprehend the meaning and significance of the stated results. 
Additional problems are provided as exercises for the students to work out so that the 
remaining questions they still may have can be clarified. 


9. Throughout the book, the majority of the derivations of known or original results are 
innovative and rather straightforward as they rest on simple applications of results from 
matrix algebra, vector/matrix derivatives, and elementary special functions. 


10. Useful results on vector/matrix differential operators are included in the mathematical 
preliminaries for the real case, and the corresponding operators in the complex domain are 
developed in Chap. 3. They are utilized to derive maximum likelihood estimators of vec- 
tor/matrix parameters in the real and complex domains in a more straightforward manner 
than is otherwise the case with the usual lengthy procedures. The vector/matrix differential
operators in the complex domain may actually be new, whereas their counterparts in the real
case may be found in Mathai (1997) [see Chapter 1, reference list].


11. The simplified and consistent notation of dX is used to denote the wedge product 
of the differentials of all functionally independent real scalar variables in X, whether X 
is a scalar, a vector, or a square or rectangular matrix, with dX̃ being utilized for the
corresponding wedge product of differentials in the complex domain.


12. Equation numbering is done sequentially chapter/section-wise; for example, (3.5.4) 
indicates the fourth equation appearing in Sect. 5 of Chap. 3. To make the numbering 
scheme more concise and descriptive, the section titles, lemmas, theorems, exercises, and 
equations pertaining to the complex domain will be identified by appending the letter ‘a’ 
to the respective section numbers such as (3.5a.4). The notation (i), (ii), ..., is employed 
for neighboring equation numbers related to a given derivation. 


13. References to the previous materials or equation numbers as well as references to
subsequent results appearing in the book are kept to a minimum. In order to enhance readability,
the main notations utilized in each chapter are repeated at the beginning of each one of 
them. As well, the reader may notice certain redundancies in the statements. These are 
intentional and meant to make the material easier to follow. 


14. Due to the presence of numerous parameters, students generally find the subject of 
factor analysis quite difficult to grasp and apply effectively. Their understanding of the 
topic should be significantly enhanced by the explicit derivations that are provided, which 
incidentally are believed to be original. 
15. Only the basic material in each topic is covered. The subject matter is clearly
discussed and several worked examples are provided so that the students can acquire a clear
understanding of this primary material. Only the materials used in each chapter are given 
as reference—mostly the authors’ own works. Additional reading materials are listed at 
the very end of the book. After acquainting themselves with the introductory material pre- 
sented in each chapter, the readers ought to be capable of mastering more advanced related 
topics on their own. 


Multivariate analysis encompasses a vast array of topics. Even if the very primary ma- 
terials pertaining to most of these topics were included in a basic book such as the present 
one, the length of the resulting monograph would be excessive. Hence, certain topics had 
to be chosen in order to produce a manuscript of a manageable size. The selection of the 
topics to be included or excluded is the authors’ own choice, and it is by no means claimed
that those included in the book are the most important ones or that those being omitted 
are not relevant. Certain pertinent topics, such as confidence regions, multiple confidence 
intervals, multivariate scaling, tests based on arbitrary statistics, and logistic and ridge re- 
gressions, are omitted so as to limit the size of the book. For instance, only some likelihood 
ratio statistics or λ-criteria based tests on normal populations are treated in Chap. 6 on tests
of hypotheses, whereas the authors could have discussed various tests of hypotheses on pa- 
rameters associated with the exponential, multinomial, or other populations, as they also 
have worked on such problems. As well, since results related to elliptically contoured dis- 
tributions including the spherically symmetric case might be of somewhat limited interest, 
this topic is not pursued further subsequently to its introduction in Chap. 3. Nevertheless, 
standard applications such as principal component analysis, canonical correlation analysis, 
factor analysis, classification problems, multivariate analysis of variance, profile analysis, 
growth curves, cluster analysis, and correspondence analysis are properly covered. 


Tables of percentage points are provided for the normal, chi-square, Student-t, and F
distributions as well as for the null distributions of the statistics for testing the indepen- 
dence and for testing the equality of the diagonal elements given that the population co- 
variance matrix is diagonal, as they are frequently required in applied areas. Numerical 
tables for other relevant tests encountered in multivariate analysis are readily available in 
the literature. 


This work may be used as a reference book or as a textbook for a full course on mul- 
tivariate analysis. Potential readership includes mathematicians, statisticians, physicists, 
engineers, as well as researchers and graduate students in related fields. Chapters 1–8 or
sections thereof could be covered in a one- to two-semester course on mathematical statis- 
tics or multivariate analysis, while a full course on applied multivariate analysis might 
focus on Chaps. 9–15. Readers with little interest in complex analysis may omit the sections
whose numbers are followed by an ‘a’ without any loss of continuity. With this book
and its numerous new derivations, those who are already familiar with multivariate anal- 
ysis in the real domain will have an opportunity to further their knowledge of the subject 
and to delve into the complex counterparts of the results. 


The authors wish to thank the following former students of the Centre for Mathemat- 
ical and Statistical Sciences, India, for making use of a preliminary draft of portions of 
the book for their courses and communicating their comments: Dr. T. Princy, Cochin Uni- 
versity of Science and Technology, Kochi, Kerala, India; Dr. Nicy Sebastian, St. Thomas 
College, Calicut University, Thrissur, Kerala, India; and Dr. Dilip Kumar, Kerala Univer- 
sity, Trivandrum, India. The authors also wish to express their thanks to Dr. C. Satheesh 
Kumar, Professor of Statistics, University of Kerala, and Dr. Joby K. Jose, Professor of 
Statistics, Kannur University, for their pertinent comments on the second drafts of the 
chapters. The authors have no conflict of interest to declare. The second author would like 
to acknowledge the financial support of the Natural Sciences and Engineering Research 
Council of Canada. 

Chapter 1
Mathematical Preliminaries


1.1. Introduction 


It is assumed that the reader has had adequate exposure to basic concepts in Probability, 
Statistics, Calculus and Linear Algebra. This chapter provides a brief review of the results 
that will be needed in the remainder of this book. No detailed discussion of these topics 
will be attempted. For essential materials in these areas, the reader is, for instance, referred 
to Mathai and Haubold (2017a, 2017b). Some properties of vectors, matrices, determi- 
nants, Jacobians and wedge product of differentials to be utilized later on, are included in 
the present chapter. For the sake of completeness, we initially provide some elementary 
definitions. First, the concepts of vectors, matrices and determinants are introduced. 

Consider the consumption profile of a family in terms of the quantities of certain food 
items consumed every week. The following table gives this family’s consumption profile 
for three weeks: 


Table 1.1: Consumption profile 


          Rice    Lentils   Carrots   Beans
Week 1    2.00    0.50      1.00      2.00
Week 2    1.50    0.50      0.75      1.50
Week 3    2.00    0.50      0.50      1.25


All the numbers appearing in this table are in kilograms (kg). In Week 1 the family 
consumed 2 kg of rice, 0.5 kg of lentils, 1 kg of carrots and 2 kg of beans. Looking at 
the consumption over three weeks, we have an arrangement of 12 numbers into 3 rows 
and 4 columns. If this consumption profile is expressed in symbols, we have the following 
representation: 
$$A = (a_{ij}) = \begin{bmatrix} a_{11} & a_{12} & a_{13} & a_{14}\\ a_{21} & a_{22} & a_{23} & a_{24}\\ a_{31} & a_{32} & a_{33} & a_{34} \end{bmatrix} = \begin{bmatrix} 2.00 & 0.50 & 1.00 & 2.00\\ 1.50 & 0.50 & 0.75 & 1.50\\ 2.00 & 0.50 & 0.50 & 1.25 \end{bmatrix}$$

where, for example, $a_{11} = 2.00$, $a_{13} = 1.00$, $a_{22} = 0.50$, $a_{23} = 0.75$, $a_{32} = 0.50$, $a_{34} = 1.25$.


Definition 1.1.1. A matrix. An arrangement of $mn$ items into $m$ rows and $n$ columns is
called an $m$ by $n$ (written as $m \times n$) matrix.


Accordingly, the above consumption profile matrix is $3 \times 4$ (3 by 4), that is, it has
3 rows and 4 columns. The standard notation consists in enclosing the $mn$ items within
round ( ) or square [ ] brackets as in the above representation. The above $3 \times 4$ matrix is
represented in different ways: as $A$, as $(a_{ij})$, and as the array of items enclosed by square
brackets. The $mn$ items in the $m \times n$ matrix are called elements of the matrix. Then, in the
above matrix $A$, $a_{ij}$ is the element in the $i$-th row and $j$-th column, or the $(i,j)$-th element.
In the above illustration, $i = 1, 2, 3$ (3 rows) and $j = 1, 2, 3, 4$ (4 columns). A general
$m \times n$ matrix $A$ can be written as follows:


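$$A = \begin{bmatrix} a_{11} & a_{12} & \cdots & a_{1n}\\ a_{21} & a_{22} & \cdots & a_{2n}\\ \vdots & \vdots & \ddots & \vdots\\ a_{m1} & a_{m2} & \cdots & a_{mn} \end{bmatrix}. \tag{1.1.1}$$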


The elements are separated by spaces in order to avoid any confusion. Should there be 
any possibility of confusion, then the elements will be separated by commas. Note that the 
plural of “matrix” is “matrices”. Observe that the position of each element in Table 1.1 has 
a meaning. The elements cannot be permuted as rearranged elements will give different 
matrices. In other words, two $m \times n$ matrices $A = (a_{ij})$ and $B = (b_{ij})$ are equal if and
only if $a_{ij} = b_{ij}$ for all $i$ and $j$, that is, they must be element-wise equal.
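
As a small illustration of the definitions so far, the consumption matrix of Table 1.1 can be held as a list of rows. This is only a sketch: the helper names `shape`, `element` and `matrices_equal` are hypothetical and chosen for this example; they are not notation used in the book.

```python
# Table 1.1 stored as a list of rows (values in kg).
A = [
    [2.00, 0.50, 1.00, 2.00],  # Week 1
    [1.50, 0.50, 0.75, 1.50],  # Week 2
    [2.00, 0.50, 0.50, 1.25],  # Week 3
]

def shape(M):
    """Return (m, n): the number of rows and the number of columns of M."""
    return len(M), len(M[0])

def element(M, i, j):
    """Return the (i, j)-th element, using the 1-based indices of the text."""
    return M[i - 1][j - 1]

def matrices_equal(M, N):
    """Two matrices are equal iff they have the same order and equal elements."""
    if shape(M) != shape(N):
        return False
    m, n = shape(M)
    return all(M[i][j] == N[i][j] for i in range(m) for j in range(n))

# Permuting two elements of the first row produces a different matrix.
permuted = [row[:] for row in A]
permuted[0][0], permuted[0][1] = permuted[0][1], permuted[0][0]

print(shape(A))                      # (3, 4)
print(element(A, 2, 3))              # 0.75
print(matrices_equal(A, permuted))   # False
```

The 1-based `element` helper mirrors the subscripts $a_{ij}$ of the text, whereas Python lists are indexed from zero.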

In Table 1.1, the first row, which is also a 1 x 4 matrix, represents this family's first 
week's consumption. The fourth column represents the consumption of beans over the 
three weeks' period. Thus, each row and each column in an m x n matrix has a meaning 
and represents different aspects. In Eq. (1.1.1), all rows are $1 \times n$ matrices and all columns
are $m \times 1$ matrices. A $1 \times n$ matrix is called a row vector and an $m \times 1$ matrix is called a
column vector. For example, in Table 1.1, there are 3 row vectors and 4 column vectors. If
the row vectors are denoted by $R_1, R_2, R_3$ and the column vectors by $C_1, C_2, C_3, C_4$, then
we have


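$$R_1 = [2.00\ \ 0.50\ \ 1.00\ \ 2.00],\quad R_2 = [1.50\ \ 0.50\ \ 0.75\ \ 1.50],\quad R_3 = [2.00\ \ 0.50\ \ 0.50\ \ 1.25]$$

$$C_1 = \begin{bmatrix} 2.00\\ 1.50\\ 2.00 \end{bmatrix},\quad C_2 = \begin{bmatrix} 0.50\\ 0.50\\ 0.50 \end{bmatrix},\quad C_3 = \begin{bmatrix} 1.00\\ 0.75\\ 0.50 \end{bmatrix},\quad C_4 = \begin{bmatrix} 2.00\\ 1.50\\ 1.25 \end{bmatrix}.$$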


If the total consumption in Week 1 and Week 2 is needed, it is obtained by adding the row 
vectors element-wise: 


$$R_1 + R_2 = [2.00 + 1.50\ \ \ 0.50 + 0.50\ \ \ 1.00 + 0.75\ \ \ 2.00 + 1.50] = [3.50\ \ 1.00\ \ 1.75\ \ 3.50].$$


We will define the addition of two matrices in the same fashion as in the above illustration.
For the addition to hold, both matrices must be of the same order $m \times n$. Let $A = (a_{ij})$
and $B = (b_{ij})$ be two $m \times n$ matrices. Then the sum, denoted by $A + B$, is defined as


$$A + B = (a_{ij} + b_{ij})$$


or equivalently as the matrix obtained by adding the corresponding elements. For example, 


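$$C_1 + C_3 = \begin{bmatrix} 2.00\\ 1.50\\ 2.00 \end{bmatrix} + \begin{bmatrix} 1.00\\ 0.75\\ 0.50 \end{bmatrix} = \begin{bmatrix} 3.00\\ 2.25\\ 2.50 \end{bmatrix}.$$

Repeating the addition, we have

$$C_1 + C_3 + C_4 = (C_1 + C_3) + \begin{bmatrix} 2.00\\ 1.50\\ 1.25 \end{bmatrix} = \begin{bmatrix} 3.00\\ 2.25\\ 2.50 \end{bmatrix} + \begin{bmatrix} 2.00\\ 1.50\\ 1.25 \end{bmatrix} = \begin{bmatrix} 5.00\\ 3.75\\ 3.75 \end{bmatrix}.$$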


In general, if A = (aij), B = (bij), C = (cij), D = (dij) are m x n matrices, then 
A+B+C +D = (aij + bij + cij + dij), that is, it is the matrix obtained by adding the 
corresponding elements. 
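
The element-wise rule for sums can be sketched in a few lines. The function name `mat_add` is an assumption made for this illustration; the vectors are those of Table 1.1, written as one-row or one-column matrices.

```python
def mat_add(*matrices):
    """Add any number of matrices of the same order, element by element."""
    m, n = len(matrices[0]), len(matrices[0][0])
    for M in matrices:
        if len(M) != m or any(len(row) != n for row in M):
            raise ValueError("all matrices must have the same order m x n")
    return [[sum(M[i][j] for M in matrices) for j in range(n)]
            for i in range(m)]

# Row vectors (1 x 4) for Weeks 1 and 2.
R1 = [[2.00, 0.50, 1.00, 2.00]]
R2 = [[1.50, 0.50, 0.75, 1.50]]

# Column vectors (3 x 1) for rice, carrots and beans.
C1 = [[2.00], [1.50], [2.00]]
C3 = [[1.00], [0.75], [0.50]]
C4 = [[2.00], [1.50], [1.25]]

print(mat_add(R1, R2))       # [[3.5, 1.0, 1.75, 3.5]]
print(mat_add(C1, C3, C4))   # [[5.0], [3.75], [3.75]]
```

Attempting `mat_add(R1, C1)` raises a `ValueError`, which reflects the requirement that both matrices be of the same order.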

Suppose that in Table 1.1, we wish to express the elements in terms of grams instead 
of kilograms; then, each and every element therein must be multiplied by 1000. Thus, if A 
is the matrix corresponding to Table 1.1 and B is the matrix in terms of grams, we have 


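$$A = \begin{bmatrix} 2.00 & 0.50 & 1.00 & 2.00\\ 1.50 & 0.50 & 0.75 & 1.50\\ 2.00 & 0.50 & 0.50 & 1.25 \end{bmatrix},$$

$$B = \begin{bmatrix} 1000 \times 2.00 & 1000 \times 0.50 & 1000 \times 1.00 & 1000 \times 2.00\\ 1000 \times 1.50 & 1000 \times 0.50 & 1000 \times 0.75 & 1000 \times 1.50\\ 1000 \times 2.00 & 1000 \times 0.50 & 1000 \times 0.50 & 1000 \times 1.25 \end{bmatrix}.$$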




We may write this symbolically as $B = 1000 \times A = 1000\,A$. Note that 1000 is a $1 \times 1$
matrix or a scalar quantity. Any $1 \times 1$ matrix is called a scalar quantity. We may then
define the scalar multiplication of a matrix $A$ by the scalar quantity $c$ as
$A = (a_{ij}) \Rightarrow cA = (c\,a_{ij})$, that is, $cA$ is obtained by multiplying each and every element
of $A$ by the scalar quantity $c$. As a convention, $c$ is written on the left of $A$, as $cA$ and not
as $Ac$. If $c = -1$, then $cA = (-1)A = -A$ and $A + (-1)A = A - A = O$ where the capital $O$
denotes a matrix whose elements are all equal to zero. A general $m \times n$ matrix wherein
every element is zero is referred to as a null matrix and it is written as $O$ (the capital
letter, not the numeral zero). We may also note that if $A, B, C$ are $m \times n$ matrices, then
$A + (B + C) = (A + B) + C$. Moreover, $A + O = O + A = A$.

If $m = n$, in which case the number of rows is equal to the number of columns, the
resulting matrix is referred to as a square matrix because it is a square arrangement of
elements; otherwise the matrix is called a rectangular matrix. Some special cases of square
matrices are the following: For an $n \times n$ matrix or a square matrix of order $n$, suppose
that $a_{ij} = 0$ for all $i \ne j$ (that is, all non-diagonal elements are zeros; here “diagonal”
means the diagonal going from top left to bottom right) and that there is at least one
nonzero diagonal element; then such a matrix is called a diagonal matrix and it is usually
written as $\mathrm{diag}(d_1, \ldots, d_n)$ where $d_1, \ldots, d_n$ are the diagonal elements.
Here are some examples of $3 \times 3$ diagonal matrices:


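$$D_1 = \begin{bmatrix} 5 & 0 & 0\\ 0 & -2 & 0\\ 0 & 0 & 7 \end{bmatrix},\quad D_2 = \begin{bmatrix} 4 & 0 & 0\\ 0 & 1 & 0\\ 0 & 0 & 0 \end{bmatrix},\quad D_3 = \begin{bmatrix} a & 0 & 0\\ 0 & a & 0\\ 0 & 0 & a \end{bmatrix},\ a \ne 0.$$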


If in $D_3$, $a = 1$ so that all the diagonal elements are unities, the resulting matrix is called
an identity matrix, and a diagonal matrix whose diagonal elements are all equal to some
number $a$ that is not equal to 0 or 1 is referred to as a scalar matrix. A square non-null
matrix $A = (a_{ij})$ that contains at least one nonzero element below its leading diagonal
and whose elements above the leading diagonal are all equal to zero, that is, $a_{ij} = 0$ for
all $i < j$, is called a lower triangular matrix. Some examples of $2 \times 2$ lower triangular
matrices are the following:


If, in a square non-null matrix, all elements below the leading diagonal are zeros and there 
is at least one nonzero element above the leading diagonal, then such a square matrix is 
referred to as an upper triangular matrix. Here are some examples: 
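$$T_1 = \begin{bmatrix} 1 & 2 & -1\\ 0 & 3 & 1\\ 0 & 0 & 0 \end{bmatrix},\quad T_2 = \begin{bmatrix} 1 & 0 & 0\\ 0 & 0 & 4\\ 0 & 0 & 0 \end{bmatrix},\quad T_3 = \begin{bmatrix} 0 & 0 & 7\\ 0 & 0 & 0\\ 0 & 0 & 0 \end{bmatrix}.$$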
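
The definitions of scalar multiplication and of diagonal and triangular matrices translate directly into checks on the elements. The following is a minimal sketch that follows the definitions exactly as stated above, including the requirement of at least one nonzero element on, below or above the diagonal; the function names and the matrix `L_example` are hypothetical and serve only this illustration.

```python
def scalar_multiply(c, M):
    """Return cM, obtained by multiplying every element of M by c."""
    return [[c * x for x in row] for row in M]

def is_square(M):
    return all(len(row) == len(M) for row in M)

def is_diagonal(M):
    """Off-diagonal elements all zero and at least one nonzero diagonal element."""
    if not is_square(M):
        return False
    n = len(M)
    off_zero = all(M[i][j] == 0 for i in range(n) for j in range(n) if i != j)
    return off_zero and any(M[i][i] != 0 for i in range(n))

def is_lower_triangular(M):
    """Zeros above the diagonal and at least one nonzero element below it."""
    if not is_square(M):
        return False
    n = len(M)
    above_zero = all(M[i][j] == 0 for i in range(n) for j in range(i + 1, n))
    return above_zero and any(M[i][j] != 0 for i in range(n) for j in range(i))

def is_upper_triangular(M):
    """Zeros below the diagonal and at least one nonzero element above it."""
    if not is_square(M):
        return False
    n = len(M)
    below_zero = all(M[i][j] == 0 for i in range(n) for j in range(i))
    return below_zero and any(M[i][j] != 0
                              for i in range(n) for j in range(i + 1, n))

D1 = [[5, 0, 0], [0, -2, 0], [0, 0, 7]]   # diagonal example D_1 of the text
T3 = [[0, 0, 7], [0, 0, 0], [0, 0, 0]]    # upper triangular example T_3
L_example = [[1, 0], [2, 3]]              # hypothetical lower triangular matrix

print(is_diagonal(D1), is_upper_triangular(T3), is_lower_triangular(L_example))
print(scalar_multiply(1000, [[2.00, 0.50]]))   # kilograms to grams
```

Under these definitions a diagonal matrix is not classified as triangular, since it has no nonzero element off the diagonal.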


Multiplication of Matrices. Once again, consider Table 1.1. Suppose that by consuming
1 kg of rice, the family is getting 700 g (where g represents grams) of starch, 2 g of protein
and 1 g of fat; that by eating 1 kg of lentils, the family is getting 200 g of starch, 100 g of
protein and 100 g of fat; that by consuming 1 kg of carrots, the family is getting 100 g
of starch, 200 g of protein and 150 g of fat; and that by eating 1 kg of beans, the family
is getting 50 g of starch, 100 g of protein and 200 g of fat. Then the starch-protein-fat
matrix, denoted by $B$, is the following, where the rows correspond to rice, lentils,
carrots and beans, respectively:


$$B = \begin{bmatrix} 700 & 2 & 1\\ 200 & 100 & 100\\ 100 & 200 & 150\\ 50 & 100 & 200 \end{bmatrix}.$$


Let $B_1, B_2, B_3$ be the columns of $B$. Then, the first column $B_1$ of $B$ represents the starch
intake per kg of rice, lentils, carrots and beans, respectively. Similarly, the second column
$B_2$ represents the protein intake per kg and the third column $B_3$ represents the fat intake,
that is,

$$B_1 = \begin{bmatrix} 700\\ 200\\ 100\\ 50 \end{bmatrix},\quad B_2 = \begin{bmatrix} 2\\ 100\\ 200\\ 100 \end{bmatrix},\quad B_3 = \begin{bmatrix} 1\\ 100\\ 150\\ 200 \end{bmatrix}.$$


Let the rows of the matrix $A$ in Table 1.1 be denoted by $A_1$, $A_2$ and $A_3$, respectively, so
that

$$A_1 = [2.00\ \ 0.50\ \ 1.00\ \ 2.00],\quad A_2 = [1.50\ \ 0.50\ \ 0.75\ \ 1.50],\quad A_3 = [2.00\ \ 0.50\ \ 0.50\ \ 1.25].$$

Then, the total intake of starch by the family in Week 1 is available from

$$2.00 \times 700 + 0.50 \times 200 + 1.00 \times 100 + 2.00 \times 50 = 1700\ \text{g}.$$


This is the sum of the element-wise products of $A_1$ with $B_1$. We will denote this by $A_1 \cdot B_1$
($A_1$ dot $B_1$). The total intake of protein by the family in Week 1 is determined as follows:

$$A_1 \cdot B_2 = 2.00 \times 2 + 0.50 \times 100 + 1.00 \times 200 + 2.00 \times 100 = 454\ \text{g}$$

and the total intake of fat in Week 1 is given by

$$A_1 \cdot B_3 = 2.00 \times 1 + 0.50 \times 100 + 1.00 \times 150 + 2.00 \times 200 = 602\ \text{g}.$$


Thus, the dot product of $A_1$ with $B_1, B_2, B_3$ provides the intake of starch, protein and fat
in Week 1. Similarly, the dot product of $A_2$ with $B_1, B_2, B_3$ gives the intake of starch,
protein and fat in Week 2. Thus, the configuration of starch, protein and fat intake over the
three weeks is

$$AB = \begin{bmatrix} A_1 \cdot B_1 & A_1 \cdot B_2 & A_1 \cdot B_3\\ A_2 \cdot B_1 & A_2 \cdot B_2 & A_2 \cdot B_3\\ A_3 \cdot B_1 & A_3 \cdot B_2 & A_3 \cdot B_3 \end{bmatrix}.$$
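
The whole intake table can be obtained by taking the dot product of every row of $A$ with every column of $B$. The sketch below uses the figures given above; the names `dot`, `A_rows`, `B_cols` and `intake` are chosen for this illustration only.

```python
def dot(u, v):
    """Sum of the element-wise products of two vectors of the same order."""
    if len(u) != len(v):
        raise ValueError("vectors must have the same order")
    return sum(a * b for a, b in zip(u, v))

# Rows of A: weekly consumption in kg of rice, lentils, carrots and beans.
A_rows = [
    [2.00, 0.50, 1.00, 2.00],
    [1.50, 0.50, 0.75, 1.50],
    [2.00, 0.50, 0.50, 1.25],
]

# Columns of B: grams of starch, protein and fat per kg of each food item.
B_cols = [
    [700, 200, 100, 50],    # starch
    [2, 100, 200, 100],     # protein
    [1, 100, 150, 200],     # fat
]

# intake[i][j] = A_{i+1} . B_{j+1}
intake = [[dot(a, b) for b in B_cols] for a in A_rows]

for week, (starch, protein, fat) in enumerate(intake, start=1):
    print(f"Week {week}: starch {starch} g, protein {protein} g, fat {fat} g")
# The Week 1 line reports 1700.0 g, 454.0 g and 602.0 g, as computed in the text.
```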
A matrix having one column and $m$ rows is an $m \times 1$ matrix that is referred to as a column
vector of $m$ elements or a column vector of order $m$. A matrix having one row and $n$
columns is a $1 \times n$ matrix called a row vector of $n$ components or a row vector of order $n$.
Let $A$ be a row or column vector of order $n$, which consists of $n$ elements or components.
Let the elements comprising $A$ be denoted by $a_1, \ldots, a_n$. Let $B$ be a row or column vector
of order $n$ consisting of the elements $b_1, \ldots, b_n$. Then, the dot product of $A$ and $B$, denoted
by $A \cdot B = B \cdot A$, is defined as $A \cdot B = a_1 b_1 + a_2 b_2 + \cdots + a_n b_n$, that is, the corresponding
elements of $A$ and $B$ are multiplied and added up. Let $A$ be an $m \times n$ matrix whose $m$ rows
are written as $A_1, \ldots, A_m$. Let $B$ be another $n \times r$ matrix whose $r$ columns are written as
$B_1, \ldots, B_r$. Note that the number of columns of $A$ is equal to the number of rows of $B$,
which in this case is $n$. When the number of columns of $A$ is equal to the number of rows
of $B$, the product $AB$ is defined and equal to


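$$AB = \begin{bmatrix} A_1 \cdot B_1 & A_1 \cdot B_2 & \cdots & A_1 \cdot B_r\\ A_2 \cdot B_1 & A_2 \cdot B_2 & \cdots & A_2 \cdot B_r\\ \vdots & \vdots & \ddots & \vdots\\ A_m \cdot B_1 & A_m \cdot B_2 & \cdots & A_m \cdot B_r \end{bmatrix} \ \text{with}\ A = \begin{bmatrix} A_1\\ A_2\\ \vdots\\ A_m \end{bmatrix},\ B = [B_1\ \ B_2\ \ \cdots\ \ B_r],$$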

the resulting matrix $AB$ being of order $m \times r$. When $AB$ is defined, $BA$ need not be
defined. However, if $r = m$, then $BA$ is also defined, otherwise not. In other words, if
$A = (a_{ij})$ is $m \times n$ and if $B = (b_{ij})$ is $n \times r$ and if $C = (c_{ij}) = AB$, then $c_{ij} = A_i \cdot B_j$
where $A_i$ is the $i$-th row of $A$ and $B_j$ is the $j$-th column of $B$, or $c_{ij} = \sum_{k=1}^{n} a_{ik} b_{kj}$ for all
$i$ and $j$. For example,


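$$A = \begin{bmatrix} 1 & -1 & 0\\ 2 & 3 & 5 \end{bmatrix},\quad B = \begin{bmatrix} 2 & -2\\ 3 & 2\\ 1 & 0 \end{bmatrix} \Rightarrow$$

$$AB = \begin{bmatrix} (1)(2) + (-1)(3) + (0)(1) & (1)(-2) + (-1)(2) + (0)(0)\\ (2)(2) + (3)(3) + (5)(1) & (2)(-2) + (3)(2) + (5)(0) \end{bmatrix} = \begin{bmatrix} -1 & -4\\ 18 & 2 \end{bmatrix}.$$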
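Note that, in this case, $BA$ is defined and equal to

$$BA = \begin{bmatrix} 2 & -2\\ 3 & 2\\ 1 & 0 \end{bmatrix} \begin{bmatrix} 1 & -1 & 0\\ 2 & 3 & 5 \end{bmatrix} = \begin{bmatrix} (2)(1) + (-2)(2) & (2)(-1) + (-2)(3) & (2)(0) + (-2)(5)\\ (3)(1) + (2)(2) & (3)(-1) + (2)(3) & (3)(0) + (2)(5)\\ (1)(1) + (0)(2) & (1)(-1) + (0)(3) & (1)(0) + (0)(5) \end{bmatrix}$$

$$= \begin{bmatrix} -2 & -8 & -10\\ 7 & 3 & 10\\ 1 & -1 & 0 \end{bmatrix}.$$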
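As another example, let

$$A = \begin{bmatrix} 1\\ -1\\ 2 \end{bmatrix},\ B = [2\ \ 3\ \ 5] \Rightarrow AB = \begin{bmatrix} 1\\ -1\\ 2 \end{bmatrix} [2\ \ 3\ \ 5] = \begin{bmatrix} 2 & 3 & 5\\ -2 & -3 & -5\\ 4 & 6 & 10 \end{bmatrix}$$

which is $3 \times 3$, whereas $BA$ is $1 \times 1$:

$$BA = [2\ \ 3\ \ 5] \begin{bmatrix} 1\\ -1\\ 2 \end{bmatrix} = 9.$$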
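
The rule $c_{ij} = \sum_k a_{ik} b_{kj}$, together with the conformability requirement, can be sketched as follows. The function name `mat_mul` is an assumption made for this illustration; the matrices are those of the two examples above, with the $2 \times 3$ matrix read off from the displayed products.

```python
def mat_mul(A, B):
    """Return AB, where c_ij is the dot product of row i of A and column j of B."""
    m, n = len(A), len(A[0])
    if len(B) != n:
        raise ValueError("columns of A must equal rows of B for AB to be defined")
    r = len(B[0])
    return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(r)]
            for i in range(m)]

A = [[1, -1, 0], [2, 3, 5]]        # 2 x 3
B = [[2, -2], [3, 2], [1, 0]]      # 3 x 2

print(mat_mul(A, B))   # [[-1, -4], [18, 2]]
print(mat_mul(B, A))   # [[-2, -8, -10], [7, 3, 10], [1, -1, 0]]

col = [[1], [-1], [2]]             # 3 x 1 column vector
row = [[2, 3, 5]]                  # 1 x 3 row vector

print(mat_mul(col, row))   # [[2, 3, 5], [-2, -3, -5], [4, 6, 10]]
print(mat_mul(row, col))   # [[9]]
```

Both products exist here because $r = m$ in each pair, yet $AB$ and $BA$ differ in order as well as in their elements.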


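As yet another example, let

$$A = \begin{bmatrix} 2 & 0 & 0\\ -1 & 1 & 0\\ 1 & 1 & 1 \end{bmatrix},\quad B = \begin{bmatrix} 1 & 0 & 0\\ 0 & 2 & 0\\ 1 & -1 & 0 \end{bmatrix}.$$

Note that here both $A$ and $B$ are lower triangular matrices. The products $AB$ and $BA$ are
defined since both $A$ and $B$ are $3 \times 3$ matrices. For instance,

$$AB = \begin{bmatrix} (2)(1) + (0)(0) + (0)(1) & (2)(0) + (0)(2) + (0)(-1) & (2)(0) + (0)(0) + (0)(0)\\ (-1)(1) + (1)(0) + (0)(1) & (-1)(0) + (1)(2) + (0)(-1) & (-1)(0) + (1)(0) + (0)(0)\\ (1)(1) + (1)(0) + (1)(1) & (1)(0) + (1)(2) + (1)(-1) & (1)(0) + (1)(0) + (1)(0) \end{bmatrix}$$

$$= \begin{bmatrix} 2 & 0 & 0\\ -1 & 2 & 0\\ 2 & 1 & 0 \end{bmatrix}.$$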
Observe that since $A$ and $B$ are lower triangular, $AB$ is also lower triangular. Here are some
general properties of products of matrices: When the product of the matrices is defined, in
which case we say that they are conformable for multiplication,


(1): the product of two lower triangular matrices is lower triangular;

(2): the product of two upper triangular matrices is upper triangular;

(3): the product of two diagonal matrices is diagonal;

(4): if $A$ is $m \times n$, $IA = A$ where $I = I_m$ is an identity matrix, and $A\,I_n = A$;

(5): $OA = O$ whenever $OA$ is defined, and $AO = O$ whenever $AO$ is defined.


Transpose of a Matrix. A matrix whose rows are the corresponding columns of $A$ or,
equivalently, a matrix whose columns are the corresponding rows of $A$ is called the transpose
of $A$, denoted by $A'$ ($A$ prime). For example,


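$$A_1 = [1\ \ 2\ \ -1] \Rightarrow A_1' = \begin{bmatrix} 1\\ 2\\ -1 \end{bmatrix};\quad A_2 = \begin{bmatrix} 1 & 0\\ 1 & 1 \end{bmatrix} \Rightarrow A_2' = \begin{bmatrix} 1 & 1\\ 0 & 1 \end{bmatrix};$$

$$A_3 = \begin{bmatrix} 1 & 3\\ 3 & 7 \end{bmatrix} \Rightarrow A_3' = \begin{bmatrix} 1 & 3\\ 3 & 7 \end{bmatrix} = A_3;\quad A_4 = \begin{bmatrix} 0 & 5\\ -5 & 0 \end{bmatrix} \Rightarrow A_4' = \begin{bmatrix} 0 & -5\\ 5 & 0 \end{bmatrix} = -A_4.$$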

Observe that $A_2$ is lower triangular and $A_2'$ is upper triangular, that $A_3' = A_3$, and that
$A_4' = -A_4$. Note that if $A$ is $m \times n$, then $A'$ is $n \times m$. If $A$ is $1 \times 1$ then $A'$ is the same
scalar ($1 \times 1$) quantity. If $A$ is a square matrix and $A' = A$, then $A$ is called a symmetric
matrix. If $B$ is a square matrix and $B' = -B$, then $B$ is called a skew symmetric matrix.
Within a skew symmetric matrix $B = (b_{ij})$, a diagonal element must satisfy the equation
$b_{jj} = -b_{jj}$, which necessitates that $b_{jj} = 0$, whether $B$ be real or complex. Here are some
properties of the transpose: The transpose of a lower triangular matrix is upper triangular;
the transpose of an upper triangular matrix is lower triangular; the transpose of a diagonal
matrix is diagonal; the transpose of an $m \times n$ null matrix is an $n \times m$ null matrix;


$$(A')' = A;\quad (AB)' = B'A';\quad (A_1 A_2 \cdots A_k)' = A_k' \cdots A_2' A_1';\quad (A + B)' = A' + B'$$


whenever $AB$, $A + B$, and $A_1 A_2 \cdots A_k$ are defined.
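
The reversal rule $(AB)' = B'A'$ and the definitions of symmetric and skew symmetric matrices can be checked numerically. This is a sketch only: the helper names are hypothetical, and the matrices `A`, `B`, `S` and `K` are illustrative choices rather than matrices prescribed by the text.

```python
def transpose(M):
    """Return M', whose rows are the columns of M."""
    return [list(column) for column in zip(*M)]

def mat_mul(A, B):
    """Matrix product, assuming the columns of A match the rows of B."""
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]

def is_symmetric(M):
    return transpose(M) == M

def is_skew_symmetric(M):
    return transpose(M) == [[-x for x in row] for row in M]

A = [[1, -1, 0], [2, 3, 5]]        # 2 x 3
B = [[2, -2], [3, 2], [1, 0]]      # 3 x 2

lhs = transpose(mat_mul(A, B))               # (AB)'
rhs = mat_mul(transpose(B), transpose(A))    # B'A'
print(lhs, rhs, lhs == rhs)                  # [[-1, 18], [-4, 2]] twice, True

S = [[1, 3], [3, 7]]      # equals its own transpose
K = [[0, 5], [-5, 0]]     # transpose equals the negative; zero diagonal
print(is_symmetric(S), is_skew_symmetric(K))   # True True
```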


Trace of a Square Matrix. The trace is defined only for square matrices. Let $A = (a_{ij})$
be an $n \times n$ matrix whose leading diagonal elements are $a_{11}, a_{22}, \ldots, a_{nn}$; then the trace
of $A$, denoted by $\mathrm{tr}(A)$, is defined as $\mathrm{tr}(A) = a_{11} + a_{22} + \cdots + a_{nn}$, that is, the sum of
the elements comprising the leading diagonal. The following properties can directly be
deduced from the definition. Whenever $AB$ and $BA$ are defined, $\mathrm{tr}(AB) = \mathrm{tr}(BA)$ where
$AB$ need not be equal to $BA$. If $A$ is $m \times n$ and $B$ is $n \times m$, then $AB$ is $m \times m$ whereas
$BA$ is $n \times n$; however, the traces are equal, that is, $\mathrm{tr}(AB) = \mathrm{tr}(BA)$, which implies that
$\mathrm{tr}(ABC) = \mathrm{tr}(BCA) = \mathrm{tr}(CAB)$.
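
A quick numerical check of the cyclic property of the trace may be helpful. In the sketch below, `A` and `B` are the $2 \times 3$ and $3 \times 2$ matrices used in the multiplication example, while `C` is a hypothetical $2 \times 2$ matrix introduced only so that the triple products are defined.

```python
def mat_mul(A, B):
    """Matrix product, assuming the columns of A match the rows of B."""
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]

def trace(M):
    """Sum of the leading diagonal elements of a square matrix."""
    if any(len(row) != len(M) for row in M):
        raise ValueError("the trace is defined only for square matrices")
    return sum(M[i][i] for i in range(len(M)))

A = [[1, -1, 0], [2, 3, 5]]        # 2 x 3
B = [[2, -2], [3, 2], [1, 0]]      # 3 x 2
C = [[1, 2], [0, 1]]               # 2 x 2, illustrative

# AB is 2 x 2 and BA is 3 x 3, yet their traces agree.
print(trace(mat_mul(A, B)), trace(mat_mul(B, A)))   # 1 1

# Cyclic permutations of a triple product leave the trace unchanged.
t_abc = trace(mat_mul(mat_mul(A, B), C))
t_bca = trace(mat_mul(mat_mul(B, C), A))
t_cab = trace(mat_mul(mat_mul(C, A), B))
print(t_abc, t_bca, t_cab)   # 37 37 37
```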


Length of a Vector. Let $V$ be an $n \times 1$ real column vector or a $1 \times n$ real row vector. Then $V$
can be represented as a point in $n$-dimensional Euclidean space when the elements are real
numbers. Consider a 2-dimensional vector (or 2-vector) with the elements $(1, 2)$. Then,
this vector corresponds to the point depicted in Fig. 1.1.


Figure 1.1 The point $P = (1, 2)$ in the plane


Let $O$ be the origin and $P$ be the point. Then, the length of the resulting vector is the
Euclidean distance between $O$ and $P$, that is, $+\sqrt{(1)^2 + (2)^2} = +\sqrt{5}$. Let $U = (u_1, \ldots, u_n)$
be a real $n$-vector, either written as a row or a column. Then the length of $U$, denoted by
$\|U\|$, is defined as follows:


$$\|U\| = +\sqrt{u_1^2 + \cdots + u_n^2}$$


whenever the elements $u_1, \ldots, u_n$ are real. If $u_1, \ldots, u_n$ are complex numbers then
$\|U\| = \sqrt{|u_1|^2 + \cdots + |u_n|^2}$ where $|u_j|$ denotes the absolute value or modulus of $u_j$. If
$u_j = a_j + i b_j$, with $i = \sqrt{-1}$ and $a_j, b_j$ real, then $|u_j| = +\sqrt{a_j^2 + b_j^2}$. If the length of
a vector is unity, that vector is called a unit vector. For example, $e_1 = (1, 0, \ldots, 0)$, $e_2 =
(0, 1, 0, \ldots, 0), \ldots, e_n = (0, \ldots, 0, 1)$ are all unit vectors. As well, $V_1 = \left(\tfrac{1}{\sqrt{2}}, -\tfrac{1}{\sqrt{2}}\right)$ and
$V_2 = \left(\tfrac{1}{\sqrt{3}}, \tfrac{1}{\sqrt{3}}, \tfrac{1}{\sqrt{3}}\right)$ are unit vectors. If two $n$-vectors, $U_1$ and $U_2$, are such that $U_1 \cdot U_2 = 0$,
that is, their dot product is zero, then the two vectors are said to be orthogonal to each
other. For example, if $U_1 = (1, 1)$ and $U_2 = (1, -1)$, then $U_1 \cdot U_2 = 0$ and $U_1$ and
$U_2$ are orthogonal to each other; similarly, if $U_1 = (1, 1, 1)$ and $U_2 = (1, -2, 1)$, then
$U_1 \cdot U_2 = 0$ and $U_1$ and $U_2$ are orthogonal to each other. If $U_1, \ldots, U_k$ are $k$ vectors,
each of order $n$, all being either row vectors or column vectors, and if $U_i \cdot U_j = 0$ for all
$i \ne j$, that is, all distinct vectors are orthogonal to each other, then we say that $U_1, \ldots, U_k$
form an orthogonal system of vectors. In addition, if the length of each vector is unity,
$\|U_j\| = 1$, $j = 1, \ldots, k$, then we say that $U_1, \ldots, U_k$ is an orthonormal system of vectors.
If a matrix $A$ is real and its rows and its columns form an orthonormal system, then $A$ is
called an orthonormal matrix. In this case, $AA' = I_n$ and $A'A = I_n$; accordingly, any
square matrix $A$ of real elements such that $AA' = I_n$ and $A'A = I_n$ is referred to as an
orthonormal matrix. If only one equation holds, that is, $B$ is a real matrix such that either
$BB' = I$, $B'B \ne I$ or $B'B = I$, $BB' \ne I$, then $B$ is called a semiorthonormal matrix.
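For example, consider the matrix

$$A = \frac{1}{\sqrt{2}} \begin{bmatrix} 1 & 1\\ 1 & -1 \end{bmatrix};\ \text{then}\ AA' = I_2,\ A'A = I_2,$$

and $A$ is an orthonormal matrix. As well,

$$A = \begin{bmatrix} \tfrac{1}{\sqrt{3}} & \tfrac{1}{\sqrt{3}} & \tfrac{1}{\sqrt{3}}\\ \tfrac{1}{\sqrt{2}} & 0 & -\tfrac{1}{\sqrt{2}}\\ \tfrac{1}{\sqrt{6}} & -\tfrac{2}{\sqrt{6}} & \tfrac{1}{\sqrt{6}} \end{bmatrix} \Rightarrow AA' = I_3,\ A'A = I_3,$$

and $A$ here is orthonormal. However,

$$B = \begin{bmatrix} \tfrac{1}{\sqrt{3}} & \tfrac{1}{\sqrt{3}} & \tfrac{1}{\sqrt{3}}\\ \tfrac{1}{\sqrt{2}} & 0 & -\tfrac{1}{\sqrt{2}} \end{bmatrix} \Rightarrow BB' = I_2,\ B'B \ne I,$$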
so that $B$ is semiorthonormal. On deleting some rows from an orthonormal matrix, we
obtain a semiorthonormal matrix such that $BB' = I$ and $B'B \ne I$. Similarly, if we delete
some of the columns, we end up with a semiorthonormal matrix such that $B'B = I$ and
$BB' \ne I$.
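
Lengths, orthogonality and orthonormal systems can be verified numerically. The sketch below makes several assumptions: the helper names are hypothetical, the $3 \times 3$ matrix is one convenient orthonormal matrix built from the mutually orthogonal vectors $(1,1,1)$, $(1,0,-1)$ and $(1,-2,1)$, and a small tolerance `tol` absorbs floating-point rounding.

```python
import math

def dot(u, v):
    """Dot product of two real vectors of the same order."""
    if len(u) != len(v):
        raise ValueError("vectors must have the same order")
    return sum(a * b for a, b in zip(u, v))

def length(u):
    """Length of a vector; abs() makes the formula valid for complex elements."""
    return math.sqrt(sum(abs(x) ** 2 for x in u))

def is_orthonormal_system(vectors, tol=1e-12):
    """True if distinct vectors are orthogonal and every vector has unit length."""
    for i, u in enumerate(vectors):
        for j, v in enumerate(vectors):
            expected = 1.0 if i == j else 0.0
            if abs(dot(u, v) - expected) > tol:
                return False
    return True

def columns(M):
    """Return the columns of M as a list of vectors."""
    return [list(c) for c in zip(*M)]

s2, s3, s6 = math.sqrt(2), math.sqrt(3), math.sqrt(6)
A = [
    [1 / s3, 1 / s3, 1 / s3],
    [1 / s2, 0.0, -1 / s2],
    [1 / s6, -2 / s6, 1 / s6],
]
B = A[:2]   # delete the last row of A

print(length([1, 2]))             # the length of the vector (1, 2), i.e. sqrt(5)
print(dot([1, 1, 1], [1, -2, 1])) # 0, so the two vectors are orthogonal
print(is_orthonormal_system(A), is_orthonormal_system(columns(A)))  # True True
print(is_orthonormal_system(B), is_orthonormal_system(columns(B)))  # True False
```

The last line shows the semiorthonormal case: the rows of `B` still form an orthonormal system, but its columns no longer do.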


Linear Independence of Vectors. Consider the vectors $U_1 = (1, 1, 1)$, $U_2 = (1, -2, 1)$,
$U_3 = (3, 0, 3)$. Then, we can easily see that $U_3 = 2U_1 + U_2 = 2(1, 1, 1) + (1, -2, 1) =
(3, 0, 3)$ or $U_3 - 2U_1 - U_2 = O$ (a null vector). In this case, one of the vectors can be written
as a linear function of the others. Let $V_1 = (1, 1, 1)$, $V_2 = (1, 0, -1)$, $V_3 = (1, -2, 1)$.
Can any one of these be written as a linear function of the others? If that were possible, then
there would exist a linear function of $V_1, V_2, V_3$, with at least one nonzero coefficient, that
is equal to a null vector. Let us consider the equation $a_1 V_1 + a_2 V_2 + a_3 V_3 = (0, 0, 0)$ where
$a_1, a_2, a_3$ are scalars. Note that $a_1 = 0$, $a_2 = 0$ and $a_3 = 0$ will always satisfy the
above equation. Thus, our question is whether $a_1 = 0$, $a_2 = 0$, $a_3 = 0$ is the only solution.


a Vi + а № + аз V3 = O => а1(1, 1, 1) + ax(l, 0, ex) + а3(1, =2; 1) = (0, 0, 0) 
>a; +a + аз = 0 (i); а – 2a3 =0 (ii); ај – a + a3 = 0. (iii) 


From (ii), $a_1 = 2a_3$. Then, from (iii), $3a_3 - a_2 = 0 \Rightarrow a_2 = 3a_3$; then from (i), $2a_3 +
3a_3 + a_3 = 0$ or $6a_3 = 0$ or $a_3 = 0$. Thus, $a_2 = 0$, $a_1 = 0$ and there is no nonzero $a_1$
or $a_2$ or $a_3$ satisfying the equation and hence $V_1, V_2, V_3$ cannot be linearly dependent; so,
they are linearly independent. Hence, we have the following definition: Let $U_1, \ldots, U_k$ be
$k$ vectors, each of order $n$, all being either row vectors or column vectors, so that addition
and linear functions are defined. Let $a_1, \ldots, a_k$ be scalar quantities. Consider the equation

$$a_1U_1 + a_2U_2 + \cdots + a_kU_k = O \ \text{(a null vector)}. \tag{iv}$$

If $a_1 = 0, a_2 = 0, \ldots, a_k = 0$ is the only solution to (iv), then $U_1, \ldots, U_k$ are linearly
independent, otherwise they are linearly dependent. If they are linearly dependent, then
at least one of the vectors can be expressed as a linear function of others. The following
properties can be established from the definition: Let $U_1, \ldots, U_k$ be $n$-vectors, $k \le n$.

(1) If $U_1, \ldots, U_k$ are mutually orthogonal, then they are linearly independent, that is, if
$U_i \cdot U_j = 0$ for all $i \neq j$, then $U_1, \ldots, U_k$ are linearly independent;

(2) There cannot be more than $n$ mutually orthogonal $n$-vectors;

(3) There cannot be more than $n$ linearly independent $n$-vectors.
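
Equation (iv) can also be examined mechanically. The sketch below is an illustration, not part of the original development: the function name `null_combination` is hypothetical, and exact rational arithmetic from Python's `fractions` module is assumed so that no rounding occurs. It row-reduces the system in (iv) and returns one set of coefficients that are not all zero when such a set exists, or `None` when $a_1 = \cdots = a_k = 0$ is the only solution.

```python
from fractions import Fraction

def null_combination(vectors):
    """Return (a_1, ..., a_k), not all zero, with sum a_i * U_i = O, or None."""
    k = len(vectors)
    n = len(vectors[0])
    # Row r of m collects the r-th components, so m * a = O is equation (iv).
    m = [[Fraction(v[r]) for v in vectors] for r in range(n)]
    pivots = []  # pivot column found for each successive pivot row
    row = 0
    for col in range(k):
        p = next((r for r in range(row, n) if m[r][col] != 0), None)
        if p is None:
            continue  # no pivot here: this coefficient is free
        m[row], m[p] = m[p], m[row]
        pv = m[row][col]
        m[row] = [x / pv for x in m[row]]
        for r in range(n):
            if r != row and m[r][col] != 0:
                f = m[r][col]
                m[r] = [x - f * y for x, y in zip(m[r], m[row])]
        pivots.append(col)
        row += 1
        if row == n:
            break
    free = [c for c in range(k) if c not in pivots]
    if not free:
        return None  # only the trivial solution: linearly independent
    a = [Fraction(0)] * k
    a[free[0]] = Fraction(1)  # set one free coefficient to 1
    for r, c in enumerate(pivots):
        a[c] = -m[r][free[0]]
    return a

U = [(1, 1, 1), (1, -2, 1), (3, 0, 3)]
V = [(1, 1, 1), (1, 0, -1), (1, -2, 1)]
print(null_combination(U))  # coefficients of a dependence among U1, U2, U3
print(null_combination(V))  # None: V1, V2, V3 are linearly independent
```

For the vectors $U_1, U_2, U_3$ the returned coefficients correspond to $-2U_1 - U_2 + U_3 = O$, which is the relation noted in the text.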


Rank of a Matrix The maximum number of linearly independent row vectors of an $m \times n$
matrix is called the row rank of the matrix; the maximum number of linearly independent
column vectors is called the column rank of the matrix. It can be shown that the row rank
of any matrix is equal to its column rank, and this common rank is called the rank of the
matrix. If $r$ is the rank of an $m \times n$ matrix, then $r \le m$ and $r \le n$. If $m \le n$ and the rank
is $m$ or if $n \le m$ and the rank is $n$, then the matrix is called a full rank matrix. A square
matrix of full rank is called a nonsingular matrix. When the rank of an $n \times n$ matrix is
$r < n$, this matrix is referred to as a singular matrix. Singularity is defined only for square
matrices. The following properties clearly hold:

(1) A diagonal matrix with at least one zero diagonal element is singular, whereas a diagonal
matrix with all nonzero diagonal elements is nonsingular;


(2) A triangular matrix (upper or lower) with at least one zero diagonal element is singular,
whereas a triangular matrix with all diagonal elements nonzero is nonsingular;


(3) A square matrix containing at least one null row vector or at least one null column 
vector is singular; 


(4) Linear independence or dependence in a collection of vectors of the same order and 
category (either all are row vectors or all are column vectors) is not altered by multiplying 
any of the vectors by a nonzero scalar; 


(5) Linear independence or dependence in a collection of vectors of the same order and 
category is not altered by adding any vector of the set to any other vector in the same set; 


(6) Linear independence or dependence in a collection of vectors of the same order and
category is not altered by adding a linear combination of vectors from the same set to any
other vector in the same set;


(7) If a collection of vectors of the same order and category is a linearly dependent system, 
then at least one of the vectors can be made null by the operations of scalar multiplication 
and addition. 
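
As an illustration of the rank and of properties (1) to (3), the following Python sketch counts the pivots left after row reduction. It is a minimal sketch under stated assumptions: the function names `rank` and `transpose` and the test matrices are chosen here for illustration, and exact fractions are used so that zero entries are detected exactly.

```python
from fractions import Fraction

def rank(matrix):
    """Number of pivots obtained by row reduction with exact fractions."""
    m = [[Fraction(x) for x in row] for row in matrix]
    rows, cols = len(m), len(m[0])
    r = 0
    for c in range(cols):
        p = next((i for i in range(r, rows) if m[i][c] != 0), None)
        if p is None:
            continue
        m[r], m[p] = m[p], m[r]
        for i in range(r + 1, rows):
            f = m[i][c] / m[r][c]
            m[i] = [x - f * y for x, y in zip(m[i], m[r])]
        r += 1
        if r == rows:
            break
    return r

def transpose(matrix):
    return [list(col) for col in zip(*matrix)]

M = [[1, 1, 1], [1, -2, 1], [3, 0, 3]]            # third row = 2*(first) + (second)
D = [[2, 0, 0], [0, 0, 0], [0, 0, 5]]             # diagonal, one zero diagonal element
T = [[2, 1, 5], [0, 3, 4], [0, 0, -4]]            # triangular, nonzero diagonal
W = [[1, 2, 3, 4], [2, 4, 6, 8], [1, 0, 1, 0]]    # a 3 x 4 matrix

for name, mat in [("M", M), ("D", D), ("T", T), ("W", W)]:
    # Row rank (rank of mat) and column rank (rank of its transpose) agree.
    print(name, rank(mat), rank(transpose(mat)))
```

A square matrix is reported as singular by this sketch exactly when the computed rank is smaller than its order, as for `M` and `D` above.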


Note: We have defined “vectors” as an ordered set of items such as an ordered set of
numbers. One can also give a general definition of a vector as an element in a set $S$ which
is closed under the operations of scalar multiplication and addition (these operations are
to be defined on $S$), that is, letting $S$ be a set of items, if $V_1 \in S$ and $V_2 \in S$, then $cV_1 \in S$
and $V_1 + V_2 \in S$ for all scalars $c$ and for all $V_1$ and $V_2$, that is, if $V_1$ is an element in $S$,
then $cV_1$ is also an element in $S$ and if $V_1$ and $V_2$ are in $S$, then $V_1 + V_2$ is also in $S$,
where the operations $cV_1$ and $V_1 + V_2$ are to be properly defined. One can impose additional
conditions on $S$. However, for our discussion, the notion of vectors as ordered sets of items
will be sufficient.


1.2. Determinants 


Determinants are defined only for square matrices. They are certain scalar functions
of the elements of the square matrix under consideration. We will motivate this
particular function by means of an example that will also prove useful in other areas.
Consider two 2-vectors, either both row vectors or both column vectors. Let
$U = \overrightarrow{OP}$ and $V = \overrightarrow{OQ}$ be the two vectors as shown in Fig. 1.2. If the vectors are
separated by a nonzero angle $\theta$ then one can create the parallelogram $OPSQ$ with these two
vectors as shown in Fig. 1.2.


Figure 1.2 Parallelogram generated from two vectors


The area of the parallelogram is twice the area of the triangle $OPQ$. If the perpendicular
from $P$ to $OQ$ is $PR$, then the area of the triangle is $\frac{1}{2}PR \times OQ$ or the area of the
parallelogram $OPSQ$ is $PR \times \|V\|$ where $PR$ is $OP \times \sin\theta = \|U\| \times \sin\theta$. Therefore
the area is $\|U\|\,\|V\|\sin\theta$ or the area, denoted by $\nu$, is

$$\nu = \|U\|\,\|V\|\sqrt{1 - \cos^2\theta}.$$

If $\theta_1$ is the angle $U$ makes with the $x$-axis and $\theta_2$ the angle $V$ makes with the $x$-axis, then,
with $U$ and $V$ as depicted in Fig. 1.2, $\theta = \theta_1 - \theta_2$. It follows that

$$\cos\theta = \cos(\theta_1 - \theta_2) = \cos\theta_1\cos\theta_2 + \sin\theta_1\sin\theta_2 = \frac{U \cdot V}{\|U\|\,\|V\|},$$

as can be seen from Fig. 1.2. In this case,

$$\nu = \|U\|\,\|V\|\sqrt{1 - \Big(\frac{U \cdot V}{\|U\|\,\|V\|}\Big)^2} = \sqrt{(\|U\|)^2(\|V\|)^2 - (U \cdot V)^2} \tag{1.2.1}$$

and

$$\nu^2 = (\|U\|)^2(\|V\|)^2 - (U \cdot V)^2. \tag{1.2.2}$$

This can be written in a more convenient way. Letting $X = \begin{pmatrix} U \\ V \end{pmatrix}$,

$$XX' = \begin{pmatrix} U \\ V \end{pmatrix}\begin{pmatrix} U' & V' \end{pmatrix} = \begin{bmatrix} UU' & UV' \\ VU' & VV' \end{bmatrix} = \begin{bmatrix} U \cdot U & U \cdot V \\ V \cdot U & V \cdot V \end{bmatrix}. \tag{1.2.3}$$

On comparing (1.2.2) and (1.2.3), we note that (1.2.2) is available from (1.2.3) by taking
a scalar function of the following type. Consider a matrix

$$C = \begin{bmatrix} a & b \\ c & d \end{bmatrix}; \ \text{then (1.2.2) is available by taking } ad - bc$$

where $a, b, c, d$ are scalar quantities. A scalar function of this type is the determinant of
the matrix $C$.
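
The agreement between the area formula and the quantity $ad - bc$ computed from $XX'$ can be verified on a small case. The sketch below is illustrative only; the function names and the two vectors $U = (3, 0)$ and $V = (1, 2)$ are assumptions made for this example.

```python
import math

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def norm(u):
    return math.sqrt(dot(u, u))

def area_from_angle(u, v):
    """Area as ||U|| ||V|| sqrt(1 - cos^2(theta)), with cos(theta) from the dot product."""
    cos_theta = dot(u, v) / (norm(u) * norm(v))
    return norm(u) * norm(v) * math.sqrt(1.0 - cos_theta ** 2)

def area_squared_from_gram(u, v):
    """ad - bc applied to XX' = [[U.U, U.V], [V.U, V.V]], as in (1.2.2) and (1.2.3)."""
    a, b, c, d = dot(u, u), dot(u, v), dot(v, u), dot(v, v)
    return a * d - b * c

U = (3, 0)
V = (1, 2)
print(area_from_angle(U, V))          # area of the parallelogram, close to 6
print(area_squared_from_gram(U, V))   # squared area, 36
```

For this choice, the base along the $x$-axis has length 3 and the height is 2, so the area 6 can also be read off directly.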

A general result can be deduced from the above procedure: If $U$ and $V$ are $n$-vectors
and if $\theta$ is the angle between them, then

$$\cos\theta = \frac{U \cdot V}{\|U\|\,\|V\|}$$

or the dot product of $U$ and $V$ divided by the product of their lengths when $\theta \neq 0$, and the
numerator is equal to the denominator when $\theta = 2n\pi$, $n = 0, 1, 2, \ldots$. We now provide
a formal definition of the determinant of a square matrix.


Definition 1.2.1. The Determinant of a Square Matrix Let $A = (a_{ij})$ be an $n \times n$ matrix
whose rows (columns) are denoted by $\alpha_1, \ldots, \alpha_n$. For example, if $\alpha_i$ is the $i$-th row vector,
then

$$\alpha_i = (a_{i1}\ a_{i2}\ \ldots\ a_{in}).$$

The determinant of $A$ will be denoted by $|A|$ or $\det(A)$ when $A$ is real or complex and
the absolute value of the determinant of $A$ will be denoted by $|\det(A)|$ when $A$ is in the
complex domain. Then, $|A|$ will be a function of $\alpha_1, \ldots, \alpha_n$, written as

$$|A| = \det(A) = f(\alpha_1, \ldots, \alpha_i, \ldots, \alpha_j, \ldots, \alpha_n),$$

which will be defined by the following four axioms (postulates or assumptions); this
definition also holds if the elements of the matrix are in the complex domain:

$$(1)\quad f(\alpha_1, \ldots, c\,\alpha_i, \ldots, \alpha_n) = c\,f(\alpha_1, \ldots, \alpha_i, \ldots, \alpha_n),$$

which is equivalent to saying that if any row (column) is multiplied by a scalar quantity $c$
(including zero), then the whole determinant is multiplied by $c$;

$$(2)\quad f(\alpha_1, \ldots, \alpha_i, \ldots, \alpha_i + \alpha_j, \ldots, \alpha_n) = f(\alpha_1, \ldots, \alpha_i, \ldots, \alpha_j, \ldots, \alpha_n),$$


which is equivalent to saying that if any row (column) is added to any other row (column), 
then the value of the determinant remains the same; 


$$(3)\quad f(\alpha_1, \ldots, \gamma_i + \delta_i, \ldots, \alpha_n) = f(\alpha_1, \ldots, \gamma_i, \ldots, \alpha_n) + f(\alpha_1, \ldots, \delta_i, \ldots, \alpha_n),$$

which is equivalent to saying that if any row (column), say the $i$-th row (column), is written
as a sum of two vectors, $\alpha_i = \gamma_i + \delta_i$, then the determinant becomes the sum of two
determinants such that $\gamma_i$ appears at the position of $\alpha_i$ in the first one and $\delta_i$ appears at the
position of $\alpha_i$ in the second one;

$$(4)\quad f(e_1, \ldots, e_n) = 1,$$

where $e_1, \ldots, e_n$ are the basic unit vectors as previously defined; this axiom states that the
determinant of an identity matrix is 1.


Let us consider some corollaries resulting from Axioms (1) to (4). On combining
Axioms (1) and (2), we have that the value of a determinant remains unchanged if a linear
function of any number of rows (columns) is added to any other row (column). As well,
the following results are direct consequences of the axioms.


(i): The determinant of a diagonal matrix is the product of the diagonal elements [which
can be established by repeated applications of Axiom (1)];


(ii): If any diagonal element in a diagonal matrix is zero, then the determinant is zero,
and thereby the corresponding matrix is singular; if none of the diagonal elements of a
diagonal matrix is equal to zero, then the matrix is nonsingular.


(iii): If any row (column) of a matrix is null, then the determinant is zero or the matrix is
singular [Axiom (1)];

(iv): If any row (column) is a linear function of other rows (columns), then the determinant
is zero [By Axioms (1) and (2), we can reduce that row (column) to a null vector]. Thus,
the determinant of a singular matrix is zero or if the row (column) vectors form a linearly
dependent system, then the determinant is zero.


By using Axioms (1) and (2), we can reduce a triangular matrix to a diagonal form
when evaluating its determinant. For this purpose we shall use the following standard
notation: “$c(i) + (j) \Rightarrow$” means “$c$ times the $i$-th row is added to the $j$-th row which
results in the following:” Let us consider a simple example. Evaluate the determinant of
the following triangular matrix:

$$T = \begin{bmatrix} 2 & 1 & 5 \\ 0 & 3 & 4 \\ 0 & 0 & -4 \end{bmatrix}.$$

It is an upper triangular matrix. We take out $-4$ from the third row by using Axiom (1).
Then,

$$|T| = -4\begin{vmatrix} 2 & 1 & 5 \\ 0 & 3 & 4 \\ 0 & 0 & 1 \end{vmatrix}.$$

Now, add $(-4)$ times the third row to the second row and $(-5)$ times the third row to the
first row. This in symbols is “$-4(3) + (2),\ -5(3) + (1) \Rightarrow$”. The net result is that the
elements 5 and 4 in the last column are eliminated without affecting the other elements, so
that

$$|T| = -4\begin{vmatrix} 2 & 1 & 0 \\ 0 & 3 & 0 \\ 0 & 0 & 1 \end{vmatrix}.$$

Now take out 3 from the second row and then use the second row to eliminate 1 in the first
row. After taking out 3 from the second row, the operation is “$-1(2) + (1) \Rightarrow$”. The result
is the following:

$$|T| = (-4)(3)\begin{vmatrix} 2 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{vmatrix}.$$

Now, take out 2 from the first row, then by Axiom (4) the determinant of the resulting
identity matrix is 1, and hence $|T|$ is nothing but the product of the diagonal elements.
Thus, we have the following result:


(v): The determinant of a triangular matrix (upper or lower) is the product of its diagonal
elements; accordingly, if any diagonal element in a triangular matrix is zero, then the
determinant is zero and the matrix is singular. For a triangular matrix to be nonsingular,
all its diagonal elements must be nonzero.

The following result follows directly from Axioms (1) and (2). The proof is given in
symbols.

(vi): If any two rows (columns) are interchanged (this means one transposition), then the
resulting determinant is multiplied by $-1$ or every transposition brings in $-1$ outside that
determinant as a multiple. If an odd number of transpositions are done, then the whole
determinant is multiplied by $-1$, and for an even number of transpositions, the multiplicative
factor is $+1$ or no change in the determinant. An outline of the proof follows:

$$\begin{aligned}
|A| &= f(\alpha_1, \ldots, \alpha_i, \ldots, \alpha_j, \ldots, \alpha_n) \\
&= f(\alpha_1, \ldots, \alpha_i, \ldots, \alpha_i + \alpha_j, \ldots, \alpha_n) && \text{[Axiom (2)]} \\
&= -f(\alpha_1, \ldots, \alpha_i, \ldots, -\alpha_i - \alpha_j, \ldots, \alpha_n) && \text{[Axiom (1)]} \\
&= -f(\alpha_1, \ldots, -\alpha_j, \ldots, -\alpha_i - \alpha_j, \ldots, \alpha_n) && \text{[Axiom (2)]} \\
&= f(\alpha_1, \ldots, \alpha_j, \ldots, -\alpha_i - \alpha_j, \ldots, \alpha_n) && \text{[Axiom (1)]} \\
&= f(\alpha_1, \ldots, \alpha_j, \ldots, -\alpha_i, \ldots, \alpha_n) && \text{[Axiom (2)]} \\
&= -f(\alpha_1, \ldots, \alpha_j, \ldots, \alpha_i, \ldots, \alpha_n) && \text{[Axiom (1)]}.
\end{aligned}$$

Now, note that the $i$-th and $j$-th rows (columns) are interchanged and the result is that the
determinant is multiplied by $-1$.

With the above six basic properties, we are in a position to evaluate most determinants.
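
The six properties translate into a simple computational recipe, sketched below in Python as an illustration. The function name `det_by_row_operations` is hypothetical and exact fractions are assumed. Adding a multiple of one row to another leaves the value unchanged [Axioms (1) and (2)], each row interchange contributes a factor $-1$ [property (vi)], and the resulting triangular matrix is evaluated as the product of its diagonal elements [property (v)].

```python
from fractions import Fraction

def det_by_row_operations(matrix):
    """Reduce to upper triangular form and multiply the diagonal elements."""
    m = [[Fraction(x) for x in row] for row in matrix]
    n = len(m)
    sign = 1
    for c in range(n):
        p = next((r for r in range(c, n) if m[r][c] != 0), None)
        if p is None:
            return Fraction(0)  # no pivot available: the matrix is singular
        if p != c:
            m[c], m[p] = m[p], m[c]
            sign = -sign        # one transposition multiplies the value by -1
        for r in range(c + 1, n):
            f = m[r][c] / m[c][c]
            # Adding (-f) times row c to row r does not change the determinant.
            m[r] = [x - f * y for x, y in zip(m[r], m[c])]
    result = Fraction(sign)
    for i in range(n):
        result *= m[i][i]
    return result

T = [[2, 1, 5], [0, 3, 4], [0, 0, -4]]
T_swapped = [T[1], T[0], T[2]]                 # first two rows interchanged
M = [[1, 1, 1], [1, -2, 1], [3, 0, 3]]         # third row = 2*(first) + (second)
print(det_by_row_operations(T))          # -24
print(det_by_row_operations(T_swapped))  # 24
print(det_by_row_operations(M))          # 0
```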


Example 1.2.1. Evaluate the determinant of the matrix

$$A = \begin{bmatrix} 2 & 0 & 0 & 0 \\ 1 & 5 & 0 & 0 \\ 2 & -1 & 1 & 0 \\ 3 & 0 & 1 & 4 \end{bmatrix}.$$

Solution 1.2.1. Since this is a triangular matrix, its determinant will be the product of its
diagonal elements. Proceeding step by step, take out 2 from the first row by using Axiom
(1). Then $-1(1) + (2),\ -2(1) + (3),\ -3(1) + (4) \Rightarrow$. The result of these operations is the
following:

$$|A| = 2\begin{vmatrix} 1 & 0 & 0 & 0 \\ 0 & 5 & 0 & 0 \\ 0 & -1 & 1 & 0 \\ 0 & 0 & 1 & 4 \end{vmatrix}.$$

Now, take out 5 from the second row so that $1(2) + (3) \Rightarrow$, the result being the following:

$$|A| = (2)(5)\begin{vmatrix} 1 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 \\ 0 & 0 & 1 & 0 \\ 0 & 0 & 1 & 4 \end{vmatrix}.$$

The diagonal element in the third row is 1 and there is nothing to be taken out. Now
$-1(3) + (4) \Rightarrow$ and then, after having taken out 4 from the fourth row, the result is

$$|A| = (2)(5)(1)(4)\begin{vmatrix} 1 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 \\ 0 & 0 & 1 & 0 \\ 0 & 0 & 0 & 1 \end{vmatrix}.$$

Now, by Axiom (4) the determinant of the remaining identity matrix is 1. Therefore, the
final solution is $|A| = (2)(5)(1)(4) = 40$.


Example 1.2.2. Evaluate the determinant of the following matrix:

$$A = \begin{bmatrix} 1 & 2 & 4 & 1 \\ 0 & 3 & 2 & 1 \\ 2 & 1 & -1 & 0 \\ 5 & 2 & 1 & 3 \end{bmatrix}.$$

Solution 1.2.2. Since the first row, first column element is a convenient number 1, we
start operating with the first row. Otherwise, we bring a convenient number to the $(1,1)$-th
position by interchanges of rows and columns (with each interchange the determinant is to
be multiplied by $-1$). Our aim will be to reduce the matrix to a triangular form so that the
determinant is the product of the diagonal elements. By using the first row let us wipe out
the elements in the first column. The operations are $-2(1) + (3),\ -5(1) + (4) \Rightarrow$. Then

$$|A| = \begin{vmatrix} 1 & 2 & 4 & 1 \\ 0 & 3 & 2 & 1 \\ 2 & 1 & -1 & 0 \\ 5 & 2 & 1 & 3 \end{vmatrix} = \begin{vmatrix} 1 & 2 & 4 & 1 \\ 0 & 3 & 2 & 1 \\ 0 & -3 & -9 & -2 \\ 0 & -8 & -19 & -2 \end{vmatrix}.$$

Now, by using the second row we want to wipe out the elements below the diagonal in the
second column. But the first number is 3. One element in the third row can be wiped out
by simply adding $1(2) + (3) \Rightarrow$. This brings the following:

$$|A| = \begin{vmatrix} 1 & 2 & 4 & 1 \\ 0 & 3 & 2 & 1 \\ 0 & 0 & -7 & -1 \\ 0 & -8 & -19 & -2 \end{vmatrix}.$$

If we take out 3 from the second row then it will bring in fractions. We will avoid fractions
by multiplying the second row by 8 and the fourth row by 3. In order to preserve the value,
we keep $\frac{1}{(8)(3)}$ outside. Then, we add the second row to the fourth row or $(2) + (4) \Rightarrow$. The
result of these operations is the following:

$$|A| = \frac{1}{(8)(3)}\begin{vmatrix} 1 & 2 & 4 & 1 \\ 0 & 24 & 16 & 8 \\ 0 & 0 & -7 & -1 \\ 0 & -24 & -57 & -6 \end{vmatrix} = \frac{1}{(8)(3)}\begin{vmatrix} 1 & 2 & 4 & 1 \\ 0 & 24 & 16 & 8 \\ 0 & 0 & -7 & -1 \\ 0 & 0 & -41 & 2 \end{vmatrix}.$$

Now, multiply the third row by 41 and the fourth row by 7 and then add $-1(3) + (4) \Rightarrow$. The
result is the following:

$$|A| = \frac{1}{(8)(3)(7)(41)}\begin{vmatrix} 1 & 2 & 4 & 1 \\ 0 & 24 & 16 & 8 \\ 0 & 0 & -287 & -41 \\ 0 & 0 & 0 & 55 \end{vmatrix}.$$

Now, take the product of the diagonal elements. Then

$$|A| = \frac{(1)(24)(-287)(55)}{(8)(3)(7)(41)} = -55.$$

Observe that we did not have to repeat the $4 \times 4$ determinant each time. After wiping
out the first column elements, we could have expressed the determinant as follows because
only the elements in the second row and second column onward would then have mattered.
That is,

$$|A| = (1)\begin{vmatrix} 3 & 2 & 1 \\ 0 & -7 & -1 \\ -8 & -19 & -2 \end{vmatrix}.$$

Similarly, after wiping out the second column elements, we could have written the
resulting determinant as

$$|A| = \frac{(1)(24)}{(8)(3)}\begin{vmatrix} -7 & -1 \\ -41 & 2 \end{vmatrix},$$

and so on.
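
The idea of avoiding fractions by scaling rows and keeping the compensating factor outside the determinant can be sketched in Python. This is an illustrative variant rather than the exact sequence of steps used in the solution: here only the row being cleared is multiplied by the pivot, and the accumulated divisor plays the role of the factor kept outside. The function name `det_integer_elimination` is hypothetical.

```python
def det_integer_elimination(matrix):
    """Determinant of an integer matrix using integer row operations only."""
    m = [list(row) for row in matrix]
    n = len(m)
    sign = 1
    divisor = 1  # product of the factors by which rows were multiplied
    for c in range(n):
        p = next((r for r in range(c, n) if m[r][c] != 0), None)
        if p is None:
            return 0
        if p != c:
            m[c], m[p] = m[p], m[c]
            sign = -sign
        for r in range(c + 1, n):
            if m[r][c] == 0:
                continue
            a, b = m[c][c], m[r][c]
            # Multiply row r by a (so the determinant is multiplied by a),
            # then add (-b) times row c, which leaves the value unchanged.
            m[r] = [a * x - b * y for x, y in zip(m[r], m[c])]
            divisor *= a
    product = sign
    for i in range(n):
        product *= m[i][i]
    # product equals divisor times the determinant, so the division is exact.
    return product // divisor

A = [[1, 2, 4, 1], [0, 3, 2, 1], [2, 1, -1, 0], [5, 2, 1, 3]]
print(det_integer_elimination(A))  # -55
```

The intermediate entries differ from those displayed in the solution because the rows are scaled differently, but the value of the determinant is the same.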


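Example 1.2.3. Evaluate the determinant of a $2 \times 2$ general matrix.

Solution 1.2.3. A general $2 \times 2$ determinant can be opened up by using Axiom (3), that
is,

$$\begin{aligned}
|A| = \begin{vmatrix} a_{11} & a_{12} \\ a_{21} & a_{22} \end{vmatrix} &= \begin{vmatrix} a_{11} & 0 \\ a_{21} & a_{22} \end{vmatrix} + \begin{vmatrix} 0 & a_{12} \\ a_{21} & a_{22} \end{vmatrix} && \text{[Axiom (3)]} \\
&= a_{11}\begin{vmatrix} 1 & 0 \\ a_{21} & a_{22} \end{vmatrix} + a_{12}\begin{vmatrix} 0 & 1 \\ a_{21} & a_{22} \end{vmatrix} && \text{[Axiom (1)]}.
\end{aligned}$$

If any of $a_{11}$ or $a_{12}$ is zero, then the corresponding determinant is zero. In the second
determinant on the right, interchange the second and first columns, which will bring a
minus sign outside the determinant. That is,

$$|A| = a_{11}\begin{vmatrix} 1 & 0 \\ a_{21} & a_{22} \end{vmatrix} - a_{12}\begin{vmatrix} 1 & 0 \\ a_{22} & a_{21} \end{vmatrix} = a_{11}a_{22} - a_{12}a_{21}.$$

The last step is done by using the property that the determinant of a triangular matrix is the
product of the diagonal elements. We can also evaluate the determinant by using a number
of different procedures. Taking out $a_{11}$ if $a_{11} \neq 0$,

$$|A| = a_{11}\begin{vmatrix} 1 & \frac{a_{12}}{a_{11}} \\ a_{21} & a_{22} \end{vmatrix}.$$

Now, perform the operation $-a_{21}(1) + (2)$ or $-a_{21}$ times the first row is added to the
second row. Then,

$$|A| = a_{11}\begin{vmatrix} 1 & \frac{a_{12}}{a_{11}} \\ 0 & a_{22} - \frac{a_{12}a_{21}}{a_{11}} \end{vmatrix}.$$

Now, expanding by using a property of triangular matrices, we have

$$|A| = a_{11}(1)\Big[a_{22} - \frac{a_{12}a_{21}}{a_{11}}\Big] = a_{11}a_{22} - a_{12}a_{21}. \tag{1.2.4}$$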
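Consider a general $3 \times 3$ determinant evaluated by using Axiom (3) first.

$$\begin{aligned}
|A| &= \begin{vmatrix} a_{11} & a_{12} & a_{13} \\ a_{21} & a_{22} & a_{23} \\ a_{31} & a_{32} & a_{33} \end{vmatrix} \\
&= a_{11}\begin{vmatrix} 1 & 0 & 0 \\ a_{21} & a_{22} & a_{23} \\ a_{31} & a_{32} & a_{33} \end{vmatrix} + a_{12}\begin{vmatrix} 0 & 1 & 0 \\ a_{21} & a_{22} & a_{23} \\ a_{31} & a_{32} & a_{33} \end{vmatrix} + a_{13}\begin{vmatrix} 0 & 0 & 1 \\ a_{21} & a_{22} & a_{23} \\ a_{31} & a_{32} & a_{33} \end{vmatrix} \\
&= a_{11}\begin{vmatrix} 1 & 0 & 0 \\ 0 & a_{22} & a_{23} \\ 0 & a_{32} & a_{33} \end{vmatrix} + a_{12}\begin{vmatrix} 0 & 1 & 0 \\ a_{21} & 0 & a_{23} \\ a_{31} & 0 & a_{33} \end{vmatrix} + a_{13}\begin{vmatrix} 0 & 0 & 1 \\ a_{21} & a_{22} & 0 \\ a_{31} & a_{32} & 0 \end{vmatrix}.
\end{aligned}$$

The first step consists in opening up the first row by making use of Axiom (3). Then,
eliminate the elements in rows 2 and 3 within the column headed by 1. The next step is
to bring the columns whose first element is 1 to the first column position by transpositions.
The first matrix on the right-hand side is already in this format. One transposition is
needed in the second matrix and two are required in the third matrix. After completing the
transpositions, the next step consists in opening up each determinant along its second
row and observing that the resulting matrices are lower triangular or can be made so after
transposing their last two columns. The final result is then obtained. The last two steps are
executed below:

$$\begin{aligned}
|A| &= a_{11}\begin{vmatrix} 1 & 0 & 0 \\ 0 & a_{22} & a_{23} \\ 0 & a_{32} & a_{33} \end{vmatrix} - a_{12}\begin{vmatrix} 1 & 0 & 0 \\ 0 & a_{21} & a_{23} \\ 0 & a_{31} & a_{33} \end{vmatrix} + a_{13}\begin{vmatrix} 1 & 0 & 0 \\ 0 & a_{21} & a_{22} \\ 0 & a_{31} & a_{32} \end{vmatrix} \\
&= a_{11}[a_{22}a_{33} - a_{23}a_{32}] - a_{12}[a_{21}a_{33} - a_{23}a_{31}] + a_{13}[a_{21}a_{32} - a_{22}a_{31}] \\
&= a_{11}a_{22}a_{33} + a_{12}a_{23}a_{31} + a_{13}a_{21}a_{32} - a_{11}a_{23}a_{32} - a_{12}a_{21}a_{33} - a_{13}a_{22}a_{31}.
\end{aligned} \tag{1.2.5}$$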


A few observations are in order. Once 1 is brought to the first row first column position
in every matrix and the remaining elements in this first column are eliminated, one can
delete the first row and first column and take the determinant of the remaining submatrix
because only those elements will enter into the remaining operations involving opening
up the second and successive rows by making use of Axiom (3). Hence, we could have
written

$$|A| = a_{11}\begin{vmatrix} a_{22} & a_{23} \\ a_{32} & a_{33} \end{vmatrix} - a_{12}\begin{vmatrix} a_{21} & a_{23} \\ a_{31} & a_{33} \end{vmatrix} + a_{13}\begin{vmatrix} a_{21} & a_{22} \\ a_{31} & a_{32} \end{vmatrix}.$$

This step is also called the cofactor expansion of the matrix. In a general matrix $A = (a_{ij})$,
the cofactor of the element $a_{ij}$ is equal to $(-1)^{i+j}M_{ij}$ where $M_{ij}$ is the minor of $a_{ij}$.
This minor is obtained by deleting the $i$-th row and $j$-th column and then taking the
determinant of the remaining elements. The second item to be noted from (1.2.5) is that,
in the final expression for $|A|$, each term has one and only one element from each row
and each column of $A$. Some terms have plus signs in front of them and others have
minus signs. For each term, write the first subscripts in the natural order 1, 2, 3 for the
$3 \times 3$ case and in the general $n \times n$ case, write the first subscripts in the natural order
$1, 2, \ldots, n$. Now, examine the second subscripts. Let the number of transpositions needed
to bring the second subscripts into the natural order $1, 2, \ldots, n$ be $\rho$. Then, that term is
multiplied by $(-1)^{\rho}$ so that an even number of transpositions produces a plus sign and
an odd number of transpositions brings a minus sign, or equivalently if $\rho$ is even, the
coefficient is plus 1 and if $\rho$ is odd, the coefficient is $-1$. This also enables us to open up
a general determinant. This will be considered after pointing out one more property for
the $3 \times 3$ case. The final representation in the $3 \times 3$ case in (1.2.5) can also be written up
by using the following mechanical procedure. Write all elements in the matrix $A$ in the
natural order. Then, augment this arrangement with the first two columns. This yields the
following format:

$$\begin{matrix} a_{11} & a_{12} & a_{13} & a_{11} & a_{12} \\ a_{21} & a_{22} & a_{23} & a_{21} & a_{22} \\ a_{31} & a_{32} & a_{33} & a_{31} & a_{32} \end{matrix}\ .$$

Now take the products of the elements along the diagonals going from the top left to the
bottom right. These are the terms with the plus sign. Take the products of the elements
in the second diagonals or the diagonals going from the bottom left to the top right. These
are the terms with the minus sign. As a result, $|A|$ is as follows:

$$|A| = [a_{11}a_{22}a_{33} + a_{12}a_{23}a_{31} + a_{13}a_{21}a_{32}] - [a_{13}a_{22}a_{31} + a_{11}a_{23}a_{32} + a_{12}a_{21}a_{33}].$$

This mechanical procedure applies only in the $3 \times 3$ case. The general expansion is the
following:

$$|A| = \begin{vmatrix} a_{11} & a_{12} & \ldots & a_{1n} \\ a_{21} & a_{22} & \ldots & a_{2n} \\ \vdots & \vdots & \ddots & \vdots \\ a_{n1} & a_{n2} & \ldots & a_{nn} \end{vmatrix} = \sum_{i_1}\cdots\sum_{i_n}(-1)^{\rho(i_1, \ldots, i_n)}a_{1i_1}a_{2i_2}\cdots a_{ni_n} \tag{1.2.6}$$

where $\rho(i_1, \ldots, i_n)$ is the number of transpositions needed to bring the second subscripts
$i_1, \ldots, i_n$ into the natural order $1, 2, \ldots, n$.
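
The expansion (1.2.6) can be written out directly for small matrices. The Python sketch below is an illustration under stated assumptions: the function names are hypothetical, indices are counted from 0 as is usual in Python, and $\rho$ is obtained by actually sorting the second subscripts with transpositions and counting them. The number of terms is $n!$, so this is only practical for small $n$.

```python
from itertools import permutations

def transpositions_to_sort(second_subscripts):
    """Count transpositions used to bring the subscripts into natural order."""
    p = list(second_subscripts)
    count = 0
    for i in range(len(p)):
        while p[i] != i:
            j = p[i]
            p[i], p[j] = p[j], p[i]  # places the value j at position j
            count += 1
    return count

def det_by_permutations(a):
    """Sum of signed products with one element from each row and each column."""
    n = len(a)
    total = 0
    for perm in permutations(range(n)):
        term = (-1) ** transpositions_to_sort(perm)
        for row, col in enumerate(perm):
            term *= a[row][col]
        total += term
    return total

T = [[2, 1, 5], [0, 3, 4], [0, 0, -4]]
A = [[1, 2, 4, 1], [0, 3, 2, 1], [2, 1, -1, 0], [5, 2, 1, 3]]
print(det_by_permutations(T))  # -24
print(det_by_permutations(A))  # -55
```

For instance, the second subscripts $(2, 3, 1)$ of the term $a_{12}a_{23}a_{31}$ need two transpositions, which agrees with the plus sign of that term in (1.2.5).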


The cofactor expansion of a general matrix is obtained as follows: Suppose that we
open up an $n \times n$ determinant $|A|$ along the $i$-th row using Axiom (3). Then, after taking out
$a_{i1}, a_{i2}, \ldots, a_{in}$, we obtain $n$ determinants where, in the first determinant, 1 occupies the
$(i, 1)$-th position, in the second one, 1 is at the $(i, 2)$-th position and so on so that, in the
$j$-th determinant, 1 occupies the $(i, j)$-th position. Given that $i$-th row, we can now eliminate
all the elements in the columns corresponding to the remaining 1. We now bring this 1 into
the first row first column position by transpositions in each determinant. The number of
transpositions needed to bring this 1 from the $j$-th position in the $i$-th row to the first
position in the $i$-th row is $j - 1$. Then, to bring that 1 to the first row first column position,
another $i - 1$ transpositions are required, so that the total number of transpositions needed
is $(i - 1) + (j - 1) = i + j - 2$. Hence, the multiplicative factor is $(-1)^{i+j-2} = (-1)^{i+j}$,
and the expansion is as follows:

$$\begin{aligned}
|A| &= (-1)^{i+1}a_{i1}M_{i1} + (-1)^{i+2}a_{i2}M_{i2} + \cdots + (-1)^{i+n}a_{in}M_{in} \\
&= a_{i1}C_{i1} + a_{i2}C_{i2} + \cdots + a_{in}C_{in}
\end{aligned} \tag{1.2.7}$$

where $C_{ij} = (-1)^{i+j}M_{ij}$, $C_{ij}$ is the cofactor of $a_{ij}$ and $M_{ij}$ is the minor of $a_{ij}$, the minor
being obtained by taking the determinant of the remaining elements after deleting the $i$-th
row and $j$-th column of $A$. Moreover, if we expand along a certain row (column) but use the
cofactors of some other row (column), then the result will be zero. That is,

$$0 = a_{i1}C_{j1} + a_{i2}C_{j2} + \cdots + a_{in}C_{jn}, \ \text{for all } i \neq j. \tag{1.2.8}$$
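
Results (1.2.7) and (1.2.8) can be checked on a numerical case. In the following Python sketch, which is only an illustration, the function names `minor_matrix`, `det`, `cofactor` and `expand` are hypothetical, rows and columns are counted from 0 (which does not change the parity of $i + j$), and the determinant itself is computed recursively by expanding along the first row.

```python
def minor_matrix(a, i, j):
    """Delete row i and column j (both counted from 0)."""
    return [row[:j] + row[j + 1:] for r, row in enumerate(a) if r != i]

def det(a):
    """Determinant by cofactor expansion along the first row."""
    if len(a) == 1:
        return a[0][0]
    return sum(a[0][j] * cofactor(a, 0, j) for j in range(len(a)))

def cofactor(a, i, j):
    """C_ij = (-1)^(i+j) M_ij, where M_ij is the minor of a_ij."""
    return (-1) ** (i + j) * det(minor_matrix(a, i, j))

def expand(a, i, j):
    """Sum over k of a_ik C_jk: elements of row i with cofactors of row j."""
    return sum(a[i][k] * cofactor(a, j, k) for k in range(len(a)))

A = [[1, 2, 4, 1], [0, 3, 2, 1], [2, 1, -1, 0], [5, 2, 1, 3]]
own = [expand(A, i, i) for i in range(4)]                          # (1.2.7), each row
alien = [expand(A, i, j) for i in range(4) for j in range(4) if i != j]  # (1.2.8)
print(own)    # the same value |A| for every row
print(alien)  # all zeros
```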


Inverse of a Matrix Regular inverses exist only for square matrices that are nonsingular.
The standard notation for a regular inverse of a matrix $A$ is $A^{-1}$. It is defined as $AA^{-1} = I_n$
and $A^{-1}A = I_n$. The following properties can be deduced from the definition. First, we
note that $AA^{-1} = A^0 = I = A^{-1}A$. When $A$ and $B$ are $n \times n$ nonsingular matrices,
then $(AB)^{-1} = B^{-1}A^{-1}$, which can be established by pre- or post-multiplying the
right-hand side by $AB$. Accordingly, with $A^m = A \times A \times \cdots \times A$, $A^{-m} = A^{-1} \times \cdots \times
A^{-1} = (A^m)^{-1}$, $m = 1, 2, \ldots$, and when $A_1, \ldots, A_k$ are $n \times n$ nonsingular matrices,
$(A_1A_2\cdots A_k)^{-1} = A_k^{-1}A_{k-1}^{-1}\cdots A_2^{-1}A_1^{-1}$. We can also obtain a formula for the inverse
of a nonsingular matrix $A$ in terms of cofactors. Assuming that $A^{-1}$ exists, let
$\mathrm{Cof}(A) = (C_{ij})$ be the matrix of cofactors of $A$, that is, if $A = (a_{ij})$ and if $C_{ij}$ is the
cofactor of $a_{ij}$ then $\mathrm{Cof}(A) = (C_{ij})$. It follows from (1.2.7) and (1.2.8) that

$$A^{-1} = \frac{1}{|A|}(\mathrm{Cof}(A))' = \frac{1}{|A|}\begin{bmatrix} C_{11} & \ldots & C_{n1} \\ \vdots & \ddots & \vdots \\ C_{1n} & \ldots & C_{nn} \end{bmatrix}, \tag{1.2.9}$$


that is, the transpose of the cofactor matrix divided by the determinant of $A$. What about
$A^{\frac{1}{2}}$? For a scalar quantity $a$, we have the definition that if $b$ exists such that $b \times b = a$,
then $b$ is a square root of $a$. Consider the following $2 \times 2$ matrices:


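$$A = \begin{bmatrix} 1 & 0 \\ 0 & 1 \end{bmatrix},\ B_1 = \begin{bmatrix} 1 & 0 \\ 0 & 1 \end{bmatrix},\ B_2 = \begin{bmatrix} -1 & 0 \\ 0 & 1 \end{bmatrix},\ B_3 = \begin{bmatrix} 1 & 0 \\ 0 & -1 \end{bmatrix},\ B_4 = \begin{bmatrix} -1 & 0 \\ 0 & -1 \end{bmatrix}, \quad B_j^2 = A,\ j = 1, 2, 3, 4.$$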


Thus, if we use the definition $B^2 = A$ and claim that $B$ is the square root of $A$, there are
several candidates for $B$; this means that, in general, the square root of a matrix cannot
be uniquely determined. However, if we restrict ourselves to the class of positive definite
matrices, then a square root can be uniquely defined. The definiteness of matrices will be
considered later.
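
Formula (1.2.9) can be sketched in a few lines of Python. This is an illustration only: the function names are hypothetical, the test matrices are assumed examples, and exact fractions are used so that the product with the original matrix can be compared with the identity matrix exactly. The cofactor formula requires many determinant evaluations and is shown here for small matrices only.

```python
from fractions import Fraction

def minor_matrix(a, i, j):
    """Delete row i and column j (both counted from 0)."""
    return [row[:j] + row[j + 1:] for r, row in enumerate(a) if r != i]

def det(a):
    """Cofactor expansion along the first row; an empty matrix has determinant 1."""
    if not a:
        return 1
    return sum((-1) ** j * a[0][j] * det(minor_matrix(a, 0, j)) for j in range(len(a)))

def inverse_by_cofactors(a):
    """Transpose of the cofactor matrix divided by |A|, or None if |A| = 0."""
    n = len(a)
    d = det(a)
    if d == 0:
        return None
    cof = [[(-1) ** (i + j) * det(minor_matrix(a, i, j)) for j in range(n)]
           for i in range(n)]
    # Entry (i, j) of the inverse is C_ji / |A|.
    return [[Fraction(cof[j][i], d) for j in range(n)] for i in range(n)]

def matmul(a, b):
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]

A = [[1, 1, 1], [1, 0, -1], [1, -2, 1]]   # integer entries, determinant -6
A_inv = inverse_by_cofactors(A)
S = [[1, 1, 1], [1, -1, 1], [2, 0, 2]]    # third row = first + second: singular
print(A_inv)
print(matmul(A, A_inv))         # identity matrix
print(inverse_by_cofactors(S))  # None
```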
1.2.1. Inverses by row operations or elementary operations


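Basic elementary matrices are of two types. Let us call them the E-type and the
F-type. An elementary matrix of the E-type is obtained by taking an identity matrix and
multiplying any row (column) by a nonzero scalar. For example,

$$I_3 = \begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{bmatrix},\ E_1 = \begin{bmatrix} 1 & 0 & 0 \\ 0 & -2 & 0 \\ 0 & 0 & 1 \end{bmatrix},\ E_2 = \begin{bmatrix} 5 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{bmatrix},\ E_3 = \begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & -1 \end{bmatrix},$$

where $E_1, E_2, E_3$ are elementary matrices of the E-type obtained from the identity matrix
$I_3$. If we pre-multiply an arbitrary matrix $A$ with an elementary matrix of the E-type,
then the same effect will be observed on the rows of the arbitrary matrix $A$. For example,
consider a $3 \times 3$ matrix $A = (a_{ij})$. Then, for example,

$$E_1A = \begin{bmatrix} 1 & 0 & 0 \\ 0 & -2 & 0 \\ 0 & 0 & 1 \end{bmatrix}\begin{bmatrix} a_{11} & a_{12} & a_{13} \\ a_{21} & a_{22} & a_{23} \\ a_{31} & a_{32} & a_{33} \end{bmatrix} = \begin{bmatrix} a_{11} & a_{12} & a_{13} \\ -2a_{21} & -2a_{22} & -2a_{23} \\ a_{31} & a_{32} & a_{33} \end{bmatrix}.$$

Thus, the same effect applies to the rows, that is, the second row is multiplied by $(-2)$.
Observe that E-type elementary matrices are always nonsingular and so, their regular
inverses exist. For instance,

$$E_1^{-1} = \begin{bmatrix} 1 & 0 & 0 \\ 0 & -\frac{1}{2} & 0 \\ 0 & 0 & 1 \end{bmatrix},\ E_1E_1^{-1} = I_3 = E_1^{-1}E_1; \quad E_2^{-1} = \begin{bmatrix} \frac{1}{5} & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{bmatrix},\ E_2E_2^{-1} = I_3.$$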


Observe that post-multiplication of an arbitrary matrix by an E-type elementary matrix
will have the same effect on the columns of the arbitrary matrix. For example, $AE_1$ will
have the same effect on the columns of $A$, that is, the second column of $A$ is multiplied by
$-2$; $AE_2$ will result in the first column of $A$ being multiplied by 5, and so on. The F-type
elementary matrix is created by adding any particular row of an identity matrix to another
one of its rows. For example, consider a $3 \times 3$ identity matrix $I_3$ and let

$$F_1 = \begin{bmatrix} 1 & 0 & 0 \\ 1 & 1 & 0 \\ 0 & 0 & 1 \end{bmatrix},\ F_2 = \begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 1 & 0 & 1 \end{bmatrix},\ F_3 = \begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 1 & 1 \end{bmatrix},$$

where $F_1$ is obtained by adding the first row to the second row of $I_3$; $F_2$ is obtained by
adding the first row to the third row of $I_3$; and $F_3$ is obtained by adding the second row of
$I_3$ to the third row. As well, F-type elementary matrices are nonsingular, and for instance,

$$F_1^{-1} = \begin{bmatrix} 1 & 0 & 0 \\ -1 & 1 & 0 \\ 0 & 0 & 1 \end{bmatrix},\ F_2^{-1} = \begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ -1 & 0 & 1 \end{bmatrix},\ F_3^{-1} = \begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & -1 & 1 \end{bmatrix},$$

where $F_1F_1^{-1} = I_3$, $F_2^{-1}F_2 = I_3$ and $F_3^{-1}F_3 = I_3$. If we pre-multiply an arbitrary matrix
$A$ by an F-type elementary matrix, then the same effect will be observed on the rows of
$A$. For example,

$$F_1A = \begin{bmatrix} 1 & 0 & 0 \\ 1 & 1 & 0 \\ 0 & 0 & 1 \end{bmatrix}\begin{bmatrix} a_{11} & a_{12} & a_{13} \\ a_{21} & a_{22} & a_{23} \\ a_{31} & a_{32} & a_{33} \end{bmatrix} = \begin{bmatrix} a_{11} & a_{12} & a_{13} \\ a_{21} + a_{11} & a_{22} + a_{12} & a_{23} + a_{13} \\ a_{31} & a_{32} & a_{33} \end{bmatrix}.$$

Thus, the same effect applies to the rows, namely, the first row is added to the second
row in $A$ (as $F_1$ was obtained by adding the first row of $I_3$ to the second row of $I_3$). The
reader may verify that $F_2A$ has the effect of the first row being added to the third row and
$F_3A$ will have the effect of the second row being added to the third row. By combining E-
and F-type elementary matrices, we end up with a G-type matrix wherein a multiple of
any particular row of an identity matrix is added to another one of its rows. For example,
letting

$$G_1 = \begin{bmatrix} 1 & 0 & 0 \\ 5 & 1 & 0 \\ 0 & 0 & 1 \end{bmatrix} \ \text{and} \ G_2 = \begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ -2 & 0 & 1 \end{bmatrix},$$

it is seen that $G_1$ is obtained by adding 5 times the first row to the second row in $I_3$, and
$G_2$ is obtained by adding $-2$ times the first row to the third row in $I_3$. Pre-multiplication
of an arbitrary matrix $A$ by $G_1$, that is, $G_1A$, will have the effect that 5 times the first row
of $A$ will be added to its second row. Similarly, $G_2A$ will have the effect that $-2$ times the
first row of $A$ will be added to its third row. Being products of E- and F-type elementary
matrices, G-type matrices are also nonsingular. We also have the result that if $A, B, C$ are
$n \times n$ matrices and $B = C$, then $AB = AC$ as long as $A$ is nonsingular. In general, if
$A_1, \ldots, A_k$ are $n \times n$ nonsingular matrices, we have

$$\begin{aligned}
B = C &\Rightarrow A_kA_{k-1}\cdots A_2A_1B = A_kA_{k-1}\cdots A_2A_1C; \\
&\Rightarrow A_1A_2B = A_1(A_2B) = (A_1A_2)B = (A_1A_2)C = A_1(A_2C).
\end{aligned}$$

We will evaluate the inverse of a nonsingular square matrix by making use of elementary
matrices. The procedure will also verify whether a regular inverse exists for a given matrix.
If a regular inverse for a square matrix $A$ exists, then $AA^{-1} = I$. We can pre- or
post-multiply $A$ by elementary matrices. For example,

$$\begin{aligned}
AA^{-1} = I &\Rightarrow E_k\cdots F_1AA^{-1} = E_k\cdots F_1I \\
&\Rightarrow (E_k\cdots F_1A)A^{-1} = (E_k\cdots F_1).
\end{aligned}$$

Thus, if the operations $E_k\cdots F_1$ on $A$ reduced $A$ to an identity matrix, then $A^{-1}$ is
$E_k\cdots F_1$. If an inconsistency has occurred during the process, we can conclude that there
is no inverse for $A$. Hence, our aim in performing our elementary operations on the left of
$A$ is to reduce it to an identity matrix, in which case the product of the elementary matrices
on the right-hand side of the last equation will produce the inverse of $A$.
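
The effect of elementary matrices, and the fact that the product of those that reduce a matrix to the identity is its inverse, can be illustrated as follows. This Python sketch rests on assumptions made for the example: the constructors `e_type` and `g_type` are hypothetical names (a G-type matrix with multiplier 1 is an F-type matrix), rows are counted from 0, and the matrices `A` and `M` are arbitrary test cases.

```python
from fractions import Fraction

def identity(n):
    return [[Fraction(int(i == j)) for j in range(n)] for i in range(n)]

def matmul(a, b):
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]

def e_type(n, row, c):
    """Identity matrix with the given row multiplied by the nonzero scalar c."""
    m = identity(n)
    m[row][row] = Fraction(c)
    return m

def g_type(n, src, dst, c=1):
    """Identity matrix with c times row src added to row dst (c = 1: F-type)."""
    m = identity(n)
    m[dst][src] = Fraction(c)
    return m

A = [[1, 2, 3], [4, 5, 6], [7, 8, 10]]
E1 = e_type(3, 1, -2)     # second row multiplied by -2
F1 = g_type(3, 0, 1)      # first row added to second row
G1 = g_type(3, 0, 1, 5)   # 5 times first row added to second row
print(matmul(E1, A))      # acts on the rows of A
print(matmul(A, E1))      # acts on the columns of A
print(matmul(F1, A))
print(matmul(G1, A))

# Reduce a 2 x 2 matrix to the identity by successive pre-multiplications.
M = [[2, 1], [4, 3]]
steps = [g_type(2, 0, 1, -2),            # -2(1) + (2)
         g_type(2, 1, 0, -1),            # -1(2) + (1)
         e_type(2, 0, Fraction(1, 2))]   # first row multiplied by 1/2
P = identity(2)
for step in steps:
    P = matmul(step, P)   # accumulate the product of the elementary matrices
print(matmul(P, M))       # identity matrix, so P is the inverse of M
print(P)
```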


Example 1.2.4. Evaluate AT! if it exists, where 


1 1 1 

—1 0 1 
= 2 1 1 
1 1 -1 
Solution 1.2.4. If A^! exists then AAT! = J which means 


1 1 1 1 
-1 O 10 A - 

ZO ud 

11-11 
This is our starting equation. Only the configuration of the elements matters. The matrix 
notations and the symbol AT! can be disregarded. Hence, we consider only the configu- 
ration of the numbers of the matrix A on the left and the numbers in the identity matrix 
on the right. Then we pre-multiply A and pre-multiply the identity matrix by only making 
use of elementary matrices. In the first set of steps, our aim consists in reducing every 
element in the first column of A to zeros, except the first one, by only using the first row. 
For each elementary operation on A, the same elementary operation is done on the identity 
matrix also. Now, utilizing the second row of the resulting A, we reduce all the elements in 
the second column of A to zeros except the second one and continue in this manner until 
all the elements in the last columns except the last one are reduced to zeros by making 
use of the last row, thus reducing A to an identity matrix, provided of course that A 1s 
nonsingular. In our example, the elements in the first column can be made equal to zeros 
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by applying the following operations. We will employ the following standard notation: 
a(i) + (j) = meaning a times the i-th row is added to the j-th row, giving the result. 
Consider (1) + (2); —2(1) + (3); —1(1) + (4) = (that is, the first row is added to the 
second row; and then —2 times the first row is added to the third row; then —1 times the 
first row is added to the fourth row), (for each elementary operation on A we do the same 
operation on the identity matrix also) the net result being 


l l1 11 1000 
0 1 21 1100 
TENER SIM OF 1б 
0 0 2 0 —1 001 


Now, start with the second row of the resulting А and the resulting identity matrix and try 
to eliminate all the other elements in the second column of the resulting A. This can be 
achieved by performing the following operations: (2) + (3); —1(2) + (1) = 


i. 43:8 0-100 
(3 1 1 100 
050 - dip V^ ыр, zd 2 
60-2 @ zd 001 


Now, start with the third row and eliminate all other elements in the third column. This 
can be achieved by the following operations. Writing the row used in the operations (the 
third one in this case) within the first set of parentheses for each operation, we have 2(3) + 
(4); —2(3) + 2); G+) > 


100 CEN T 
010 = 3 -1 -2 0 
001 PLIN 2p. AG 
000 c. 2 21 


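Divide the 4th row by 2 and then perform the following operations:
$\tfrac{1}{2}(4);\ -1(4) + (3);\ (4) + (2);\ -1(4) + (1) \Rightarrow$

\[
\left[\begin{array}{rrrr|rrrr}
1 & 0 & 0 & 0 & \tfrac{1}{2} & -1 & 0 & -\tfrac{1}{2}\\
0 & 1 & 0 & 0 & \tfrac{3}{2} & 0 & -1 & \tfrac{1}{2}\\
0 & 0 & 1 & 0 & \tfrac{1}{2} & 0 & 0 & -\tfrac{1}{2}\\
0 & 0 & 0 & 1 & -\tfrac{3}{2} & 1 & 1 & \tfrac{1}{2}
\end{array}\right]
\]

Thus,
\[
A^{-1} = \begin{bmatrix}
\tfrac{1}{2} & -1 & 0 & -\tfrac{1}{2}\\
\tfrac{3}{2} & 0 & -1 & \tfrac{1}{2}\\
\tfrac{1}{2} & 0 & 0 & -\tfrac{1}{2}\\
-\tfrac{3}{2} & 1 & 1 & \tfrac{1}{2}
\end{bmatrix}.
\]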
This result should be verified to ensure that it is free of computational errors. Since 
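\[
AA^{-1} = \begin{bmatrix}
1 & 1 & 1 & 1\\
-1 & 0 & 1 & 0\\
2 & 1 & 1 & 2\\
1 & 1 & -1 & 1
\end{bmatrix}
\begin{bmatrix}
\tfrac{1}{2} & -1 & 0 & -\tfrac{1}{2}\\
\tfrac{3}{2} & 0 & -1 & \tfrac{1}{2}\\
\tfrac{1}{2} & 0 & 0 & -\tfrac{1}{2}\\
-\tfrac{3}{2} & 1 & 1 & \tfrac{1}{2}
\end{bmatrix}
= \begin{bmatrix}
1 & 0 & 0 & 0\\
0 & 1 & 0 & 0\\
0 & 0 & 1 & 0\\
0 & 0 & 0 & 1
\end{bmatrix},
\]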


the result is indeed correct. 
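
The row-reduction procedure of this example can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `gauss_jordan_inverse` is hypothetical, the entries of A are those consistent with the row operations and the inverse reported above, and the code normalizes each pivot as soon as it is used (and swaps rows when a pivot is zero), so the intermediate matrices may differ from the hand computation even though the final inverse is the same. Exact rational arithmetic is used to avoid rounding.

```python
from fractions import Fraction

def gauss_jordan_inverse(a):
    """Invert a square matrix by row-reducing [A | I]; return None if A is singular."""
    n = len(a)
    # Build the augmented matrix [A | I] with exact rational entries.
    aug = [[Fraction(x) for x in row] + [Fraction(int(i == j)) for j in range(n)]
           for i, row in enumerate(a)]
    for col in range(n):
        # Look for a row at or below the diagonal with a nonzero entry in this column.
        pivot = next((r for r in range(col, n) if aug[r][col] != 0), None)
        if pivot is None:
            return None  # no pivot available: A is singular
        aug[col], aug[pivot] = aug[pivot], aug[col]
        # Scale the pivot row so that the pivot becomes 1.
        p = aug[col][col]
        aug[col] = [x / p for x in aug[col]]
        # Add multiples of the pivot row to every other row to clear the column.
        for r in range(n):
            if r != col and aug[r][col] != 0:
                factor = aug[r][col]
                aug[r] = [x - factor * y for x, y in zip(aug[r], aug[col])]
    # The right-hand half of the reduced matrix is the inverse.
    return [row[n:] for row in aug]

# Matrix assumed to be the A of this example.
A = [[1, 1, 1, 1], [-1, 0, 1, 0], [2, 1, 1, 2], [1, 1, -1, 1]]
A_inv = gauss_jordan_inverse(A)
for row in A_inv:
    print([str(x) for x in row])
```

When the matrix is singular, no pivot can be found at some stage and the sketch returns `None` instead of an inverse.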


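Example 1.2.5. Evaluate $A^{-1}$ if it exists where
\[
A = \begin{bmatrix} 1 & 1 & 1\\ 1 & -1 & 1\\ 2 & 0 & 2 \end{bmatrix}.
\]

Solution 1.2.5. If $A^{-1}$ exists, then $AA^{-1} = I_3$. Write
\[
\begin{bmatrix} 1 & 1 & 1\\ 1 & -1 & 1\\ 2 & 0 & 2 \end{bmatrix} A^{-1}
= \begin{bmatrix} 1 & 0 & 0\\ 0 & 1 & 0\\ 0 & 0 & 1 \end{bmatrix}.
\]

Starting with the first row, eliminate all other elements in the first column with the
following operations: $-1(1) + (2);\ -2(1) + (3) \Rightarrow$

\[
\begin{bmatrix} 1 & 1 & 1\\ 0 & -2 & 0\\ 0 & -2 & 0 \end{bmatrix} A^{-1}
= \begin{bmatrix} 1 & 0 & 0\\ -1 & 1 & 0\\ -2 & 0 & 1 \end{bmatrix}.
\]

The second and third rows on the left side being identical, the left-hand side matrix is
singular, which means that A is singular. Thus, the inverse of A does not exist in this case.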


1.3. Determinants of Partitioned Matrices 


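Consider a matrix A written in the following format:

\[
A = \begin{bmatrix} A_{11} & A_{12}\\ A_{21} & A_{22} \end{bmatrix}
\ \text{ where } A_{11}, A_{12}, A_{21}, A_{22} \text{ are submatrices.}
\]

For example,

\[
A = \begin{bmatrix} a_{11} & a_{12} & a_{13}\\ a_{21} & a_{22} & a_{23}\\ a_{31} & a_{32} & a_{33} \end{bmatrix}
= \begin{bmatrix} A_{11} & A_{12}\\ A_{21} & A_{22} \end{bmatrix},\quad
A_{11} = [a_{11}],\ A_{12} = [a_{12}\ \ a_{13}],\ A_{21} = \begin{bmatrix} a_{21}\\ a_{31} \end{bmatrix}
\]

and $A_{22} = \begin{bmatrix} a_{22} & a_{23}\\ a_{32} & a_{33} \end{bmatrix}$. The above is a $2 \times 2$ partitioning or a partitioning into two
sub-matrices by two sub-matrices. But a $2 \times 2$ partitioning is not unique. We may also consider

\[
A_{11} = \begin{bmatrix} a_{11} & a_{12}\\ a_{21} & a_{22} \end{bmatrix},\
A_{22} = [a_{33}],\
A_{12} = \begin{bmatrix} a_{13}\\ a_{23} \end{bmatrix},\
A_{21} = [a_{31}\ \ a_{32}],
\]

which is another $2 \times 2$ partitioning of A. We can also have a $1 \times 2$ or $2 \times 1$ partitioning into
sub-matrices. We may observe one interesting property. Consider a block diagonal matrix.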

Let

\[
A = \begin{bmatrix} A_{11} & O\\ O & A_{22} \end{bmatrix}
\]

where $A_{11}$ is $r \times r$, $A_{22}$ is $s \times s$, $r + s = n$ and $O$ indicates a null matrix. Observe
that when we evaluate the determinant, all the operations on the first $r$ rows will produce
the determinant of $A_{11}$ as a coefficient, without affecting $A_{22}$, leaving an $r \times r$ identity
matrix in the place of $A_{11}$. Similarly, all the operations on the last $s$ rows will produce the
determinant of $A_{22}$ as a coefficient, leaving an $s \times s$ identity matrix in place of $A_{22}$. In
other words, for a diagonal block matrix whose diagonal blocks are $A_{11}$ and $A_{22}$,

\[
|A| = |A_{11}| \times |A_{22}|. \tag{1.3.1}
\]


Given a triangular block matrix, be it lower or upper triangular, its determinant is also
the product of the determinants of the diagonal blocks. For example, consider

\[
|A| = \begin{vmatrix} A_{11} & A_{12}\\ O & A_{22} \end{vmatrix}.
\]

By using $A_{22}$, we can eliminate $A_{12}$ without affecting $A_{11}$ and hence, we can reduce the
matrix of the determinant to a diagonal block form without affecting the value of the
determinant. Accordingly, the determinant of an upper or lower triangular block matrix
whose diagonal blocks are $A_{11}$ and $A_{22}$, is

\[
|A| = |A_{11}|\,|A_{22}|. \tag{1.3.2}
\]


Partitioning is done to accommodate further operations such as matrix multiplication.
Let A and B be two matrices whose product AB is defined. Suppose that we consider a
$2 \times 2$ partitioning of A and B into sub-matrices; if the multiplication is performed treating
the sub-matrices as if they were scalar quantities, the following format is obtained:

\[
AB = \begin{bmatrix} A_{11} & A_{12}\\ A_{21} & A_{22} \end{bmatrix}
\begin{bmatrix} B_{11} & B_{12}\\ B_{21} & B_{22} \end{bmatrix}
= \begin{bmatrix}
A_{11}B_{11} + A_{12}B_{21} & A_{11}B_{12} + A_{12}B_{22}\\
A_{21}B_{11} + A_{22}B_{21} & A_{21}B_{12} + A_{22}B_{22}
\end{bmatrix}.
\]

If all the products of sub-matrices on the right-hand side are defined, then we say that A
and B are conformably partitioned for the product AB. Let A be an $n \times n$ matrix whose
determinant is defined. Let us consider the $2 \times 2$ partitioning

\[
A = \begin{bmatrix} A_{11} & A_{12}\\ A_{21} & A_{22} \end{bmatrix}
\ \text{ where } A_{11} \text{ is } r \times r,\ A_{22} \text{ is } s \times s,\ r + s = n.
\]

Then, $A_{12}$ is $r \times s$ and $A_{21}$ is $s \times r$. In this case, the first row block is $[A_{11}\ \ A_{12}]$ and
the second row block is $[A_{21}\ \ A_{22}]$. When evaluating a determinant, we can add linear
functions of rows to any other row or linear functions of rows to other blocks of rows
without affecting the value of the determinant. What sort of a linear function of the first
row block could be added to the second row block so that a null matrix $O$ appears in the
position of $A_{21}$? It is $-A_{21}A_{11}^{-1}$ times the first row block. Then, we have

\[
|A| = \begin{vmatrix} A_{11} & A_{12}\\ A_{21} & A_{22} \end{vmatrix}
= \begin{vmatrix} A_{11} & A_{12}\\ O & A_{22} - A_{21}A_{11}^{-1}A_{12} \end{vmatrix}.
\]

This is a triangular block matrix and hence its determinant is the product of the
determinants of the diagonal blocks. That is,

\[
|A| = |A_{11}|\ |A_{22} - A_{21}A_{11}^{-1}A_{12}|,\quad |A_{11}| \ne 0.
\]
From symmetry, it follows that
\[
|A| = |A_{22}|\ |A_{11} - A_{12}A_{22}^{-1}A_{21}|,\quad |A_{22}| \ne 0. \tag{1.3.3}
\]
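
The two factorizations of $|A|$ can be checked numerically. The following is a minimal sketch, not part of the original derivation: the helper names (`det`, `matmul`, `sub`, `inverse_2x2`, `blocks`) are hypothetical, and the $4 \times 4$ test matrix is assumed to be the one inverted in Sect. 1.2, partitioned with $r = s = 2$. Exact fractions are used so that the equalities hold without rounding.

```python
from fractions import Fraction

def det(m):
    """Determinant by elimination with exact fractions (a row swap flips the sign)."""
    m = [[Fraction(x) for x in row] for row in m]
    n, sign, prod = len(m), 1, Fraction(1)
    for c in range(n):
        pivot = next((r for r in range(c, n) if m[r][c] != 0), None)
        if pivot is None:
            return Fraction(0)
        if pivot != c:
            m[c], m[pivot] = m[pivot], m[c]
            sign = -sign
        prod *= m[c][c]
        for r in range(c + 1, n):
            f = m[r][c] / m[c][c]
            m[r] = [x - f * y for x, y in zip(m[r], m[c])]
    return sign * prod

def matmul(a, b):
    """Ordinary matrix product of two lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]

def sub(a, b):
    """Entrywise difference a - b."""
    return [[x - y for x, y in zip(r1, r2)] for r1, r2 in zip(a, b)]

def inverse_2x2(m):
    """Inverse of a nonsingular 2x2 matrix via the adjugate formula."""
    (a, b), (c, d) = m
    dt = Fraction(a * d - b * c)
    return [[d / dt, -b / dt], [-c / dt, a / dt]]

def blocks(m, r):
    """Split a square matrix into A11, A12, A21, A22 with A11 of order r."""
    return ([row[:r] for row in m[:r]], [row[r:] for row in m[:r]],
            [row[:r] for row in m[r:]], [row[r:] for row in m[r:]])

A = [[1, 1, 1, 1], [-1, 0, 1, 0], [2, 1, 1, 2], [1, 1, -1, 1]]  # assumed test matrix
A11, A12, A21, A22 = blocks(A, 2)

# |A| = |A11| |A22 - A21 A11^{-1} A12|
form_1 = det(A11) * det(sub(A22, matmul(matmul(A21, inverse_2x2(A11)), A12)))
# |A| = |A22| |A11 - A12 A22^{-1} A21|
form_2 = det(A22) * det(sub(A11, matmul(matmul(A12, inverse_2x2(A22)), A21)))
print(det(A), form_1, form_2)
```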


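Let us now examine the inverses of partitioned matrices. Let $A$ and $A^{-1}$ be conformably
partitioned for the product $AA^{-1}$. Consider a $2 \times 2$ partitioning of both $A$ and $A^{-1}$. Let

\[
A = \begin{bmatrix} A_{11} & A_{12}\\ A_{21} & A_{22} \end{bmatrix}
\ \text{ and }\
A^{-1} = \begin{bmatrix} A^{11} & A^{12}\\ A^{21} & A^{22} \end{bmatrix}
\]
where $A_{11}$ and $A^{11}$ are $r \times r$ and $A_{22}$ and $A^{22}$ are $s \times s$ with $r + s = n$, $A$ is $n \times n$ and
nonsingular. $AA^{-1} = I$ gives the following:

\[
\begin{bmatrix} A_{11} & A_{12}\\ A_{21} & A_{22} \end{bmatrix}
\begin{bmatrix} A^{11} & A^{12}\\ A^{21} & A^{22} \end{bmatrix}
= \begin{bmatrix} I_r & O\\ O & I_s \end{bmatrix}.
\]

That is,
\begin{align*}
A_{11}A^{11} + A_{12}A^{21} &= I_r &&\text{(i)}\\
A_{11}A^{12} + A_{12}A^{22} &= O &&\text{(ii)}\\
A_{21}A^{11} + A_{22}A^{21} &= O &&\text{(iii)}\\
A_{21}A^{12} + A_{22}A^{22} &= I_s. &&\text{(iv)}
\end{align*}

From (ii), $A^{12} = -A_{11}^{-1}A_{12}A^{22}$. Substituting in (iv),

\[
-A_{21}A_{11}^{-1}A_{12}A^{22} + A_{22}A^{22} = I_s \Rightarrow [A_{22} - A_{21}A_{11}^{-1}A_{12}]A^{22} = I_s.
\]

That is,
\[
A^{22} = (A_{22} - A_{21}A_{11}^{-1}A_{12})^{-1},\quad |A_{11}| \ne 0, \tag{1.3.4}
\]
and, from symmetry, it follows that
\begin{align*}
A^{11} &= (A_{11} - A_{12}A_{22}^{-1}A_{21})^{-1},\quad |A_{22}| \ne 0 \tag{1.3.5}\\
A_{11} &= (A^{11} - A^{12}(A^{22})^{-1}A^{21})^{-1},\quad |A^{22}| \ne 0 \tag{1.3.6}\\
A_{22} &= (A^{22} - A^{21}(A^{11})^{-1}A^{12})^{-1},\quad |A^{11}| \ne 0. \tag{1.3.7}
\end{align*}

The rectangular components $A_{12}, A_{21}, A^{12}, A^{21}$ can also be evaluated in terms of the
sub-matrices by making use of Eqs. (i)-(iv).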
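
As a sanity check on (1.3.4), (1.3.5) and on the expression for $A^{12}$ obtained from (ii), the blocks of a known inverse can be compared with the right-hand sides. This is a hedged sketch: the helper names are hypothetical, and the pair $(A, A^{-1})$ is assumed to be the $4 \times 4$ matrix and inverse worked out in Sect. 1.2, partitioned with $r = s = 2$.

```python
from fractions import Fraction as F

def matmul(a, b):
    """Ordinary matrix product of two lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]

def sub(a, b):
    """Entrywise difference a - b."""
    return [[x - y for x, y in zip(r1, r2)] for r1, r2 in zip(a, b)]

def neg(a):
    """Entrywise negation."""
    return [[-x for x in row] for row in a]

def inverse_2x2(m):
    """Inverse of a nonsingular 2x2 matrix via the adjugate formula."""
    (a, b), (c, d) = m
    dt = F(a * d - b * c)
    return [[d / dt, -b / dt], [-c / dt, a / dt]]

def blocks(m, r):
    """Split a square matrix into the four blocks of a 2x2 partitioning."""
    return ([row[:r] for row in m[:r]], [row[r:] for row in m[:r]],
            [row[:r] for row in m[r:]], [row[r:] for row in m[r:]])

# Assumed test pair: A and its inverse.
A = [[1, 1, 1, 1], [-1, 0, 1, 0], [2, 1, 1, 2], [1, 1, -1, 1]]
A_inv = [[F(1, 2), -1, 0, F(-1, 2)], [F(3, 2), 0, -1, F(1, 2)],
         [F(1, 2), 0, 0, F(-1, 2)], [F(-3, 2), 1, 1, F(1, 2)]]

A11, A12, A21, A22 = blocks(A, 2)
B11, B12, B21, B22 = blocks(A_inv, 2)  # the superscripted blocks A^{11}, ..., A^{22}

A11_inv = inverse_2x2(A11)
rhs_134 = inverse_2x2(sub(A22, matmul(matmul(A21, A11_inv), A12)))           # (1.3.4)
rhs_135 = inverse_2x2(sub(A11, matmul(matmul(A12, inverse_2x2(A22)), A21)))  # (1.3.5)
rhs_ii = neg(matmul(matmul(A11_inv, A12), B22))                              # from (ii)

print(rhs_134 == B22, rhs_135 == B11, rhs_ii == B12)
```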


1.4. Eigenvalues and Eigenvectors 


Let A be an $n \times n$ matrix, X be an $n \times 1$ vector, and $\lambda$ be a scalar quantity. Consider the
equation
\[
AX = \lambda X \Rightarrow (A - \lambda I)X = O.
\]

Observe that $X = O$ is always a solution. If this equation has a non-null vector X as a
solution, then the determinant of the coefficient matrix must be zero because this matrix
must be singular. If the matrix $(A - \lambda I)$ were nonsingular, its inverse $(A - \lambda I)^{-1}$ would
exist and then, on pre-multiplying $(A - \lambda I)X = O$ by $(A - \lambda I)^{-1}$, we would have $X = O$,
which is inadmissible since $X \ne O$. That is,

\[
|A - \lambda I| = 0,\ \lambda \text{ being a scalar quantity.} \tag{1.4.1}
\]
Since the matrix A is $n \times n$, equation (1.4.1) has $n$ roots, which will be denoted by
$\lambda_1, \ldots, \lambda_n$. Then

\[
|A - \lambda I| = (\lambda_1 - \lambda)(\lambda_2 - \lambda)\cdots(\lambda_n - \lambda),\quad AX_j = \lambda_j X_j.
\]

Then, $\lambda_1, \ldots, \lambda_n$ are called the eigenvalues of A and $X_j \ne O$, an eigenvector
corresponding to the eigenvalue $\lambda_j$.


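Example 1.4.1. Compute the eigenvalues and eigenvectors of the matrix $A = \begin{bmatrix} 1 & 1\\ 1 & 2 \end{bmatrix}$.


Solution 1.4.1. Consider the equation

\begin{align*}
|A - \lambda I| = 0 &\Rightarrow \left| \begin{bmatrix} 1 & 1\\ 1 & 2 \end{bmatrix} - \lambda \begin{bmatrix} 1 & 0\\ 0 & 1 \end{bmatrix} \right| = 0
\Rightarrow \begin{vmatrix} 1 - \lambda & 1\\ 1 & 2 - \lambda \end{vmatrix} = 0\\
&\Rightarrow (1 - \lambda)(2 - \lambda) - 1 = 0 \Rightarrow \lambda^2 - 3\lambda + 1 = 0\\
&\Rightarrow \lambda = \frac{3 \pm \sqrt{9 - 4}}{2} = \frac{3}{2} \pm \frac{\sqrt{5}}{2}.
\end{align*}

An eigenvector $X_1$ corresponding to $\lambda_1 = \frac{3}{2} + \frac{\sqrt{5}}{2}$ is given by $AX_1 = \lambda_1 X_1$ or
$(A - \lambda_1 I)X_1 = O$. That is,

\[
\begin{bmatrix} 1 - \left(\frac{3}{2} + \frac{\sqrt{5}}{2}\right) & 1\\ 1 & 2 - \left(\frac{3}{2} + \frac{\sqrt{5}}{2}\right) \end{bmatrix}
\begin{bmatrix} x_1\\ x_2 \end{bmatrix} = \begin{bmatrix} 0\\ 0 \end{bmatrix} \Rightarrow
\]
\begin{align*}
\left(-\frac{1}{2} - \frac{\sqrt{5}}{2}\right)x_1 + x_2 &= 0, &&\text{(i)}\\
x_1 + \left(\frac{1}{2} - \frac{\sqrt{5}}{2}\right)x_2 &= 0. &&\text{(ii)}
\end{align*}
Since $A - \lambda_1 I$ is singular, both (i) and (ii) must give the same solution. Letting $x_2 = 1$ in
(ii), $x_1 = -\frac{1}{2} + \frac{\sqrt{5}}{2}$. Thus, one solution $X_1$ is

\[
X_1 = \begin{bmatrix} -\frac{1}{2} + \frac{\sqrt{5}}{2}\\ 1 \end{bmatrix}.
\]

Any nonzero constant multiple of $X_1$ is also a solution to $(A - \lambda_1 I)X_1 = O$. An
eigenvector $X_2$ corresponding to the eigenvalue $\lambda_2$ is given by $(A - \lambda_2 I)X_2 = O$. That is,

\begin{align*}
\left(-\frac{1}{2} + \frac{\sqrt{5}}{2}\right)x_1 + x_2 &= 0, &&\text{(iii)}\\
x_1 + \left(\frac{1}{2} + \frac{\sqrt{5}}{2}\right)x_2 &= 0. &&\text{(iv)}
\end{align*}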




Hence, one solution is

\[
X_2 = \begin{bmatrix} -\frac{1}{2} - \frac{\sqrt{5}}{2}\\ 1 \end{bmatrix}.
\]
Any nonzero constant multiple of $X_2$ is also an eigenvector corresponding to $\lambda_2$.
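
The computations of Example 1.4.1 can be verified numerically. The sketch below is only illustrative: the function names are hypothetical, the $2 \times 2$ characteristic equation is solved with the quadratic formula, and the eigenvectors are taken with second component equal to 1 (any nonzero multiple would do equally well).

```python
import math

def eig_2x2(a):
    """Roots of |A - lambda I| = 0 for a real 2x2 matrix with real eigenvalues."""
    (a11, a12), (a21, a22) = a
    trace = a11 + a22
    det = a11 * a22 - a12 * a21
    disc = trace * trace - 4 * det  # discriminant of lambda^2 - trace*lambda + det
    if disc < 0:
        raise ValueError("complex eigenvalues; not handled in this sketch")
    root = math.sqrt(disc)
    return (trace + root) / 2, (trace - root) / 2

def residual(a, lam, x):
    """Largest absolute entry of A x - lambda x (zero for an exact eigenpair)."""
    ax = [sum(aij * xj for aij, xj in zip(row, x)) for row in a]
    return max(abs(v - lam * xi) for v, xi in zip(ax, x))

A = [[1, 1], [1, 2]]
lam1, lam2 = eig_2x2(A)

# Eigenvectors normalized so that the second component equals 1 (an assumption).
X1 = [-0.5 + math.sqrt(5) / 2, 1.0]
X2 = [-0.5 - math.sqrt(5) / 2, 1.0]

print(lam1, lam2)
print(residual(A, lam1, X1), residual(A, lam2, X2))
```

The sum and product of the two roots can also be compared with the trace and the determinant of A, which anticipates properties (15) and (16) below.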


Even if all elements of a matrix A are real, its eigenvalues can be real, positive,
negative, zero, irrational or complex. Complex roots appear in conjugate pairs: if
$a + ib$, $i = \sqrt{-1}$, and $a, b$ real, is an eigenvalue, then $a - ib$ is also an eigenvalue
of the same matrix. The following properties can be deduced from the definition:


(1): The eigenvalues of a diagonal matrix are its diagonal elements; 

(2): The eigenvalues of a triangular (lower or upper) matrix are its diagonal elements; 
(3): If any eigenvalue is zero, then the matrix is singular and its determinant is zero; 

(4): If $\lambda$ is an eigenvalue of A and if A is nonsingular, then $\frac{1}{\lambda}$ is an eigenvalue of $A^{-1}$;
(5): If $\lambda$ is an eigenvalue of A, then $\lambda^k$ is an eigenvalue of $A^k$, $k = 1, 2, \ldots$, their
associated eigenvector being the same;


(7): The eigenvalues of an identity matrix are unities; however, the converse need not be 
true; 


(8): The eigenvalues of a scalar matrix with diagonal elements а, а,...,а are a repeated 
n times when the order of A is n; however, the converse need not be true; 


(9): The eigenvalues of an orthonormal matrix, $AA' = I$, $A'A = I$, have absolute value 1, so that
its real eigenvalues can only be $\pm 1$; however, the converse need not be true;


(10): The eigenvalues of an idempotent matrix, $A = A^2$, are ones and zeros; however, the
converse need not be true. The only nonsingular idempotent matrix is the identity matrix;


(11): At least one of the eigenvalues of a nilpotent matrix of order $r$, that is, $A \ne O, \ldots,
A^{r-1} \ne O$, $A^r = O$, is null;


(12): For an n x n matrix, both A and A' have the same eigenvalues; 
(13): The eigenvalues of a symmetric matrix are real; 


(14): The eigenvalues of a skew symmetric matrix can only be zeros and purely imaginary 
numbers; 


(15): The determinant of A is the product of its eigenvalues: $|A| = \lambda_1 \lambda_2 \cdots \lambda_n$;

(16): The trace of a square matrix is equal to the sum of its eigenvalues; 

(17): If $A = A'$ (symmetric), then the eigenvectors corresponding to distinct eigenvalues
are orthogonal;

(18): If $A = A'$ and A is $n \times n$, then there exists a full set of $n$ eigenvectors which are
linearly independent, even if some eigenvalues are repeated.
Result (16) requires some explanation. We have already derived the following two
results:
\[
|A| = \sum_{i_1} \cdots \sum_{i_n} (-1)^{\rho(i_1, \ldots, i_n)}\, a_{1 i_1} a_{2 i_2} \cdots a_{n i_n}, \tag{v}
\]
and
\[
|A - \lambda I| = (\lambda_1 - \lambda)(\lambda_2 - \lambda) \cdots (\lambda_n - \lambda). \tag{vi}
\]

Equation (vi) yields a polynomial of degree $n$ in $\lambda$ where $\lambda$ is a variable. When $\lambda = 0$,
we have $|A| = \lambda_1 \lambda_2 \cdots \lambda_n$, the product of the eigenvalues. The term containing $(-1)^n \lambda^n$,
when writing $|A - \lambda I|$ in the format of equation (v), can only come from the term
$(a_{11} - \lambda)(a_{22} - \lambda) \cdots (a_{nn} - \lambda)$ (refer to the explicit form for the $3 \times 3$ case discussed earlier).
Two factors containing $\lambda$ will be missing in the next term with the highest power of $\lambda$.
Hence, $(-1)^n \lambda^n$ and $(-1)^{n-1} \lambda^{n-1}$ can only come from the term $(a_{11} - \lambda) \cdots (a_{nn} - \lambda)$, as
can be seen from the expansion in the $3 \times 3$ case discussed in detail earlier. From (v) the
coefficient of $(-1)^{n-1} \lambda^{n-1}$ is $a_{11} + a_{22} + \cdots + a_{nn} = \mathrm{tr}(A)$ and from (vi), the coefficient
of $(-1)^{n-1} \lambda^{n-1}$ is $\lambda_1 + \cdots + \lambda_n$. Hence $\mathrm{tr}(A) = \lambda_1 + \cdots + \lambda_n$ = sum of the eigenvalues of
A. This does not mean that $\lambda_1 = a_{11}, \lambda_2 = a_{22}, \ldots, \lambda_n = a_{nn}$, only that the sums will be
equal.


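Matrices in the Complex Domain When the elements in $A = (a_{ij})$ can also be complex
quantities, then a typical element in A will be of the form $a + ib$, $i = \sqrt{-1}$, and $a, b$
real. The complex conjugate of A will be denoted by $\bar{A}$ and the conjugate transpose will
be denoted by $A^*$. Then, for example,

\[
A = \begin{bmatrix} 1+i & 2i & 3-i\\ 4i & 5 & 1+i\\ 2-i & i & 3+i \end{bmatrix} \Rightarrow
\bar{A} = \begin{bmatrix} 1-i & -2i & 3+i\\ -4i & 5 & 1-i\\ 2+i & -i & 3-i \end{bmatrix},\
A^* = \begin{bmatrix} 1-i & -4i & 2+i\\ -2i & 5 & -i\\ 3+i & 1-i & 3-i \end{bmatrix}.
\]

Thus, we can also write $A^* = (\bar{A})' = \overline{(A')}$. When a matrix A is in the complex domain,
we may write it as $A = A_1 + iA_2$ where $A_1$ and $A_2$ are real matrices. Then $\bar{A} = A_1 - iA_2$
and $A^* = A_1' - iA_2'$. In the above example,

\[
A = \begin{bmatrix} 1 & 0 & 3\\ 0 & 5 & 1\\ 2 & 0 & 3 \end{bmatrix}
+ i \begin{bmatrix} 1 & 2 & -1\\ 4 & 0 & 1\\ -1 & 1 & 1 \end{bmatrix} \Rightarrow
A_1 = \begin{bmatrix} 1 & 0 & 3\\ 0 & 5 & 1\\ 2 & 0 & 3 \end{bmatrix},\
A_2 = \begin{bmatrix} 1 & 2 & -1\\ 4 & 0 & 1\\ -1 & 1 & 1 \end{bmatrix}.
\]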

A Hermitian Matrix If $A = A^*$, then A is called a Hermitian matrix. In the representation
$A = A_1 + iA_2$, if $A = A^*$, then $A_1 = A_1'$ or $A_1$ is real symmetric, and $A_2 = -A_2'$ or $A_2$
is real skew symmetric. Note that when X is an $n \times 1$ vector, $X^*X$ is real. Let
\[
X = \begin{bmatrix} 2\\ 3+i\\ 2i \end{bmatrix} \Rightarrow
\bar{X} = \begin{bmatrix} 2\\ 3-i\\ -2i \end{bmatrix},\
X^* = [2\ \ \ 3-i\ \ -2i],
\]
\[
X^*X = [2\ \ \ 3-i\ \ -2i] \begin{bmatrix} 2\\ 3+i\\ 2i \end{bmatrix}
= (2^2 + 0^2) + (3^2 + 1^2) + (0^2 + (-2)^2) = 18.
\]

Consider the eigenvalues of a Hermitian matrix A, which are the solutions of $|A - \lambda I| = 0$.
As in the real case, $\lambda$ may be real, positive, negative, zero, irrational or complex. Then, for
$X \ne O$,

\begin{align*}
AX = \lambda X &\Rightarrow &&\text{(i)}\\
X^*A^* &= \bar{\lambda} X^* &&\text{(ii)}
\end{align*}

by taking the conjugate transpose. Since $\lambda$ is scalar, its conjugate transpose is $\bar{\lambda}$.
Pre-multiply (i) by $X^*$ and post-multiply (ii) by $X$. Then for $X \ne O$, we have

\begin{align*}
X^*AX &= \lambda X^*X &&\text{(iii)}\\
X^*A^*X &= \bar{\lambda} X^*X. &&\text{(iv)}
\end{align*}

When A is Hermitian, $A = A^*$, and so, the left-hand sides of (iii) and (iv) are the same.
On subtracting (iv) from (iii), we have $0 = (\lambda - \bar{\lambda})X^*X$ where $X^*X$ is real and positive,
and hence $\lambda - \bar{\lambda} = 0$, which means that the imaginary part is zero or $\lambda$ is real. If A is skew
Hermitian, then we end up with $\lambda + \bar{\lambda} = 0 \Rightarrow \lambda$ is zero or purely imaginary. The above
procedure also holds for matrices in the real domain. Thus, in addition to properties (13)
and (14), we have the following properties:


(19) The eigenvalues of a Hermitian matrix are real; however, the converse need not be 
true; 

(20) The eigenvalues of a skew Hermitian matrix are zero or purely imaginary; however, 
the converse need not be true. 


1.5. Definiteness of Matrices, Quadratic and Hermitian Forms 


Let X be an $n \times 1$ vector of real scalar variables $x_1, \ldots, x_n$ so that $X' = (x_1, \ldots, x_n)$.
Let $A = (a_{ij})$ be a real $n \times n$ matrix. Then, all the terms of the quadratic form,
$u = X'AX$, are of degree 2. One can always write A as an equivalent symmetric matrix
when A is the matrix in a quadratic form. Hence, without any loss of generality, we may
assume $A = A'$ (symmetric) when A appears in a quadratic form $u = X'AX$.
Definiteness of a quadratic form and definiteness of a matrix are only defined for $A = A'$
(symmetric) in the real domain and for $A = A^*$ (Hermitian) in the complex domain.
Hence, the basic starting condition is that either $A = A'$ or $A = A^*$. If for all non-null
X, that is, $X \ne O$, $X'AX > 0$, $A = A'$, then A is said to be a positive definite
matrix and $X'AX > 0$ is called a positive definite quadratic form. If for all non-null X,
$X^*AX > 0$, $A = A^*$, then A is referred to as a Hermitian positive definite matrix and the
corresponding Hermitian form $X^*AX > 0$, as Hermitian positive definite. Similarly, if
for all non-null X, $X'AX \ge 0$, $X^*AX \ge 0$, then A is positive semi-definite or Hermitian
positive semi-definite. If for all non-null X, $X'AX < 0$, $X^*AX < 0$, then A is negative
definite and if $X'AX \le 0$, $X^*AX \le 0$, then A is negative semi-definite. The standard
notations being utilized are as follows:


$A > O$ (A and $X'AX > 0$ are real positive definite; $O$ is a capital o and not zero)

$A \ge O$ (A and $X'AX \ge 0$ are positive semi-definite)

$A < O$ (A and $X'AX < 0$ are negative definite)

$A \le O$ (A and $X'AX \le 0$ are negative semi-definite).


All other matrices, which do not belong to any of those four categories, are called indefinite
matrices. That is, for example, if A is such that for some X, $X'AX > 0$, and for some other
X, $X'AX < 0$, then A is an indefinite matrix. The corresponding Hermitian cases are:


$A > O$, $X^*AX > 0$ (Hermitian positive definite)

$A \ge O$, $X^*AX \ge 0$ (Hermitian positive semi-definite)

$A < O$, $X^*AX < 0$ (Hermitian negative definite)

$A \le O$, $X^*AX \le 0$ (Hermitian negative semi-definite). (1.5.1)


In all other cases, the matrix A and the Hermitian form $X^*AX$ are indefinite. Certain
conditions for the definiteness of $A = A'$ or $A = A^*$ are the following:

(1) All the eigenvalues of A are positive $\Leftrightarrow A > O$; all eigenvalues are greater than or
equal to zero $\Leftrightarrow A \ge O$; all eigenvalues are negative $\Leftrightarrow A < O$; all eigenvalues are $\le 0$
$\Leftrightarrow A \le O$; all other matrices $A = A'$ or $A = A^*$ for which some eigenvalues are positive
and some others are negative are indefinite.

(2) If $A = A'$ or $A = A^*$ and all the leading minors of A are positive (leading minors are
determinants of the leading sub-matrices, the $r$-th leading sub-matrix being obtained by
deleting all rows and columns from the $(r + 1)$-th onward), then $A > O$; if all the odd order
leading minors are negative and all the even order leading minors are positive, then $A < O$.
For the semi-definite cases, the leading minors alone do not suffice and all the principal
minors have to be examined: if all the principal minors are $\ge 0$, then $A \ge O$; if the odd order
principal minors are $\le 0$ and the even order principal minors are $\ge 0$, then $A \le O$; all other
matrices are indefinite. If $A \ne A'$ or $A \ne A^*$, then
no definiteness can be defined in terms of the eigenvalues or leading minors. Let

\[
A = \begin{bmatrix} 2 & 0\\ 0 & 5 \end{bmatrix}.
\]

Note that A is real symmetric as well as Hermitian. Since $X'AX = 2x_1^2 + 5x_2^2 > 0$ for
all real $x_1$ and $x_2$ as long as $x_1$ and $x_2$ are not both equal to zero, $A > O$ (positive
definite). $X^*AX = 2|x_1|^2 + 5|x_2|^2 = 2\left[\sqrt{x_{11}^2 + x_{12}^2}\right]^2 + 5\left[\sqrt{x_{21}^2 + x_{22}^2}\right]^2 > 0$ for all
$x_{11}, x_{12}, x_{21}, x_{22}$, as long as all are not simultaneously equal to zero, where $x_1 = x_{11} +
i x_{12}$, $x_2 = x_{21} + i x_{22}$ with $x_{11}, x_{12}, x_{21}, x_{22}$ being real and $i = \sqrt{-1}$. Consider


ЕНЕ! 


Then, $B < O$ and $C$ is indefinite. Consider the following symmetric matrices:


553 2-0 S NE 
spams aps sp 


The leading minors of $A_1$ are $5 > 0$ and $|A_1| = 16 > 0$. The leading minors of $A_2$ are
$2 > 0$ and $\begin{vmatrix} 2 & 2\\ 2 & 1 \end{vmatrix} = -2 < 0$. The leading minors of $A_3$ are $-2 < 0$ and
$\begin{vmatrix} -2 & 1\\ 1 & -5 \end{vmatrix} = 9 > 0$.

Hence $A_1 > O$, $A_3 < O$ and $A_2$ is indefinite. The following results will be useful when
reducing a quadratic form or Hermitian form to its canonical form.

(3) For every real $A = A'$ (symmetric), there exists an orthonormal matrix $Q$, $QQ' =
I$, $Q'Q = I$ such that $Q'AQ = \mathrm{diag}(\lambda_1, \ldots, \lambda_n)$ where $\lambda_1, \ldots, \lambda_n$ are the eigenvalues of
the $n \times n$ matrix A and $\mathrm{diag}(\ldots)$ denotes a diagonal matrix. In this case, a real quadratic
form will reduce to the following linear combination:

\[
X'AX = Y'Q'AQY = Y'\mathrm{diag}(\lambda_1, \ldots, \lambda_n)Y = \lambda_1 y_1^2 + \cdots + \lambda_n y_n^2,\quad Y = Q'X. \tag{1.5.2}
\]

(4) For every Hermitian matrix $A = A^*$, there exists a unitary matrix $U$, $U^*U =
I$, $UU^* = I$ such that

\[
X^*AX = Y^*\mathrm{diag}(\lambda_1, \ldots, \lambda_n)Y = \lambda_1 |y_1|^2 + \cdots + \lambda_n |y_n|^2,\quad Y = U^*X. \tag{1.5.3}
\]

When $A > O$ (real positive definite or Hermitian positive definite) then all the $\lambda_j$'s are
real and positive. Then, $X'AX$ and $X^*AX$ are strictly positive.
Let A and B be $n \times n$ matrices. If $AB = BA$, in which case we say that A and B
commute, then both A and B can be simultaneously reduced to their canonical forms
(diagonal forms with the diagonal elements being the eigenvalues) with the same orthonormal
or unitary matrix $P$, $PP' = I$, $P'P = I$ if $P$ is real and $PP^* = I$, $P^*P = I$ if complex,
such that $P'AP = \mathrm{diag}(\lambda_1, \ldots, \lambda_n)$ and $P'BP = \mathrm{diag}(\mu_1, \ldots, \mu_n)$ where $\lambda_1, \ldots, \lambda_n$
are the eigenvalues of A and $\mu_1, \ldots, \mu_n$ are the eigenvalues of B. In the complex case,
$P^*AP = \mathrm{diag}(\lambda_1, \ldots, \lambda_n)$ and $P^*BP = \mathrm{diag}(\mu_1, \ldots, \mu_n)$. Observe that the eigenvalues
of Hermitian matrices are real.
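
The eigenvalue and leading-minor criteria of this section can be tried out on small symmetric matrices. The following is a rough sketch restricted to the $2 \times 2$ real symmetric case, where the eigenvalues have a closed form; the function names are hypothetical, and the test matrices are diag(2, 5) together with the two matrices whose leading minors were found above to be $(2, -2)$ and $(-2, 9)$.

```python
import math

def leading_minors_2x2(a):
    """Leading minors of a 2x2 matrix: a11 and the full determinant."""
    (a11, a12), (a21, a22) = a
    return [a11, a11 * a22 - a12 * a21]

def eigenvalues_sym_2x2(a):
    """Eigenvalues (largest first) of a real symmetric 2x2 matrix."""
    (a11, a12), (_, a22) = a
    mean = (a11 + a22) / 2
    radius = math.sqrt(((a11 - a22) / 2) ** 2 + a12 ** 2)
    return mean + radius, mean - radius

def classify(a):
    """Classify a real symmetric 2x2 matrix by the signs of its eigenvalues."""
    lam_max, lam_min = eigenvalues_sym_2x2(a)
    if lam_min > 0:
        return "positive definite"
    if lam_max < 0:
        return "negative definite"
    if lam_min >= 0:
        return "positive semi-definite"
    if lam_max <= 0:
        return "negative semi-definite"
    return "indefinite"

examples = {
    "diag(2, 5)": [[2, 0], [0, 5]],
    "A2": [[2, 2], [2, 1]],
    "A3": [[-2, 1], [1, -5]],
}
for name, m in examples.items():
    print(name, leading_minors_2x2(m), classify(m))
```

Since floating-point eigenvalues are compared with zero exactly, the semi-definite branches of this sketch are only reliable when the zero eigenvalue is computed without rounding error.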


1.5.1. Singular value decomposition 


For an $n \times n$ symmetric matrix $A = A'$, we have stated that there exists an $n \times n$
orthonormal matrix $P$, $PP' = I_n$, $P'P = I_n$ such that $P'AP = D = \mathrm{diag}(\lambda_1, \ldots, \lambda_n)$,
where $\lambda_1, \ldots, \lambda_n$ are the eigenvalues of A. If a square matrix A is not symmetric, there
exists a nonsingular matrix $Q$ such that $Q^{-1}AQ = D = \mathrm{diag}(\lambda_1, \ldots, \lambda_n)$ when the rank
of A is $n$. If the rank is less than $n$, then we may be able to obtain a nonsingular $Q$ such
that the above representation holds; however, this is not always possible. If A is a $p \times q$
rectangular matrix for $p \ne q$ or if $p = q$ and $A \ne A'$, then we can find two orthonormal
matrices $U$ and $V$ such that $A = U \begin{bmatrix} \Lambda & O\\ O & O \end{bmatrix} V'$ where $\Lambda = \mathrm{diag}(\lambda_1, \ldots, \lambda_k)$, $UU' =
I_p$, $U'U = I_p$, $VV' = I_q$, $V'V = I_q$ and $k$ is the rank of A. This representation is
equivalent to the following:

\[
A = U \begin{bmatrix} \Lambda & O\\ O & O \end{bmatrix} V' = U_{(1)} \Lambda V_{(1)}' \tag{i}
\]

where $U_{(1)} = [U_1, \ldots, U_k]$, $V_{(1)} = [V_1, \ldots, V_k]$, $U_j$ being the normalized eigenvector
of $AA'$ corresponding to the eigenvalue $\lambda_j^2$ and $V_j$, the normalized eigenvector of $A'A$
corresponding to the eigenvalue $\lambda_j^2$. The representation given in (i) is known as the singular
value decomposition of A and $\lambda_1 > 0, \ldots, \lambda_k > 0$ are called the singular values of A.
Then, we have

\[
AA' = U \begin{bmatrix} \Lambda^2 & O\\ O & O \end{bmatrix} U' = U_{(1)} \Lambda^2 U_{(1)}',\quad
A'A = V \begin{bmatrix} \Lambda^2 & O\\ O & O \end{bmatrix} V' = V_{(1)} \Lambda^2 V_{(1)}'. \tag{ii}
\]


Thus, the procedure is the following: If $p \le q$, compute the nonzero eigenvalues of $AA'$,
otherwise compute the nonzero eigenvalues of $A'A$. Denote them by $\lambda_1^2, \ldots, \lambda_k^2$ where $k$ is
the rank of A. Construct the normalized eigenvectors $U_1, \ldots, U_k$ from $AA'$. This
gives $U_{(1)} = [U_1, \ldots, U_k]$. Then, by using the same eigenvalues $\lambda_j^2$, $j = 1, \ldots, k$,
determine the normalized eigenvectors, $V_1, \ldots, V_k$, from $A'A$, and let $V_{(1)} = [V_1, \ldots, V_k]$. Let
us verify the above statements with the help of an example. Let

\[
A = \begin{bmatrix} 1 & -1 & 1\\ 1 & 1 & 0 \end{bmatrix}.
\]

Then,
\[
AA' = \begin{bmatrix} 3 & 0\\ 0 & 2 \end{bmatrix},\quad
A'A = \begin{bmatrix} 2 & 0 & 1\\ 0 & 2 & -1\\ 1 & -1 & 1 \end{bmatrix}.
\]


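The eigenvalues of $AA'$ are $\lambda_1^2 = 3$ and $\lambda_2^2 = 2$. The corresponding normalized
eigenvectors of $AA'$ are $U_1$ and $U_2$, where
\[
U_1 = \begin{bmatrix} 1\\ 0 \end{bmatrix},\ U_2 = \begin{bmatrix} 0\\ 1 \end{bmatrix},\ \text{so that }
U_{(1)} = [U_1, U_2] = \begin{bmatrix} 1 & 0\\ 0 & 1 \end{bmatrix}.
\]
Now, by using $\lambda_1^2 = 3$ and $\lambda_2^2 = 2$, compute the normalized eigenvectors from $A'A$. They
are:

\[
V_1 = \frac{1}{\sqrt{3}} \begin{bmatrix} 1\\ -1\\ 1 \end{bmatrix},\
V_2 = \frac{1}{\sqrt{2}} \begin{bmatrix} 1\\ 1\\ 0 \end{bmatrix},\ \text{so that }
V_{(1)} = [V_1, V_2] = \begin{bmatrix} \frac{1}{\sqrt{3}} & \frac{1}{\sqrt{2}}\\ -\frac{1}{\sqrt{3}} & \frac{1}{\sqrt{2}}\\ \frac{1}{\sqrt{3}} & 0 \end{bmatrix}.
\]

Then $\Lambda = \mathrm{diag}(\sqrt{3}, \sqrt{2})$. Also,

\[
U_{(1)} \Lambda V_{(1)}' = \begin{bmatrix} 1 & 0\\ 0 & 1 \end{bmatrix}
\begin{bmatrix} \sqrt{3} & 0\\ 0 & \sqrt{2} \end{bmatrix}
\begin{bmatrix} \frac{1}{\sqrt{3}} & -\frac{1}{\sqrt{3}} & \frac{1}{\sqrt{3}}\\ \frac{1}{\sqrt{2}} & \frac{1}{\sqrt{2}} & 0 \end{bmatrix}
= \begin{bmatrix} 1 & -1 & 1\\ 1 & 1 & 0 \end{bmatrix} = A.
\]

This establishes the result. Observe that

\begin{align*}
AA' &= [U_{(1)} \Lambda V_{(1)}'][U_{(1)} \Lambda V_{(1)}']' = U_{(1)} \Lambda^2 U_{(1)}'\\
A'A &= [U_{(1)} \Lambda V_{(1)}']'[U_{(1)} \Lambda V_{(1)}'] = V_{(1)} \Lambda^2 V_{(1)}'\\
\Lambda^2 &= \mathrm{diag}(\lambda_1^2, \lambda_2^2).
\end{align*}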
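
The verification above can be repeated numerically. In the sketch below, the $2 \times 3$ matrix is an assumption chosen to be consistent with the products $AA'$ and $A'A$ of this example (they determine A only up to the signs of its rows), and the helper names are hypothetical. The code rebuilds $U_{(1)} \Lambda V_{(1)}'$ and checks that each $V_j$ is an eigenvector of $A'A$ for the eigenvalue $\lambda_j^2$.

```python
import math

def transpose(m):
    """Transpose of a matrix given as a list of rows."""
    return [list(col) for col in zip(*m)]

def matmul(a, b):
    """Ordinary matrix product of two lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]

def max_abs_diff(a, b):
    """Largest absolute entrywise difference between two matrices."""
    return max(abs(x - y) for r1, r2 in zip(a, b) for x, y in zip(r1, r2))

A = [[1, -1, 1], [1, 1, 0]]  # assumed example matrix

AAt = matmul(A, transpose(A))  # 2 x 2
AtA = matmul(transpose(A), A)  # 3 x 3

s3, s2 = math.sqrt(3), math.sqrt(2)
U1 = [[1, 0], [0, 1]]                          # columns U_1, U_2
Lam = [[s3, 0], [0, s2]]                       # singular values on the diagonal
V1 = [[1 / s3, 1 / s2], [-1 / s3, 1 / s2], [1 / s3, 0]]  # columns V_1, V_2

reconstructed = matmul(matmul(U1, Lam), transpose(V1))
print(AAt, AtA)
print(max_abs_diff(reconstructed, A))

# Check A'A V_j = lambda_j^2 V_j for j = 1, 2.
lam_sq = [[3, 0], [0, 2]]
print(max_abs_diff(matmul(AtA, V1), matmul(V1, lam_sq)))
```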


1.6. Wedge Product of Differentials and Jacobians 


If $y = f(x)$ is an explicit function of $x$, where $x$ and $y$ are real scalar variables, then
we refer to $x$ as the independent variable and to $y$ as the dependent variable. In the present
context, “independent” means that values for $x$ are preassigned and the corresponding
values of $y$ are evaluated from the formula $y = f(x)$. The standard notations for a small
increment in $x$ and the corresponding increment in $y$ are $\Delta x$ and $\Delta y$, respectively. By
convention, $\Delta x > 0$ and $\Delta y$ can be positive, negative or zero depending upon the function
$f$. If $\Delta x$ goes to zero, then the limit is zero. However, if $\Delta x$ goes to zero in the presence
of the ratio $\frac{\Delta y}{\Delta x}$, then we have a different situation. Consider the identity

\[
\Delta y = \left(\frac{\Delta y}{\Delta x}\right) \Delta x \Rightarrow dy = A\,dx,\quad A = \frac{dy}{dx}. \tag{1.6.1}
\]

This identity can always be written due to our convention $\Delta x > 0$. Consider $\Delta x \to 0$. If
$\frac{\Delta y}{\Delta x}$ attains a limit at some stage as $\Delta x \to 0$, let us denote it by $A = \lim_{\Delta x \to 0} \frac{\Delta y}{\Delta x}$, then the
value of $\Delta x$ at that stage is the differential of $x$, namely $dx$, and the corresponding $\Delta y$ is
$dy$ and $A$ is the ratio of the differentials $A = \frac{dy}{dx}$. If $x_1, \ldots, x_k$ are independent variables
and if $y = f(x_1, \ldots, x_k)$, then by convention $\Delta x_1 > 0, \ldots, \Delta x_k > 0$. Thus, in light of
(1.6.1), we have

\[
dy = \frac{\partial f}{\partial x_1} dx_1 + \cdots + \frac{\partial f}{\partial x_k} dx_k \tag{1.6.2}
\]

where $\frac{\partial f}{\partial x_j}$ is the partial derivative of $f$ with respect to $x_j$ or the derivative of $f$ with respect
to $x_j$, keeping all other variables fixed.


Wedge Product of Differentials Let $dx$ and $dy$ be differentials of the real scalar variables
$x$ and $y$. Then the wedge product or skew symmetric product of $dx$ and $dy$ is denoted by
$dx \wedge dy$ and is defined as

\[
dx \wedge dy = -dy \wedge dx \Rightarrow dx \wedge dx = 0 \text{ and } dy \wedge dy = 0. \tag{1.6.3}
\]


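This definition indicates that higher order wedge products involving the same differential
are equal to zero. Letting

\[
y_1 = f_1(x_1, x_2) \text{ and } y_2 = f_2(x_1, x_2),
\]

it follows from the basic definitions that

\[
dy_1 = \frac{\partial f_1}{\partial x_1} dx_1 + \frac{\partial f_1}{\partial x_2} dx_2 \ \text{ and }\
dy_2 = \frac{\partial f_2}{\partial x_1} dx_1 + \frac{\partial f_2}{\partial x_2} dx_2.
\]
By taking the wedge product and using the properties specified in (1.6.3), that is,
$dx_1 \wedge dx_1 = 0$, $dx_2 \wedge dx_2 = 0$, $dx_2 \wedge dx_1 = -dx_1 \wedge dx_2$, we have

\begin{align*}
dy_1 \wedge dy_2 &= \left[\frac{\partial f_1}{\partial x_1} \frac{\partial f_2}{\partial x_2} - \frac{\partial f_1}{\partial x_2} \frac{\partial f_2}{\partial x_1}\right] dx_1 \wedge dx_2\\
&= \begin{vmatrix} \frac{\partial f_1}{\partial x_1} & \frac{\partial f_1}{\partial x_2}\\[2pt] \frac{\partial f_2}{\partial x_1} & \frac{\partial f_2}{\partial x_2} \end{vmatrix} dx_1 \wedge dx_2.
\end{align*}

In the general case we have the following corresponding result:

\[
dy_1 \wedge \ldots \wedge dy_k = \begin{vmatrix}
\frac{\partial f_1}{\partial x_1} & \cdots & \frac{\partial f_1}{\partial x_k}\\
\vdots & \ddots & \vdots\\
\frac{\partial f_k}{\partial x_1} & \cdots & \frac{\partial f_k}{\partial x_k}
\end{vmatrix} dx_1 \wedge \ldots \wedge dx_k \tag{1.6.4}
\]
or
\[
dY = J\,dX \Rightarrow dX = \frac{1}{J}\,dY
\]

where $dX = dx_1 \wedge \ldots \wedge dx_k$, $dY = dy_1 \wedge \ldots \wedge dy_k$ and $J = \left|\left(\frac{\partial f_i}{\partial x_j}\right)\right|$ = the determinant of
the matrix of partial derivatives where the $(i, j)$-th element is the partial derivative of $f_i$
with respect to $x_j$. In $dX$ and $dY$, the individual real scalar variables can be taken in any
order to start with. However, for each interchange of variables, the result is to be multiplied
by $-1$.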


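Example 1.6.1. Consider the transformation $x_1 = r \cos^2\theta$, $x_2 = r \sin^2\theta$, $0 \le r <
\infty$, $0 \le \theta \le \frac{\pi}{2}$, $x_1 \ge 0$, $x_2 \ge 0$. Determine the relationship between $dx_1 \wedge dx_2$ and
$dr \wedge d\theta$.


Solution 1.6.1. Taking partial derivatives, we have

\begin{align*}
\frac{\partial x_1}{\partial r} &= \cos^2\theta, & \frac{\partial x_1}{\partial \theta} &= -2r \cos\theta \sin\theta,\\
\frac{\partial x_2}{\partial r} &= \sin^2\theta, & \frac{\partial x_2}{\partial \theta} &= 2r \cos\theta \sin\theta.
\end{align*}

Then, the determinant of the matrix of partial derivatives is given by

\[
\begin{vmatrix} \frac{\partial x_1}{\partial r} & \frac{\partial x_1}{\partial \theta}\\[2pt] \frac{\partial x_2}{\partial r} & \frac{\partial x_2}{\partial \theta} \end{vmatrix}
= \begin{vmatrix} \cos^2\theta & -2r \cos\theta \sin\theta\\ \sin^2\theta & 2r \cos\theta \sin\theta \end{vmatrix}
= 2r \cos\theta \sin\theta
\]

since $\cos^2\theta + \sin^2\theta = 1$. Hence,
\[
dx_1 \wedge dx_2 = 2r \cos\theta \sin\theta \ dr \wedge d\theta,\quad J = 2r \cos\theta \sin\theta.
\]

We may also establish this result by direct evaluation.

\begin{align*}
dx_1 &= \frac{\partial x_1}{\partial r} dr + \frac{\partial x_1}{\partial \theta} d\theta = \cos^2\theta \, dr - 2r \cos\theta \sin\theta \, d\theta,\\
dx_2 &= \frac{\partial x_2}{\partial r} dr + \frac{\partial x_2}{\partial \theta} d\theta = \sin^2\theta \, dr + 2r \cos\theta \sin\theta \, d\theta,
\end{align*}
\begin{align*}
dx_1 \wedge dx_2 &= \cos^2\theta \sin^2\theta \, dr \wedge dr + \cos^2\theta \,(2r \cos\theta \sin\theta)\, dr \wedge d\theta\\
&\quad - \sin^2\theta \,(2r \cos\theta \sin\theta)\, d\theta \wedge dr - (2r \cos\theta \sin\theta)^2\, d\theta \wedge d\theta\\
&= 2r \cos\theta \sin\theta \,[\cos^2\theta \, dr \wedge d\theta - \sin^2\theta \, d\theta \wedge dr],\quad [dr \wedge dr = 0,\ d\theta \wedge d\theta = 0]\\
&= 2r \cos\theta \sin\theta \,[\cos^2\theta + \sin^2\theta]\, dr \wedge d\theta,\quad [d\theta \wedge dr = -dr \wedge d\theta]\\
&= 2r \cos\theta \sin\theta \, dr \wedge d\theta.
\end{align*}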
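
The Jacobian of Example 1.6.1 can also be checked numerically. The sketch below approximates the four partial derivatives by central differences and compares the resulting determinant with $2r \cos\theta \sin\theta$; the function names, the step size `h` and the evaluation point are arbitrary choices made for illustration.

```python
import math

def transform(r, theta):
    """The map (r, theta) -> (x1, x2) of Example 1.6.1."""
    return r * math.cos(theta) ** 2, r * math.sin(theta) ** 2

def numerical_jacobian(r, theta, h=1e-6):
    """Determinant of the matrix of partial derivatives, by central differences."""
    plus_r, minus_r = transform(r + h, theta), transform(r - h, theta)
    plus_t, minus_t = transform(r, theta + h), transform(r, theta - h)
    dx1_dr = (plus_r[0] - minus_r[0]) / (2 * h)
    dx2_dr = (plus_r[1] - minus_r[1]) / (2 * h)
    dx1_dt = (plus_t[0] - minus_t[0]) / (2 * h)
    dx2_dt = (plus_t[1] - minus_t[1]) / (2 * h)
    return dx1_dr * dx2_dt - dx1_dt * dx2_dr

def exact_jacobian(r, theta):
    """Closed form obtained in the example."""
    return 2 * r * math.cos(theta) * math.sin(theta)

r0, theta0 = 1.7, 0.6  # arbitrary point with r > 0 and 0 < theta < pi/2
print(numerical_jacobian(r0, theta0), exact_jacobian(r0, theta0))
```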


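Linear Transformation Consider the linear transformation $Y = AX$ where

\[
Y = \begin{bmatrix} y_1\\ \vdots\\ y_p \end{bmatrix},\
X = \begin{bmatrix} x_1\\ \vdots\\ x_p \end{bmatrix},\
A = \begin{bmatrix} a_{11} & \ldots & a_{1p}\\ \vdots & \ddots & \vdots\\ a_{p1} & \ldots & a_{pp} \end{bmatrix}.
\]

Then, $\frac{\partial y_i}{\partial x_j} = a_{ij} \Rightarrow \left(\frac{\partial y_i}{\partial x_j}\right) = (a_{ij}) = A$. Then $dY = |A|\,dX$ or $J = |A|$. Hence, the
following result:


Theorem 1.6.1. Let X and Y be $p \times 1$ vectors of distinct real variables and $A = (a_{ij})$
be a constant nonsingular matrix. Then, the transformation $Y = AX$ is one to one and

\[
Y = AX,\ |A| \ne 0 \Rightarrow dY = |A|\, dX. \tag{1.6.5}
\]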


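Let us consider the complex case. Let $\tilde{X} = X_1 + iX_2$ where a tilde indicates that the
matrix is in the complex domain, $X_1$ and $X_2$ are real $p \times 1$ vectors if $\tilde{X}$ is $p \times 1$, and
$i = \sqrt{-1}$. Then, the wedge product $d\tilde{X}$ is defined as $d\tilde{X} = dX_1 \wedge dX_2$. This is the
general definition in the complex case whatever be the order of the matrix. If $\tilde{Z}$ is $m \times n$
and if $\tilde{Z} = Z_1 + iZ_2$ where $Z_1$ and $Z_2$ are $m \times n$ and real, then $d\tilde{Z} = dZ_1 \wedge dZ_2$. Letting
the constant $p \times p$ matrix $A = A_1 + iA_2$ where $A_1$ and $A_2$ are real and $p \times p$, and letting
$\tilde{Y} = Y_1 + iY_2$ be $p \times 1$ where $Y_1$ and $Y_2$ are real and $p \times 1$, we have

\begin{align*}
\tilde{Y} = A\tilde{X} &\Rightarrow Y_1 + iY_2 = [A_1 + iA_2][X_1 + iX_2]\\
&= [A_1X_1 - A_2X_2] + i[A_1X_2 + A_2X_1] \Rightarrow\\
Y_1 = A_1X_1 - A_2X_2,\ &Y_2 = A_1X_2 + A_2X_1 \Rightarrow
\end{align*}
\[
\begin{bmatrix} Y_1\\ Y_2 \end{bmatrix} = \begin{bmatrix} A_1 & -A_2\\ A_2 & A_1 \end{bmatrix} \begin{bmatrix} X_1\\ X_2 \end{bmatrix}. \tag{i}
\]
Now, applying Theorem 1.6.1 on (i), it follows that

\[
dY_1 \wedge dY_2 = \det \begin{bmatrix} A_1 & -A_2\\ A_2 & A_1 \end{bmatrix} dX_1 \wedge dX_2. \tag{ii}
\]
That is,
\[
d\tilde{Y} = \det \begin{bmatrix} A_1 & -A_2\\ A_2 & A_1 \end{bmatrix} d\tilde{X} \Rightarrow d\tilde{Y} = J\, d\tilde{X} \tag{iii}
\]
where the Jacobian can be shown to be the square of the absolute value of the determinant of A. If
the determinant of A is denoted by $\det(A)$ and its absolute value, by $|\det(A)|$, and if
$\det(A) = a + ib$ with $a, b$ real and $i = \sqrt{-1}$ then the absolute value of the determinant
is $+\sqrt{(a + ib)(a - ib)} = +\sqrt{a^2 + b^2} = +\sqrt{[\det(A)][\det(A^*)]} = +\sqrt{\det(AA^*)}$. It
can be easily seen that the above Jacobian is given by
\begin{align*}
J &= \det \begin{bmatrix} A_1 & -A_2\\ A_2 & A_1 \end{bmatrix} = \det \begin{bmatrix} A_1 & -iA_2\\ -iA_2 & A_1 \end{bmatrix}\\
&\quad \text{(multiplying the second row block by $-i$ and second column block by $i$)}\\
&= \det \begin{bmatrix} A_1 - iA_2 & A_1 - iA_2\\ -iA_2 & A_1 \end{bmatrix} \quad \text{(adding the second row block to the first row block)}\\
&= \det(A_1 - iA_2) \det \begin{bmatrix} I & I\\ -iA_2 & A_1 \end{bmatrix} = \det(A_1 - iA_2) \det(A_1 + iA_2)\\
&\quad \text{(adding $(-1)$ times the first $p$ columns to the last $p$ columns)}\\
&= [\det(A)][\det(A^*)] = [\det(AA^*)] = |\det(A)|^2.
\end{align*}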
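
The identity $\det \begin{bmatrix} A_1 & -A_2\\ A_2 & A_1 \end{bmatrix} = |\det(A)|^2$ is easy to test on a small case. The sketch below is only an illustration: the $2 \times 2$ complex matrix is an arbitrary choice, the helper `det` is a hypothetical elimination routine with partial pivoting that works for real or complex entries, and the comparison is made up to floating-point error.

```python
def det(m):
    """Determinant by elimination with partial pivoting (real or complex entries)."""
    m = [list(row) for row in m]
    n, result = len(m), 1
    for c in range(n):
        pivot = max(range(c, n), key=lambda r: abs(m[r][c]))
        if abs(m[pivot][c]) == 0:
            return 0
        if pivot != c:
            m[c], m[pivot] = m[pivot], m[c]
            result = -result  # a row swap changes the sign
        result *= m[c][c]
        for r in range(c + 1, n):
            f = m[r][c] / m[c][c]
            m[r] = [x - f * y for x, y in zip(m[r], m[c])]
    return result

A = [[1 + 2j, 2 - 1j], [3j, 1 + 1j]]  # arbitrary complex test matrix

A1 = [[z.real for z in row] for row in A]  # real part
A2 = [[z.imag for z in row] for row in A]  # imaginary part

# Real 2p x 2p matrix [[A1, -A2], [A2, A1]] of the transformation in (i).
block = [r1 + [-x for x in r2] for r1, r2 in zip(A1, A2)] + \
        [r2 + r1 for r1, r2 in zip(A1, A2)]

jacobian = det(block)
print(jacobian, abs(det(A)) ** 2)
```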


Then, we have the following companion result of Theorem 1.6.1. 


Theorem 1.6a.1. Let $\tilde{X}$ and $\tilde{Y}$ be $p \times 1$ vectors in the complex domain, and let A be a
$p \times p$ nonsingular constant matrix that may or may not be in the complex domain. If C is
a constant $p \times 1$ vector, then

\[
\tilde{Y} = A\tilde{X} + C,\ \det(A) \ne 0 \Rightarrow d\tilde{Y} = |\det(A)|^2\, d\tilde{X} = |\det(AA^*)|\, d\tilde{X}. \tag{1.6a.1}
\]


For the results that follow, the complex case can be handled in a similar way and hence, 
only the final results will be stated. For details, the reader may refer to Mathai (1997). A 
more general result is the following: 


Theorem 1.6.2. Let X and Y be real m x n matrices with distinct real variables as 
elements. Let A be a m x m nonsingular constant matrix and C be a m x n constant 
matrix. Then 


\[
Y = AX + C,\ \det(A) \ne 0 \Rightarrow dY = |A|^n\, dX. \tag{1.6.6}
\]


The companion result is stated in the next theorem.
Theorem 1.6a.2. Let $\tilde{X}$ and $\tilde{Y}$ be $m \times n$ matrices in the complex domain. Let A be a
constant $m \times m$ nonsingular matrix that may or may not be in the complex domain, and C
be an $m \times n$ constant matrix. Then

\[
\tilde{Y} = A\tilde{X} + C,\ \det(A) \ne 0 \Rightarrow d\tilde{Y} = |\det(AA^*)|^n\, d\tilde{X}. \tag{1.6a.2}
\]


For proving Theorems 1.6.2 and 1.6a.2, consider the columns of Y and X. Then apply
Theorems 1.6.1 and 1.6a.1 to establish the results. If $X, \tilde{X}, Y, \tilde{Y}$ are as defined in
Theorems 1.6.2 and 1.6a.2 and if B is an $n \times n$ nonsingular constant matrix, then we have the
following results:


Theorems 1.6.3 and 1.6a.3. Let $X, \tilde{X}, Y, \tilde{Y}$ and C be $m \times n$ matrices with distinct
elements as previously defined, C be a constant matrix and B be an $n \times n$ nonsingular
constant matrix. Then

\[
Y = XB + C,\ \det(B) \ne 0 \Rightarrow dY = |B|^m\, dX \tag{1.6.7}
\]
and
\[
\tilde{Y} = \tilde{X}B + C,\ \det(B) \ne 0 \Rightarrow d\tilde{Y} = |\det(BB^*)|^m\, d\tilde{X}. \tag{1.6a.3}
\]


For proving these results, consider the rows of $Y, \tilde{Y}, X, \tilde{X}$ and then apply
Theorems 1.6.1 and 1.6a.1 to establish the results. Combining Theorems 1.6.2 and 1.6.3, as well
as Theorems 1.6a.2 and 1.6a.3, we have the following results:


Theorems 1.6.4 and 1.6a.4. Let $X, \tilde{X}, Y, \tilde{Y}$ be $m \times n$ matrices as previously defined,
and let A be $m \times m$ and B be $n \times n$ nonsingular constant matrices. Then

\[
Y = AXB,\ \det(A) \ne 0,\ \det(B) \ne 0 \Rightarrow dY = |A|^n |B|^m\, dX \tag{1.6.8}
\]
and
\[
\tilde{Y} = A\tilde{X}B,\ \det(A) \ne 0,\ \det(B) \ne 0 \Rightarrow d\tilde{Y} = |\det(AA^*)|^n\, |\det(BB^*)|^m\, d\tilde{X}. \tag{1.6a.4}
\]


We now consider the case of linear transformations involving symmetric and Hermitian 
matrices. 


Theorems 1.6.5 and 1.6a.5. Let $X = X'$, $Y = Y'$ be real symmetric $p \times p$ matrices and
let $\tilde{X} = \tilde{X}^*$, $\tilde{Y} = \tilde{Y}^*$ be $p \times p$ Hermitian matrices. If A is a $p \times p$ nonsingular constant
matrix, then

\[
Y = AXA',\ Y = Y',\ X = X',\ \det(A) \ne 0 \Rightarrow dY = |A|^{p+1}\, dX \tag{1.6.9}
\]
and
\[
\tilde{Y} = A\tilde{X}A^*,\ \det(A) \ne 0 \Rightarrow d\tilde{Y} = |\det(AA^*)|^p\, d\tilde{X} \tag{1.6a.5}
\]
for $\tilde{X} = \tilde{X}^*$ or $\tilde{X} = -\tilde{X}^*$.
The proof involves some properties of elementary matrices and elementary
transformations. Elementary matrices were introduced in Sect. 1.2.1. There are two types of basic
elementary matrices, the E and F types where the E type is obtained by multiplying any
row (column) of an identity matrix by a nonzero scalar and the F type is obtained by
adding any row to any other row of an identity matrix. A combination of E and F type
matrices results in a G type matrix where a constant multiple of one row of an identity
matrix is added to any other row. The G type is not a basic elementary matrix. By
performing successive pre-multiplication with E, F and G type matrices, one can reduce a
nonsingular matrix to a product of the basic elementary matrices of the E and F types,
observing that the E and F type elementary matrices are nonsingular. This result is needed
to establish Theorems 1.6.5 and 1.6a.5. Let $A = E_1E_2F_1 \cdots E_rF_s$ for some $E_1, \ldots, E_r$
and $F_1, \ldots, F_s$. Then

\[
AXA' = E_1E_2F_1 \cdots E_rF_s\, X\, F_s'E_r' \cdots F_1'E_2'E_1'.
\]

Let $Y_1 = F_sXF_s'$ in which case the connection between $dX$ and $dY_1$ can be determined
from $F_s$. Now, letting $Y_2 = E_rY_1E_r'$, the connection between $dY_2$ and $dY_1$ can be similarly
determined from $E_r$. Continuing in this manner, we finally obtain the connection between
$dY$ and $dX$, which will give the Jacobian as $|A|^{p+1}$ for the real case. In the complex case,
the procedure is parallel.

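We now consider two basic nonlinear transformations. In the first case, X is a $p \times p$
nonsingular matrix going to its inverse, that is, $Y = X^{-1}$.


Theorems 1.6.6 and 1.6a.6. Let $X$ and $\tilde{X}$ be $p \times p$ real and complex nonsingular
matrices, respectively. Let the regular inverses be denoted by $Y = X^{-1}$ and $\tilde{Y} = \tilde{X}^{-1}$,
respectively. Then, ignoring the sign,
\[
Y = X^{-1} \Rightarrow dY = \begin{cases}
|X|^{-2p}\, dX & \text{for a general } X\\
|X|^{-(p+1)}\, dX & \text{for } X = X'\\
|X|^{-(p-1)}\, dX & \text{for } X = -X'
\end{cases} \tag{1.6.10}
\]
and
\[
\tilde{Y} = \tilde{X}^{-1} \Rightarrow d\tilde{Y} = \begin{cases}
|\det(\tilde{X}\tilde{X}^*)|^{-2p}\, d\tilde{X} & \text{for a general } \tilde{X}\\
|\det(\tilde{X}\tilde{X}^*)|^{-p}\, d\tilde{X} & \text{for } \tilde{X} = \tilde{X}^* \text{ or } \tilde{X} = -\tilde{X}^*.
\end{cases} \tag{1.6a.6}
\]
The proof is based on the following observations: In the real case $XX^{-1} = I_p \Rightarrow
(dX)X^{-1} + X(dX^{-1}) = O$ where $(dX)$ represents the matrix of differentials in X. This
means that
\[
(dX^{-1}) = -X^{-1}(dX)X^{-1}.
\]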

46 Arak M. Mathai, Serge B. Provost, Hans J. Haubold 


The differentials appear only in the matrices of differentials. Hence this situation
is equivalent to the general linear transformation considered in Theorems 1.6.4 and 1.6.5
where $X$ and $X^{-1}$ act as constants. The result is obtained upon taking the wedge product
of differentials. The complex case is parallel.
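
As an informal numerical illustration of (1.6.10), which is not part of the original proof, the following Python sketch compares a finite-difference Jacobian of $Y = X^{-1}$ with $|X|^{-2p}$ for a general $2\times 2$ matrix and with $|X|^{-(p+1)}$ for a symmetric one. The test matrices, the step size and the helper names (`det`, `jacobian`, `inv_general`, `inv_symmetric`) are arbitrary choices made for this example.

```python
# Finite-difference check of the Jacobian of Y = X^{-1} for p = 2 (illustrative sketch).

def det(m):
    """Determinant by Gaussian elimination with partial pivoting."""
    a = [row[:] for row in m]
    n = len(a)
    d = 1.0
    for c in range(n):
        piv = max(range(c, n), key=lambda r: abs(a[r][c]))
        if a[piv][c] == 0.0:
            return 0.0
        if piv != c:
            a[c], a[piv] = a[piv], a[c]
            d = -d
        d *= a[c][c]
        for r in range(c + 1, n):
            f = a[r][c] / a[c][c]
            for k in range(c, n):
                a[r][k] -= f * a[c][k]
    return d

def jacobian(f, x, h=1e-6):
    """Matrix of central-difference partial derivatives d f_i / d x_j."""
    n = len(x)
    J = [[0.0] * n for _ in range(n)]
    for j in range(n):
        xp, xm = list(x), list(x)
        xp[j] += h
        xm[j] -= h
        fp, fm = f(xp), f(xm)
        for i in range(n):
            J[i][j] = (fp[i] - fm[i]) / (2 * h)
    return J

def inv_general(x):
    """x = (x11, x12, x21, x22) -> entries of X^{-1} in the same order."""
    a, b, c, d = x
    dt = a * d - b * c
    return [d / dt, -b / dt, -c / dt, a / dt]

def inv_symmetric(x):
    """x = (x11, x12, x22) for X = X' -> (y11, y12, y22) of X^{-1}."""
    a, b, d = x
    dt = a * d - b * b
    return [d / dt, -b / dt, a / dt]

p = 2
x_general = [2.0, 1.0, 0.5, 3.0]    # |X| = 5.5
x_symmetric = [2.0, 0.7, 3.0]       # |X| = 5.51

jac_general = abs(det(jacobian(inv_general, x_general)))
jac_symmetric = abs(det(jacobian(inv_symmetric, x_symmetric)))

print(jac_general, 5.5 ** (-2 * p))           # general X: |X|^{-2p}
print(jac_symmetric, 5.51 ** (-(p + 1)))      # X = X': |X|^{-(p+1)}
```

In each printed pair the two numbers should agree up to finite-difference error, the sign being ignored as in the theorem.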

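The next results involve real positive definite matrices or Hermitian positive definite
matrices that are expressible in terms of triangular matrices and the corresponding
connection between the wedge product of differentials. Let $X$ and $\tilde{X}$ be $p\times p$ real
positive definite and Hermitian positive definite matrices, respectively. Let $T = (t_{ij})$ be a
real lower triangular matrix with $t_{ij} = 0$, $i < j$, $t_{jj} > 0$, $j = 1,\ldots,p$, and the $t_{ij}$'s,
$i > j$, be distinct real variables. Let $\tilde{T} = (\tilde{t}_{ij})$ be a lower triangular matrix with
$\tilde{t}_{ij} = 0$ for $i < j$, the $\tilde{t}_{ij}$'s, $i > j$, be distinct complex variables, and $t_{jj}$,
$j = 1,\ldots,p$, be positive real variables. Then, the transformations $X = TT'$ in the real case
and $\tilde{X} = \tilde{T}\tilde{T}^*$ in the complex case can be shown to be one-to-one, which enables us to
write $dX$ in terms of $dT$ and vice versa, uniquely, and $d\tilde{X}$ in terms of $d\tilde{T}$, uniquely.
We first consider the real case. When $p = 2$,

$$X = \begin{bmatrix} x_{11} & x_{12}\\ x_{12} & x_{22}\end{bmatrix},\quad x_{11} > 0,\ x_{22} > 0,\ x_{21} = x_{12},\ x_{11}x_{22} - x_{12}^2 > 0$$

due to positive definiteness of $X$, and

$$X = TT' \Rightarrow \begin{bmatrix} x_{11} & x_{12}\\ x_{21} & x_{22}\end{bmatrix} = \begin{bmatrix} t_{11} & 0\\ t_{21} & t_{22}\end{bmatrix}\begin{bmatrix} t_{11} & t_{21}\\ 0 & t_{22}\end{bmatrix} = \begin{bmatrix} t_{11}^2 & t_{11}t_{21}\\ t_{21}t_{11} & t_{21}^2 + t_{22}^2\end{bmatrix} \Rightarrow$$

$$\frac{\partial x_{11}}{\partial t_{11}} = 2t_{11},\quad \frac{\partial x_{11}}{\partial t_{21}} = 0,\quad \frac{\partial x_{11}}{\partial t_{22}} = 0,$$

$$\frac{\partial x_{22}}{\partial t_{11}} = 0,\quad \frac{\partial x_{22}}{\partial t_{21}} = 2t_{21},\quad \frac{\partial x_{22}}{\partial t_{22}} = 2t_{22},$$

$$\frac{\partial x_{12}}{\partial t_{11}} = t_{21},\quad \frac{\partial x_{12}}{\partial t_{21}} = t_{11},\quad \frac{\partial x_{12}}{\partial t_{22}} = 0.$$

Taking the $x_{ij}$'s in the order $x_{11}, x_{12}, x_{22}$ and the $t_{ij}$'s in the order $t_{11}, t_{21}, t_{22}$, we form the
following matrix of partial derivatives:

$$\begin{array}{c|ccc} & t_{11} & t_{21} & t_{22}\\ \hline x_{11} & 2t_{11} & 0 & 0\\ x_{21} & * & t_{11} & 0\\ x_{22} & * & * & 2t_{22} \end{array}$$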


where an asterisk indicates that an element may be present in that position; however, its
value is irrelevant since the matrix is triangular and its determinant will simply be the
product of its diagonal elements. It can be observed from this pattern that for a general $p$,
a diagonal element will be multiplied by 2 whenever $x_{jj}$ is differentiated with respect to
$t_{jj}$, $j = 1,\ldots,p$. Then $t_{11}$ will appear $p$ times, $t_{22}$ will appear $p-1$ times, and so on, and
$t_{pp}$ will appear once along the diagonal. Hence the product of the diagonal elements will
be $2^p\,t_{11}^{p}\,t_{22}^{p-1}\cdots t_{pp} = 2^p\{\prod_{j=1}^p t_{jj}^{p+1-j}\}$. A parallel procedure will yield the Jacobian in
the complex case. Hence, the following results:


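Theorems 1.6.7 and 1.6a.7. Let $X$, $\tilde{X}$, $T$ and $\tilde{T}$ be $p\times p$ matrices where $X$ is real
positive definite, $\tilde{X}$ is Hermitian positive definite, and $T$ and $\tilde{T}$ are lower triangular matrices
whose diagonal elements are real and positive as described above. Then the transformations
$X = TT'$ and $\tilde{X} = \tilde{T}\tilde{T}^*$ are one-to-one, and

$$dX = 2^p\Big\{\prod_{j=1}^p t_{jj}^{\,p+1-j}\Big\}\,dT \tag{1.6.11}$$

and

$$d\tilde{X} = 2^p\Big\{\prod_{j=1}^p t_{jj}^{\,2(p-j)+1}\Big\}\,d\tilde{T}. \tag{1.6a.7}$$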
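
Formula (1.6.11) can be checked numerically. The sketch below, added for illustration only, builds $X = TT'$ for $p = 3$, differentiates the $p(p+1)/2$ distinct elements of $X$ with respect to those of $T$ by central differences, and compares the absolute determinant with $2^p\prod_j t_{jj}^{p+1-j}$. The numerical values of $T$ and the helper names are assumptions of this example.

```python
# Numerical check of dX = 2^p { prod_j t_jj^(p+1-j) } dT for X = TT' (illustrative sketch).

def det(m):
    """Determinant by Gaussian elimination with partial pivoting."""
    a = [row[:] for row in m]
    n = len(a)
    d = 1.0
    for c in range(n):
        piv = max(range(c, n), key=lambda r: abs(a[r][c]))
        if a[piv][c] == 0.0:
            return 0.0
        if piv != c:
            a[c], a[piv] = a[piv], a[c]
            d = -d
        d *= a[c][c]
        for r in range(c + 1, n):
            f = a[r][c] / a[c][c]
            for k in range(c, n):
                a[r][k] -= f * a[c][k]
    return d

def x_from_t(t, p):
    """Pack t row-wise into a lower triangular T and return the lower triangle of TT'."""
    T = [[0.0] * p for _ in range(p)]
    k = 0
    for i in range(p):
        for j in range(i + 1):
            T[i][j] = t[k]
            k += 1
    return [sum(T[i][m] * T[j][m] for m in range(p))
            for i in range(p) for j in range(i + 1)]

def jacobian_abs_det(t, p, h=1e-5):
    """|det| of the matrix of partial derivatives of the x_ij with respect to the t_ij."""
    n = len(t)
    J = [[0.0] * n for _ in range(n)]
    for j in range(n):
        tp, tm = list(t), list(t)
        tp[j] += h
        tm[j] -= h
        xp, xm = x_from_t(tp, p), x_from_t(tm, p)
        for i in range(n):
            J[i][j] = (xp[i] - xm[i]) / (2 * h)
    return abs(det(J))

p = 3
t = [1.5, 0.3, 2.0, -0.7, 0.4, 0.8]   # t11, t21, t22, t31, t32, t33

numeric = jacobian_abs_det(t, p)

formula = 2.0 ** p
for j in range(p):
    t_jj = t[j * (j + 1) // 2 + j]     # position of the diagonal element in the packing
    formula *= t_jj ** (p - j)         # exponent p + 1 - j with j counted from 1

print(numeric, formula)
```

For the values used here the formula gives $2^3(1.5)^3(2.0)^2(0.8) = 86.4$, and the finite-difference determinant should agree with it up to rounding.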


Given these introductory materials, we will explore multivariate statistical analysis 
from the perspective of Special Functions. As far as possible, the material in this chapter 
is self-contained. A few more Jacobians will be required when tackling transformations 
involving rectangular matrices or eigenvalue problems. These will be discussed in the 
respective chapters later on. 


Example 1.6.2. Evaluate the following integrals: (1): $\int_X e^{-X'AX}\,dX$ where $A > O$ (real
positive definite) is $3\times 3$ and $X$ is a $3\times 1$ vector of distinct real scalar variables; (2):
$\int_X e^{-\mathrm{tr}(AXBX')}\,dX$ where $X$ is a $2\times 3$ matrix of distinct real scalar variables, $A > O$
(real positive definite) is $2\times 2$ and $B > O$ (real positive definite) is $3\times 3$, $A$ and $B$ being
constant matrices; (3): $\int_{X>O} e^{-\mathrm{tr}(X)}\,dX$ where $X = X' > O$ is a $2\times 2$ real positive definite
matrix of distinct real scalar variables.


Solution 1.6.2.

(1) Let $X' = (x_1, x_2, x_3)$, $A > O$. Since $A > O$, we can uniquely define $A^{1/2} = (A^{1/2})'$.
Then, write $X'AX = X'A^{1/2}A^{1/2}X = Y'Y$, $Y = A^{1/2}X$. It follows from Theorem 1.6.1 that
$dX = |A|^{-1/2}\,dY$, and letting $Y' = (y_1, y_2, y_3)$, we have

$$\int_X e^{-X'AX}\,dX = |A|^{-1/2}\int_Y e^{-Y'Y}\,dY = |A|^{-1/2}\int_{-\infty}^{\infty}\int_{-\infty}^{\infty}\int_{-\infty}^{\infty} e^{-(y_1^2+y_2^2+y_3^2)}\,dy_1\wedge dy_2\wedge dy_3.$$

Since

$$\int_{-\infty}^{\infty} e^{-y_j^2}\,dy_j = \sqrt{\pi},\quad j = 1,2,3,$$

$$\int_X e^{-X'AX}\,dX = |A|^{-1/2}(\sqrt{\pi})^3.$$


(2) Since $A$ is a $2\times 2$ positive definite matrix, there exists a $2\times 2$ matrix $A^{1/2}$ that is
symmetric and positive definite. Similarly, there exists a $3\times 3$ matrix $B^{1/2}$ that is symmetric
and positive definite. Let $Y = A^{1/2}XB^{1/2} \Rightarrow dY = |A|^{3/2}|B|^{2/2}\,dX$ or $dX = |A|^{-3/2}|B|^{-1}\,dY$
by Theorem 1.6.4. Moreover, given two matrices $A_1$ and $A_2$, $\mathrm{tr}(A_1A_2) = \mathrm{tr}(A_2A_1)$ even
if $A_1A_2 \ne A_2A_1$, as long as the products are defined. By making use of this property, we
may write

$$\mathrm{tr}(AXBX') = \mathrm{tr}(A^{1/2}A^{1/2}XB^{1/2}B^{1/2}X') = \mathrm{tr}(A^{1/2}XB^{1/2}B^{1/2}X'A^{1/2}) = \mathrm{tr}[(A^{1/2}XB^{1/2})(A^{1/2}XB^{1/2})'] = \mathrm{tr}(YY')$$

where $Y = A^{1/2}XB^{1/2}$ and $dY$ is given above. However, for any real matrix $Y$, whether
square or rectangular, $\mathrm{tr}(YY') = \mathrm{tr}(Y'Y) =$ the sum of the squares of all the elements of
$Y$. Thus, we have

$$\int_X e^{-\mathrm{tr}(AXBX')}\,dX = |A|^{-3/2}|B|^{-1}\int_Y e^{-\mathrm{tr}(YY')}\,dY.$$

Observe that since $\mathrm{tr}(YY')$ is the sum of squares of 6 real scalar variables, the integral
over $Y$ reduces to a multiple integral involving six integrals where each variable is over
the entire real line. Hence,

$$\int_Y e^{-\mathrm{tr}(YY')}\,dY = \prod_{j=1}^{6}\int_{-\infty}^{\infty} e^{-y_j^2}\,dy_j = \prod_{j=1}^{6}\sqrt{\pi} = (\sqrt{\pi})^6.$$

Note that we have denoted the sum of the six $y_{ij}^2$ as $y_1^2 + \cdots + y_6^2$ for convenience. Thus,

$$\int_X e^{-\mathrm{tr}(AXBX')}\,dX = |A|^{-3/2}|B|^{-1}(\sqrt{\pi})^6.$$


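(3) In this case, $X$ is a $2\times 2$ real positive definite matrix. Let $X = TT'$ where $T$ is lower
triangular with positive diagonal elements. Then,

$$T = \begin{bmatrix} t_{11} & 0\\ t_{21} & t_{22}\end{bmatrix},\quad TT' = \begin{bmatrix} t_{11} & 0\\ t_{21} & t_{22}\end{bmatrix}\begin{bmatrix} t_{11} & t_{21}\\ 0 & t_{22}\end{bmatrix} = \begin{bmatrix} t_{11}^2 & t_{11}t_{21}\\ t_{21}t_{11} & t_{21}^2 + t_{22}^2\end{bmatrix},$$

and $\mathrm{tr}(TT') = t_{11}^2 + (t_{21}^2 + t_{22}^2)$, $t_{11} > 0$, $t_{22} > 0$, $-\infty < t_{21} < \infty$. From Theorem 1.6.7,
the Jacobian is

$$dX = 2^p\Big\{\prod_{j=1}^p t_{jj}^{\,p+1-j}\Big\}\,dT = 2^2(t_{11}^2t_{22})\,dt_{11}\wedge dt_{21}\wedge dt_{22}.$$

Therefore

$$\int_{X>O} e^{-\mathrm{tr}(X)}\,dX = \int_T e^{-\mathrm{tr}(TT')}\,2^2(t_{11}^2t_{22})\,dT$$

$$= \Big(\int_{-\infty}^{\infty} e^{-t_{21}^2}\,dt_{21}\Big)\Big(\int_0^{\infty} 2t_{11}^2e^{-t_{11}^2}\,dt_{11}\Big)\Big(\int_0^{\infty} 2t_{22}e^{-t_{22}^2}\,dt_{22}\Big)$$

$$= [\sqrt{\pi}\,]\,[\Gamma(3/2)]\,[\Gamma(1)] = \frac{\pi}{2}.$$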
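
The closed forms obtained in parts (1) and (3) of Solution 1.6.2 can be cross-checked numerically. The following Python sketch is an illustration added here, under stated assumptions: for part (1) it picks an arbitrary $3\times 3$ positive definite matrix `A` (hypothetical values) and integrates $e^{-X'AX}$ on a truncated grid by the midpoint rule; for part (3) it integrates out $x_{12}$ directly over $x_{12}^2 < x_{11}x_{22}$, which gives $2[\int_0^\infty \sqrt{x}\,e^{-x}dx]^2 = 2[\Gamma(3/2)]^2$, an alternative route that does not use the triangular transformation.

```python
import math

# Part (1): an arbitrary positive definite 3x3 matrix chosen for illustration.
A = [[2.0, 0.5, 0.0],
     [0.5, 1.0, 0.3],
     [0.0, 0.3, 1.5]]

def det3(m):
    """Determinant of a 3x3 matrix by cofactor expansion along the first row."""
    return (m[0][0] * (m[1][1] * m[2][2] - m[1][2] * m[2][1])
            - m[0][1] * (m[1][0] * m[2][2] - m[1][2] * m[2][0])
            + m[0][2] * (m[1][0] * m[2][1] - m[1][1] * m[2][0]))

def gaussian_integral(m, half_width=6.0, n=48):
    """Midpoint-rule approximation of the integral of exp(-x'Mx) over a cube."""
    h = 2.0 * half_width / n
    grid = [-half_width + (k + 0.5) * h for k in range(n)]
    total = 0.0
    for x1 in grid:
        for x2 in grid:
            for x3 in grid:
                q = (m[0][0] * x1 * x1 + m[1][1] * x2 * x2 + m[2][2] * x3 * x3
                     + 2.0 * (m[0][1] * x1 * x2 + m[0][2] * x1 * x3 + m[1][2] * x2 * x3))
                total += math.exp(-q)
    return total * h ** 3

numeric_1 = gaussian_integral(A)
closed_1 = det3(A) ** -0.5 * math.pi ** 1.5        # |A|^{-1/2} (sqrt(pi))^3

# Part (3): integrate x12 out first, then the two diagonal elements.
direct_3 = 2.0 * math.gamma(1.5) ** 2
closed_3 = math.sqrt(math.pi) * math.gamma(1.5) * math.gamma(1.0)

print(numeric_1, closed_1)
print(direct_3, closed_3, math.pi / 2)
```

The truncation to the cube $[-6,6]^3$ is harmless for this particular `A` because the integrand is negligible outside it; a matrix with a much smaller eigenvalue would need a wider grid.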


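Example 1.6.3. Let $A = A^* > O$ be a constant $2\times 2$ Hermitian positive definite matrix.
Let $\tilde{X}$ be a $2\times 1$ vector in the complex domain and $\tilde{X}_2 > O$ be a $2\times 2$ Hermitian
positive definite matrix. Then, evaluate the following integrals: (1): $\int_{\tilde{X}} e^{-\tilde{X}^*A\tilde{X}}\,d\tilde{X}$; (2):
$\int_{\tilde{X}_2>O} e^{-\mathrm{tr}(\tilde{X}_2)}\,d\tilde{X}_2$.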


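Solution 1.6.3.
(1): Since $A = A^* > O$, there exists a unique Hermitian positive definite square root $A^{1/2}$.
Then,

$$\tilde{X}^*A\tilde{X} = \tilde{X}^*A^{1/2}A^{1/2}\tilde{X} = \tilde{Y}^*\tilde{Y},\quad \tilde{Y} = A^{1/2}\tilde{X} \Rightarrow d\tilde{X} = |\det(A)|^{-1}\,d\tilde{Y}$$

by Theorem 1.6a.1. But $\tilde{Y}^*\tilde{Y} = |\tilde{y}_1|^2 + |\tilde{y}_2|^2$ since $\tilde{Y}^* = (\tilde{y}_1^*, \tilde{y}_2^*)$. Since the $\tilde{y}_j$'s are scalar
in this case, an asterisk means only the complex conjugate, the transpose being itself.
However,

$$\int_{\tilde{y}_j} e^{-|\tilde{y}_j|^2}\,d\tilde{y}_j = \int_{-\infty}^{\infty}\int_{-\infty}^{\infty} e^{-(y_{j1}^2+y_{j2}^2)}\,dy_{j1}\wedge dy_{j2} = (\sqrt{\pi})^2 = \pi$$

where $\tilde{y}_j = y_{j1} + iy_{j2}$, $i = \sqrt{-1}$, $y_{j1}$ and $y_{j2}$ being real. Hence

$$\int_{\tilde{X}} e^{-\tilde{X}^*A\tilde{X}}\,d\tilde{X} = |\det(A)|^{-1}\int_{\tilde{Y}} e^{-\tilde{Y}^*\tilde{Y}}\,d\tilde{Y} = |\det(A)|^{-1}\prod_{j=1}^{2}\int_{\tilde{y}_j} e^{-|\tilde{y}_j|^2}\,d\tilde{y}_j = |\det(A)|^{-1}\prod_{j=1}^{2}\pi = |\det(A)|^{-1}\pi^2.$$

(2): Make the transformation $\tilde{X}_2 = \tilde{T}\tilde{T}^*$ where $\tilde{T}$ is lower triangular with its diagonal
elements being real and positive. That is,

$$\tilde{T} = \begin{bmatrix} t_{11} & 0\\ \tilde{t}_{21} & t_{22}\end{bmatrix},\quad \tilde{T}\tilde{T}^* = \begin{bmatrix} t_{11}^2 & t_{11}\tilde{t}_{21}^*\\ \tilde{t}_{21}t_{11} & |\tilde{t}_{21}|^2 + t_{22}^2\end{bmatrix}$$

and the Jacobian is $d\tilde{X}_2 = 2^p\{\prod_{j=1}^p t_{jj}^{2(p-j)+1}\}\,d\tilde{T} = 2^2t_{11}^3t_{22}\,d\tilde{T}$ by Theorem 1.6a.7.
Hence,

$$\int_{\tilde{X}_2>O} e^{-\mathrm{tr}(\tilde{X}_2)}\,d\tilde{X}_2 = \int_{\tilde{T}} 2^2t_{11}^3t_{22}\,e^{-(t_{11}^2+t_{22}^2+|\tilde{t}_{21}|^2)}\,dt_{11}\wedge dt_{22}\wedge d\tilde{t}_{21}.$$

But

$$\int_{\tilde{t}_{21}} e^{-|\tilde{t}_{21}|^2}\,d\tilde{t}_{21} = \int_{-\infty}^{\infty}\int_{-\infty}^{\infty} e^{-(t_{211}^2+t_{212}^2)}\,dt_{211}\wedge dt_{212} = \Big(\int_{-\infty}^{\infty} e^{-t_{211}^2}\,dt_{211}\Big)\Big(\int_{-\infty}^{\infty} e^{-t_{212}^2}\,dt_{212}\Big) = \sqrt{\pi}\sqrt{\pi} = \pi,$$

where $\tilde{t}_{21} = t_{211} + it_{212}$, $i = \sqrt{-1}$ and $t_{211}$, $t_{212}$ real. The remaining factors are
$\int_0^\infty 2t_{11}^3e^{-t_{11}^2}\,dt_{11} = \Gamma(2) = 1$ and $\int_0^\infty 2t_{22}e^{-t_{22}^2}\,dt_{22} = \Gamma(1) = 1$. Thus

$$\int_{\tilde{X}_2>O} e^{-\mathrm{tr}(\tilde{X}_2)}\,d\tilde{X}_2 = \pi.$$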


1.7. Differential Operators 


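Let

$$X = \begin{bmatrix} x_1\\ \vdots\\ x_p\end{bmatrix},\quad \frac{\partial}{\partial X} = \begin{bmatrix} \frac{\partial}{\partial x_1}\\ \vdots\\ \frac{\partial}{\partial x_p}\end{bmatrix},\quad \frac{\partial}{\partial X'} = \Big[\frac{\partial}{\partial x_1},\ldots,\frac{\partial}{\partial x_p}\Big]$$

where $x_1,\ldots,x_p$ are distinct real scalar variables, $\frac{\partial}{\partial X}$ is the partial differential operator
and $\frac{\partial}{\partial X'}$ is the transpose operator. Then $\frac{\partial}{\partial X}\frac{\partial}{\partial X'}$ is the configuration of all second order
partial differential operators given by

$$\frac{\partial}{\partial X}\frac{\partial}{\partial X'} = \begin{bmatrix} \frac{\partial^2}{\partial x_1^2} & \frac{\partial^2}{\partial x_1\partial x_2} & \cdots & \frac{\partial^2}{\partial x_1\partial x_p}\\ \frac{\partial^2}{\partial x_2\partial x_1} & \frac{\partial^2}{\partial x_2^2} & \cdots & \frac{\partial^2}{\partial x_2\partial x_p}\\ \vdots & \vdots & \ddots & \vdots\\ \frac{\partial^2}{\partial x_p\partial x_1} & \frac{\partial^2}{\partial x_p\partial x_2} & \cdots & \frac{\partial^2}{\partial x_p^2}\end{bmatrix}.$$

Let $f(X)$ be a real-valued scalar function of $X$. Then, this operator, operating on $f$, will
be defined as

$$\frac{\partial f}{\partial X} = \begin{bmatrix} \frac{\partial f}{\partial x_1}\\ \vdots\\ \frac{\partial f}{\partial x_p}\end{bmatrix}.$$

For example, if $f = x_1^2 + x_1x_2 + x_2^3$, then $\frac{\partial f}{\partial x_1} = 2x_1 + x_2$, $\frac{\partial f}{\partial x_2} = x_1 + 3x_2^2$, and

$$\frac{\partial f}{\partial X} = \begin{bmatrix} 2x_1 + x_2\\ x_1 + 3x_2^2\end{bmatrix}.$$

Let $f = a_1x_1 + a_2x_2 + \cdots + a_px_p = A'X = X'A$, $A' = (a_1,\ldots,a_p)$, $X' = (x_1,\ldots,x_p)$
where $a_1,\ldots,a_p$ are real constants and $x_1,\ldots,x_p$ are distinct real scalar variables. Then
$\frac{\partial f}{\partial x_j} = a_j$ and we have the following result:

Theorem 1.7.1. Let $A$, $X$ and $f$ be as defined above where $f = a_1x_1 + \cdots + a_px_p$ is a
linear function of $X$, then

$$\frac{\partial f}{\partial X} = A.$$

Letting $f = X'X = x_1^2 + \cdots + x_p^2$, $\frac{\partial f}{\partial x_j} = 2x_j$, and we have the following result.


Theorem 1.7.2. Let $X$ be a $p\times 1$ vector of real scalar variables so that $f = X'X = x_1^2 + \cdots + x_p^2$. Then

$$\frac{\partial f}{\partial X} = 2X.$$

Now, let us consider a general quadratic form $f = X'AX$, $A = A'$, where $X$ is a
$p\times 1$ vector whose components are real scalar variables and $A$ is a constant matrix. Then
$\frac{\partial f}{\partial x_j} = (a_{j1}x_1 + \cdots + a_{jp}x_p) + (a_{1j}x_1 + a_{2j}x_2 + \cdots + a_{pj}x_p)$ for $j = 1,\ldots,p$. Hence
we have the following result:


Theorem 1.7.3. Let $f = X'AX$ be a real quadratic form where $X$ is a $p\times 1$ real vector
whose components are distinct real scalar variables and $A$ is a constant matrix. Then

$$\frac{\partial f}{\partial X} = \begin{cases} (A + A')X & \text{for a general } A\\ 2AX & \text{when } A = A'. \end{cases}$$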

1.7.1. Some basic applications of the vector differential operator 


Let $X$ be a $p\times 1$ vector with real scalar elements $x_1,\ldots,x_p$. Let $A = (a_{ij}) = A'$ be a
constant matrix. Consider the problem of optimizing the real quadratic form $u = X'AX$.
There is no unrestricted maximum or minimum. If $A = A' > O$ (positive definite), $u$ can
tend to $+\infty$ and similarly, if $A = A' < O$, $u$ can go to $-\infty$. However, if we confine
ourselves to the surface of a unit hypersphere or equivalently require that $X'X = 1$, then
we can have a finite maximum and a finite minimum. Let $u_1 = X'AX - \lambda(X'X - 1)$ so that
we have added zero to $u$ and hence $u_1$ is the same as $u$, where $\lambda$ is an arbitrary constant
or a Lagrangian multiplier. Then, differentiating $u_1$ with respect to $x_1,\ldots,x_p$, equating
the resulting expressions to zero, and thereafter solving for critical points, is equivalent to
solving the single equation $\frac{\partial u_1}{\partial X} = O$ (null). That is,

$$\frac{\partial u_1}{\partial X} = O \Rightarrow 2AX - 2\lambda X = O \Rightarrow (A - \lambda I)X = O. \tag{i}$$

For (i) to have a non-null solution for $X$, the coefficient matrix $A - \lambda I$ has to be singular
or its determinant must be zero. That is, $|A - \lambda I| = 0$ and $AX = \lambda X$, or $\lambda$ is an eigenvalue
of $A$ and $X$ is the corresponding eigenvector. But

$$AX = \lambda X \Rightarrow X'AX = \lambda X'X = \lambda \ \text{ since } X'X = 1. \tag{ii}$$

Hence the maximum value of $X'AX$ corresponds to the largest eigenvalue of $A$ and the
minimum value of $X'AX$, to the smallest eigenvalue of $A$. Observe that when $A = A'$ the
eigenvalues are real. Hence we have the following result:


Theorem 1.7.4. Let $u = X'AX$, $A = A'$, $X$ be a $p\times 1$ vector having real scalar variables
as its elements. Letting $X'X = 1$, then

$$\max_{X'X=1}[X'AX] = \lambda_1 = \text{the largest eigenvalue of } A$$

$$\min_{X'X=1}[X'AX] = \lambda_p = \text{the smallest eigenvalue of } A.$$

Principal Component Analysis where it is assumed that $A > O$ relies on this result.
This will be elaborated upon in later chapters. Now, we consider the optimization of $u = X'AX$,
$A = A'$ subject to the condition $X'BX = 1$, $B = B'$. Take $\lambda$ as the Lagrangian
multiplier and consider $u_1 = X'AX - \lambda(X'BX - 1)$. Then

$$\frac{\partial u_1}{\partial X} = O \Rightarrow AX = \lambda BX \Rightarrow |A - \lambda B| = 0. \tag{iii}$$

Note that $X'AX = \lambda X'BX = \lambda$ from (iii). Hence, the maximum of $X'AX$ is the largest
value of $\lambda$ satisfying (iii) and the minimum of $X'AX$ is the smallest value of $\lambda$ satisfying
(iii). Note that when $B$ is nonsingular, $|A - \lambda B| = 0 \Rightarrow |AB^{-1} - \lambda I| = 0$ or $\lambda$ is an
eigenvalue of $AB^{-1}$. Thus, this case can also be treated as an eigenvalue problem. Hence,
the following result:


Theorem 1.7.5. Let $u = X'AX$, $A = A'$ where the elements of $X$ are distinct real scalar
variables. Consider the problem of optimizing $X'AX$ subject to the condition $X'BX = 1$,
$B = B'$, where $A$ and $B$ are constant matrices, then

$$\max_{X'BX=1}[X'AX] = \lambda_1 = \text{largest eigenvalue of } AB^{-1},\ |B| \ne 0$$
$$= \text{the largest root of } |A - \lambda B| = 0;$$

$$\min_{X'BX=1}[X'AX] = \lambda_p = \text{smallest eigenvalue of } AB^{-1},\ |B| \ne 0$$
$$= \text{the smallest root of } |A - \lambda B| = 0.$$
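
Theorems 1.7.4 and 1.7.5 can be illustrated for $p = 2$, where $|A - \lambda B| = 0$ is a quadratic equation in $\lambda$. The Python sketch below is an added illustration under the extra assumption that $B$ is positive definite, so that the constraint set $X'BX = 1$ is an ellipse which can be scanned by brute force; the matrices and function names are hypothetical choices for this example. Taking $B = I$ reproduces the unit-sphere case of Theorem 1.7.4.

```python
import math

def roots_of_det(A, B):
    """Roots (small, large) of |A - lam*B| = 0 for symmetric 2x2 A and positive definite B."""
    (a11, a12), (_, a22) = A
    (b11, b12), (_, b22) = B
    q2 = b11 * b22 - b12 ** 2
    q1 = -(a11 * b22 + a22 * b11 - 2.0 * a12 * b12)
    q0 = a11 * a22 - a12 ** 2
    disc = math.sqrt(q1 * q1 - 4.0 * q2 * q0)
    return (-q1 - disc) / (2.0 * q2), (-q1 + disc) / (2.0 * q2)

def scan_constraint(A, B, n=20000):
    """Brute-force min and max of X'AX over X'BX = 1 (B positive definite, 2x2)."""
    (a11, a12), (_, a22) = A
    (b11, b12), (_, b22) = B
    # Cholesky factor B = LL'; X = (L')^{-1} w with w on the unit circle gives X'BX = 1.
    l11 = math.sqrt(b11)
    l21 = b12 / l11
    l22 = math.sqrt(b22 - l21 ** 2)
    values = []
    for k in range(n):
        theta = math.pi * k / n        # X and -X give the same value of X'AX
        w1, w2 = math.cos(theta), math.sin(theta)
        x2 = w2 / l22
        x1 = (w1 - l21 * x2) / l11
        values.append(a11 * x1 * x1 + 2.0 * a12 * x1 * x2 + a22 * x2 * x2)
    return min(values), max(values)

A = [[2.0, 1.0], [1.0, -1.0]]
I2 = [[1.0, 0.0], [0.0, 1.0]]
B = [[2.0, 0.5], [0.5, 1.0]]

roots_sphere = roots_of_det(A, I2)      # eigenvalues of A (Theorem 1.7.4)
scan_sphere = scan_constraint(A, I2)
roots_general = roots_of_det(A, B)      # roots of |A - lam B| = 0 (Theorem 1.7.5)
scan_general = scan_constraint(A, B)

print(roots_sphere, scan_sphere)
print(roots_general, scan_general)
```

The scanned extremes should match the smallest and largest roots up to the resolution of the angular grid.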


Now, consider the optimization of a real quadratic form subject to a linear constraint.
Let $u = X'AX$, $A = A'$ be a quadratic form where $X$ is $p\times 1$. Let $B'X = X'B = 1$
be a constraint where $B' = (b_1,\ldots,b_p)$, $X' = (x_1,\ldots,x_p)$ with $b_1,\ldots,b_p$ being real
constants and $x_1,\ldots,x_p$ being real distinct scalar variables. Take $2\lambda$ as the Lagrangian
multiplier and consider $u_1 = X'AX - 2\lambda(X'B - 1)$. The critical points are available from
the following equation

$$\frac{\partial u_1}{\partial X} = O \Rightarrow 2AX - 2\lambda B = O \Rightarrow X = \lambda A^{-1}B,\ \text{for } |A| \ne 0$$

$$\Rightarrow B'X = \lambda B'A^{-1}B \Rightarrow \lambda = \frac{1}{B'A^{-1}B}.$$

In this problem, observe that the quadratic form is unbounded even under the restriction
$B'X = 1$ and hence there is no maximum. The only critical point corresponds to a minimum.
From $AX = \lambda B$, we have $X'AX = \lambda X'B = \lambda$. Hence the minimum value is
$\lambda = [B'A^{-1}B]^{-1}$ where it is assumed that $A$ is nonsingular. Thus, we have the following result:

Theorem 1.7.6. Let $u = X'AX$, $A = A'$, $|A| \ne 0$. Let $B'X = 1$ where $B' = (b_1,\ldots,b_p)$
is a constant vector and $X$ is a $p\times 1$ vector of real distinct scalar variables.
Then, the minimum of the quadratic form $u$, under the restriction $B'X = 1$ where $B$ is a
constant vector, is given by

$$\min_{B'X=1}[X'AX] = \frac{1}{B'A^{-1}B}.$$

Such problems arise for instance in regression analysis and model building situations. 
We could have eliminated one of the variables with the linear constraint; however, the op- 
timization would still involve all other variables, and thus not much simplification would 
be achieved by eliminating one variable. Hence, operating with the vector differential op- 
erator is the most convenient procedure in this case. 
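
The result of Theorem 1.7.6 can be illustrated with a small numerical sketch, added here under the assumption that $A$ is positive definite (the case in which the critical point is clearly a minimum). For $p = 2$, the constraint $B'X = 1$ is a line, so the quadratic form can be evaluated along it and its smallest value compared with $1/(B'A^{-1}B)$. The particular `A`, `B` and helper names are hypothetical choices for this example.

```python
def constrained_minimum(A, B):
    """Return lam = 1/(B'A^{-1}B) and the critical point X = lam * A^{-1}B (2x2 case)."""
    (a11, a12), (_, a22) = A
    det = a11 * a22 - a12 ** 2
    inv = [[a22 / det, -a12 / det], [-a12 / det, a11 / det]]
    AinvB = [inv[0][0] * B[0] + inv[0][1] * B[1],
             inv[1][0] * B[0] + inv[1][1] * B[1]]
    lam = 1.0 / (B[0] * AinvB[0] + B[1] * AinvB[1])
    return lam, [lam * AinvB[0], lam * AinvB[1]]

def quad_form(A, x):
    """x'Ax for a symmetric 2x2 matrix A."""
    return A[0][0] * x[0] ** 2 + 2.0 * A[0][1] * x[0] * x[1] + A[1][1] * x[1] ** 2

A = [[2.0, 0.5], [0.5, 1.0]]     # positive definite (assumption of this sketch)
B = [1.0, 2.0]

lam, x_star = constrained_minimum(A, B)

# Brute force: walk along the line b1*x1 + b2*x2 = 1, parametrized by x1.
brute = min(
    quad_form(A, [x1, (1.0 - B[0] * x1) / B[1]])
    for x1 in (-5.0 + 0.001 * k for k in range(10001))
)

print(lam, x_star, brute)
```

Here $A^{-1}B = (0, 2)'$, so that $B'A^{-1}B = 4$, the minimum is $1/4$ and it is attained at $X = (0, 1/2)'$.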

We will now consider the mathematical part of a general problem in prediction analysis
where some variables are predicted from another set of variables. This topic is related to
Canonical Correlation Analysis. We will consider the optimization part of the problem in
this section. The problem consists in optimizing a bilinear form subject to quadratic
constraints. Let $X$ be a $p\times 1$ vector of real scalar variables $x_1,\ldots,x_p$, and $Y$ be a $q\times 1$ vector
of real scalar variables $y_1,\ldots,y_q$, where $q$ need not be equal to $p$. Consider the bilinear
form $u = X'AY$ where $A$ is a $p\times q$ rectangular constant matrix. We would like to optimize
this bilinear form subject to the quadratic constraints $X'BX = 1$, $Y'CY = 1$, $B = B'$ and
$C = C'$ where $B$ and $C$ are constant matrices. In Canonical Correlation Analysis, $B$ and
$C$ are constant real positive definite matrices. Take $\lambda_1$ and $\lambda_2$ as Lagrangian multipliers
and let $u_1 = X'AY - \frac{\lambda_1}{2}(X'BX - 1) - \frac{\lambda_2}{2}(Y'CY - 1)$. Then

$$\frac{\partial u_1}{\partial X} = O \Rightarrow AY - \lambda_1BX = O \Rightarrow AY = \lambda_1BX \Rightarrow X'AY = \lambda_1X'BX = \lambda_1; \tag{i}$$

$$\frac{\partial u_1}{\partial Y} = O \Rightarrow A'X - \lambda_2CY = O \Rightarrow A'X = \lambda_2CY \Rightarrow Y'A'X = \lambda_2Y'CY = \lambda_2. \tag{ii}$$

It follows from (i) and (ii) that $\lambda_1 = \lambda_2 = \lambda$, say. Observe that $X'AY$ is $1\times 1$ so that
$X'AY = Y'A'X$. After substituting $\lambda$ to $\lambda_1$ and $\lambda_2$, we can combine equations (i) and (ii)
in a single matrix equation as follows:

$$\begin{bmatrix} -\lambda B & A\\ A' & -\lambda C\end{bmatrix}\begin{bmatrix} X\\ Y\end{bmatrix} = O \Rightarrow \begin{vmatrix} -\lambda B & A\\ A' & -\lambda C\end{vmatrix} = 0. \tag{iii}$$

Opening up the determinant by making use of a result on partitioned matrices from
Sect. 1.3, we have

$$|-\lambda B|\ |-\lambda C - A'(-\lambda B)^{-1}A| = 0,\ |B| \ne 0 \Rightarrow |A'B^{-1}A - \lambda^2C| = 0. \tag{iv}$$

Then $\nu = \lambda^2$ is a root obtained from Eq. (iv). We can also obtain a parallel result by
opening up the determinant in (iii) as

$$|-\lambda C|\ |-\lambda B - A(-\lambda C)^{-1}A'| = 0 \Rightarrow |AC^{-1}A' - \lambda^2B| = 0,\ |C| \ne 0. \tag{v}$$

Hence we have the following result.


Theorem 1.7.7. Let $X$ and $Y$ be respectively $p\times 1$ and $q\times 1$ real vectors whose
components are distinct scalar variables. Consider the bilinear form $X'AY$ and the quadratic
forms $X'BX$ and $Y'CY$ where $B = B'$, $C = C'$, and $B$ and $C$ are nonsingular constant
matrices. Then,

$$\max_{X'BX=1,\,Y'CY=1}[X'AY] = |\lambda_1|$$

$$\min_{X'BX=1,\,Y'CY=1}[X'AY] = |\lambda_p|$$

where $\lambda_1^2$ is the largest root resulting from equation (iv) or (v) and $\lambda_p^2$ is the smallest root
resulting from equation (iv) or (v).


Observe that if $p < q$, we may utilize equation (v) to solve for $\lambda^2$ and if $q < p$,
then we may use equation (iv) to solve for $\lambda^2$, and both will lead to the same solution. In
the above derivation, we assumed that $B$ and $C$ are nonsingular. In Canonical Correlation
Analysis, both $B$ and $C$ are real positive definite matrices corresponding to the variances
$X'BX$ and $Y'CY$ of the linear forms and then, $X'AY$ corresponds to covariance between
these linear forms.
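
The maximum in Theorem 1.7.7 can be illustrated numerically for $p = q = 2$. The sketch below is an added example under simplifying assumptions: $B$ and $C$ are taken to be diagonal and positive definite (as in Canonical Correlation Analysis, but with the diagonal form chosen only to keep the code short), so that $A'B^{-1}A$ is easy to form and equation (iv) becomes a quadratic in $\nu = \lambda^2$. The constraint sets are then ellipses, which are scanned by brute force. All numerical values and names are hypothetical.

```python
import math

A = [[1.0, 2.0],
     [0.5, -1.0]]
b = [2.0, 1.0]      # B = diag(b), positive definite (assumption)
c = [1.0, 3.0]      # C = diag(c), positive definite (assumption)

# M = A' B^{-1} A for diagonal B
M = [[sum(A[k][i] * A[k][j] / b[k] for k in range(2)) for j in range(2)]
     for i in range(2)]

# Equation (iv): |M - nu*C| = 0 is q2*nu^2 + q1*nu + q0 = 0 for diagonal C
q2 = c[0] * c[1]
q1 = -(M[0][0] * c[1] + M[1][1] * c[0])
q0 = M[0][0] * M[1][1] - M[0][1] ** 2
nu_max = (-q1 + math.sqrt(q1 * q1 - 4.0 * q2 * q0)) / (2.0 * q2)
lam_max = math.sqrt(nu_max)

# Brute force over the two ellipses X'BX = 1 and Y'CY = 1
n = 600
angles = [2.0 * math.pi * k / n for k in range(n)]
xs = [(math.cos(t) / math.sqrt(b[0]), math.sin(t) / math.sqrt(b[1])) for t in angles]
ys = [(math.cos(t) / math.sqrt(c[0]), math.sin(t) / math.sqrt(c[1])) for t in angles]

best = max(
    x[0] * (A[0][0] * y[0] + A[0][1] * y[1]) + x[1] * (A[1][0] * y[0] + A[1][1] * y[1])
    for x in xs for y in ys
)

print(lam_max, best)
```

The brute-force maximum can only approach $|\lambda_1|$ from below, and the gap is governed by the angular grid spacing.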


Note 1.7.1. We have confined ourselves to results in the real domain in this subsection
since only real cases are discussed in connection with the applications that are considered
in later chapters, such as Principal Component Analysis and Canonical Correlation
Analysis. The corresponding complex cases do not appear to have practical applications.
Accordingly, optimizations of Hermitian forms will not be discussed. However, parallel
results to Theorems 1.7.1-1.7.7 could similarly be worked out in the complex domain.


Chapter 2
The Univariate Gaussian and Related Distributions


2.1. Introduction 


It is assumed that the reader has had adequate exposure to basic concepts in Proba- 
bility, Statistics, Calculus and Linear Algebra. This chapter will serve as a review of the 
basic ideas about the univariate Gaussian, or normal, distribution as well as related dis- 
tributions. We will begin with a discussion of the univariate Gaussian density. We will 
adopt the following notation: real scalar mathematical or random variables will be de- 
noted by lower-case letters such as x, y, z, whereas vector or matrix-variate mathematical 
or random variables will be denoted by capital letters such as X, Y, Z,... . Statisticians 
usually employ the double notation X and x where it is claimed that x is a realization of 
X. Since x can vary, it is a variable in the mathematical sense. Treating mathematical and 
random variables the same way will simplify the notation and possibly reduce the
confusion. Complex variables will be written with a tilde such as $\tilde{x}, \tilde{y}, \tilde{X}, \tilde{Y}$, etc. Constant
scalars and matrices will be written without a tilde unless for stressing that the constant
matrix is in the complex domain. In such a case, a tilde will also be utilized for the
constant. Constant matrices will be denoted by $A, B, C, \ldots$.

The numbering will first indicate the chapter and then the section. For example,
Eq. (2.1.9) will be the ninth equation in Sect. 2.1 of this chapter. Local numbering for
sub-sections will be indicated as (i), (ii), and so on.

Let $x_1$ be a real univariate Gaussian, or normal, random variable whose parameters are
$\mu_1$ and $\sigma_1^2$; this will be written as $x_1 \sim N_1(\mu_1, \sigma_1^2)$, the associated density being given by

$$f(x_1) = \frac{1}{\sigma_1\sqrt{2\pi}}\,e^{-\frac{1}{2\sigma_1^2}(x_1-\mu_1)^2},\quad -\infty < x_1 < \infty,\ -\infty < \mu_1 < \infty,\ \sigma_1 > 0.$$

In this instance, the subscript 1 in $N_1(\cdot)$ refers to the univariate case. Incidentally, a density
is a real-valued scalar function of $x$ such that $f(x) \ge 0$ for all $x$ and $\int_x f(x)\,dx = 1$.
The moment generating function (mgf) of this Gaussian random variable $x_1$, with $t_1$ as
its parameter, is given by the following expected value, where $E[\cdot]$ denotes the expected
value of $[\cdot]$:

$$E[e^{t_1x_1}] = \int_{-\infty}^{\infty} e^{t_1x_1}f(x_1)\,dx_1 = e^{t_1\mu_1 + \frac{1}{2}t_1^2\sigma_1^2}. \tag{2.1.1}$$
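
As a quick illustration of (2.1.1), the following Python sketch, which is an addition and not part of the original text, evaluates the integral defining the mgf by the trapezoidal rule and compares it with the closed form. The parameter values and function names are arbitrary choices for the example; `sigma` denotes the standard deviation $\sigma_1$.

```python
import math

def normal_density(x, mu, sigma):
    """Density of N_1(mu, sigma^2)."""
    return math.exp(-(x - mu) ** 2 / (2.0 * sigma ** 2)) / (sigma * math.sqrt(2.0 * math.pi))

def mgf_numeric(t, mu, sigma, n=4000):
    """Trapezoidal approximation of E[exp(t*x)] over a wide finite interval."""
    half = 12.0 * sigma + abs(t) * sigma ** 2     # generous truncation of the real line
    h = 2.0 * half / n
    total = 0.0
    for k in range(n + 1):
        x = mu - half + k * h
        weight = 0.5 if k in (0, n) else 1.0
        total += weight * math.exp(t * x) * normal_density(x, mu, sigma)
    return total * h

def mgf_closed(t, mu, sigma):
    """Right-hand side of (2.1.1)."""
    return math.exp(t * mu + 0.5 * t ** 2 * sigma ** 2)

mu, sigma, t = 1.0, 2.0, 0.3
print(mgf_numeric(t, mu, sigma), mgf_closed(t, mu, sigma))
```

Setting `t = 0` returns a value close to 1, which is the statement that the density integrates to 1.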


2.1a. The Complex Scalar Gaussian Variable


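Let $\tilde{x} = x_1 + ix_2$, $i = \sqrt{-1}$, $x_1, x_2$ real scalar variables. Let $E[x_1] = \mu_1$, $E[x_2] = \mu_2$,
$\mathrm{Var}(x_1) = \sigma_1^2$, $\mathrm{Var}(x_2) = \sigma_2^2$, $\mathrm{Cov}(x_1, x_2) = \sigma_{12}$. By definition, the variance of the
complex random variable $\tilde{x}$ is defined as

$$\mathrm{Var}(\tilde{x}) = E[\tilde{x} - E(\tilde{x})][\tilde{x} - E(\tilde{x})]^*$$

where * indicates a conjugate transpose in general; in this case, it simply means the
conjugate since $\tilde{x}$ is a scalar. Since $\tilde{x} - E(\tilde{x}) = x_1 + ix_2 - \mu_1 - i\mu_2 = (x_1 - \mu_1) + i(x_2 - \mu_2)$
and $[\tilde{x} - E(\tilde{x})]^* = (x_1 - \mu_1) - i(x_2 - \mu_2)$,

$$\mathrm{Var}(\tilde{x}) = E[\tilde{x} - E(\tilde{x})][\tilde{x} - E(\tilde{x})]^*$$
$$= E[(x_1 - \mu_1) + i(x_2 - \mu_2)][(x_1 - \mu_1) - i(x_2 - \mu_2)] = E[(x_1 - \mu_1)^2 + (x_2 - \mu_2)^2] = \sigma_1^2 + \sigma_2^2 \equiv \sigma^2. \tag{i}$$

Observe that $\mathrm{Cov}(x_1, x_2)$ does not appear in the scalar case. However, the covariance will
be present in the vector/matrix case as will be explained in the coming chapters. The
complex Gaussian density is given by

$$f(\tilde{x}) = \frac{1}{\pi\sigma^2}\,e^{-\frac{1}{\sigma^2}(\tilde{x}-\tilde{\mu})^*(\tilde{x}-\tilde{\mu})} \tag{ii}$$

for $\tilde{x} = x_1 + ix_2$, $\tilde{\mu} = \mu_1 + i\mu_2$, $-\infty < x_j < \infty$, $-\infty < \mu_j < \infty$, $\sigma^2 > 0$, $j = 1, 2$.
We will write this as $\tilde{x} \sim \tilde{N}_1(\tilde{\mu}, \sigma^2)$. It can be shown that the two parameters appearing in
the density in (ii) are the mean value of $\tilde{x}$ and the variance of $\tilde{x}$. We now establish that the
density in (ii) is equivalent to a real bivariate Gaussian density with $\sigma_1^2 = \frac{1}{2}\sigma^2$, $\sigma_2^2 = \frac{1}{2}\sigma^2$
and zero correlation. In the real bivariate normal density, the exponent is the following,
with $\Sigma$ as given below:

$$-\frac{1}{2}[(x_1 - \mu_1), (x_2 - \mu_2)]\,\Sigma^{-1}\begin{bmatrix} x_1 - \mu_1\\ x_2 - \mu_2\end{bmatrix},\quad \Sigma = \begin{bmatrix} \frac{1}{2}\sigma^2 & 0\\ 0 & \frac{1}{2}\sigma^2\end{bmatrix}$$

$$= -\frac{1}{\sigma^2}\{(x_1 - \mu_1)^2 + (x_2 - \mu_2)^2\} = -\frac{1}{\sigma^2}(\tilde{x} - \tilde{\mu})^*(\tilde{x} - \tilde{\mu}).$$

This exponent agrees with that appearing in the complex case. Now, consider the constant
part in the real bivariate case:

$$(2\pi)|\Sigma|^{\frac{1}{2}} = (2\pi)\begin{vmatrix} \frac{1}{2}\sigma^2 & 0\\ 0 & \frac{1}{2}\sigma^2\end{vmatrix}^{\frac{1}{2}} = \pi\sigma^2,$$

which also coincides with that of the complex Gaussian. Hence, a complex scalar Gaussian
is equivalent to a real bivariate Gaussian case whose parameters are as described above.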

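Let us consider the mgf of the complex Gaussian scalar case. Let $\tilde{t} = t_1 + it_2$, $i = \sqrt{-1}$,
with $t_1$ and $t_2$ being real parameters, so that $\tilde{t}^* = t_1 - it_2$ is the conjugate of
$\tilde{t}$. Then $\tilde{t}^*\tilde{x} = t_1x_1 + t_2x_2 + i(t_1x_2 - t_2x_1)$. Note that $t_1x_1 + t_2x_2$ contains the necessary
number of parameters (that is, 2) and hence, to be consistent with the definition of the
mgf in a real bivariate case, the imaginary part should not be taken into account; thus, we
should define the mgf as $E[e^{\Re(\tilde{t}^*\tilde{x})}]$, where $\Re(\cdot)$ denotes the real part of $(\cdot)$. Accordingly,
in the complex case, the mgf is obtained as follows:

$$M_{\tilde{x}}(\tilde{t}) = E[e^{\Re(\tilde{t}^*\tilde{x})}] = \frac{1}{\pi\sigma^2}\int_{\tilde{x}} e^{\Re(\tilde{t}^*\tilde{x}) - \frac{1}{\sigma^2}(\tilde{x}-\tilde{\mu})^*(\tilde{x}-\tilde{\mu})}\,d\tilde{x}$$

$$= \frac{e^{\Re(\tilde{t}^*\tilde{\mu})}}{\pi\sigma^2}\int_{\tilde{x}} e^{\Re(\tilde{t}^*(\tilde{x}-\tilde{\mu})) - \frac{1}{\sigma^2}(\tilde{x}-\tilde{\mu})^*(\tilde{x}-\tilde{\mu})}\,d\tilde{x}.$$

Let us simplify the exponent:

$$\Re(\tilde{t}^*(\tilde{x}-\tilde{\mu})) - \frac{1}{\sigma^2}(\tilde{x}-\tilde{\mu})^*(\tilde{x}-\tilde{\mu})$$

$$= -\Big\{\frac{1}{\sigma^2}(x_1 - \mu_1)^2 + \frac{1}{\sigma^2}(x_2 - \mu_2)^2 - t_1(x_1 - \mu_1) - t_2(x_2 - \mu_2)\Big\}$$

$$= \frac{\sigma^2}{4}(t_1^2 + t_2^2) - \Big\{\Big(y_1 - \frac{\sigma}{2}t_1\Big)^2 + \Big(y_2 - \frac{\sigma}{2}t_2\Big)^2\Big\}$$

where $y_1 = \frac{x_1-\mu_1}{\sigma}$, $y_2 = \frac{x_2-\mu_2}{\sigma}$, $dy_j = \frac{1}{\sigma}dx_j$, $j = 1, 2$. But

$$\frac{1}{\sqrt{\pi}}\int_{-\infty}^{\infty} e^{-(y_j - \frac{\sigma}{2}t_j)^2}\,dy_j = 1,\quad j = 1, 2.$$

Hence,

$$M_{\tilde{x}}(\tilde{t}) = e^{\Re(\tilde{t}^*\tilde{\mu}) + \frac{\sigma^2}{4}\tilde{t}^*\tilde{t}} = e^{t_1\mu_1 + t_2\mu_2 + \frac{\sigma^2}{4}(t_1^2+t_2^2)},$$

which is the mgf of the equivalent real bivariate Gaussian distribution.

Note 2.1.1. A statistical density is invariably a real-valued scalar function of the variables
involved, be they scalar, vector or matrix variables, real or complex.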
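
The mgf of the scalar complex Gaussian derived in Sect. 2.1a can be checked numerically. The following Python sketch is an added illustration: it integrates $e^{\Re(\tilde{t}^*\tilde{x})}$ against the density (ii) over a truncated grid in the $(x_1, x_2)$ plane, using Python's built-in complex type, and compares the result with $e^{\Re(\tilde{t}^*\tilde{\mu}) + \frac{\sigma^2}{4}\tilde{t}^*\tilde{t}}$. The values of `mu`, `sigma2`, `t` and the function names are arbitrary assumptions for the example.

```python
import math

def complex_gaussian_density(x, mu, sigma2):
    """Density (ii): exp(-|x - mu|^2 / sigma^2) / (pi * sigma^2)."""
    return math.exp(-abs(x - mu) ** 2 / sigma2) / (math.pi * sigma2)

def mgf_numeric(t, mu, sigma2, half_width=10.0, n=200):
    """Midpoint-rule approximation of E[exp(Re(conj(t) * x))] over a square around mu."""
    h = 2.0 * half_width / n
    total = 0.0
    for a in range(n):
        x1 = mu.real - half_width + (a + 0.5) * h
        for b in range(n):
            x2 = mu.imag - half_width + (b + 0.5) * h
            x = complex(x1, x2)
            total += math.exp((t.conjugate() * x).real) * complex_gaussian_density(x, mu, sigma2)
    return total * h * h

def mgf_closed(t, mu, sigma2):
    """exp(Re(conj(t) * mu) + (sigma^2 / 4) * |t|^2)."""
    return math.exp((t.conjugate() * mu).real + 0.25 * sigma2 * abs(t) ** 2)

mu = 1 - 2j
sigma2 = 3.0
t = 0.5 - 0.3j

print(mgf_numeric(t, mu, sigma2), mgf_closed(t, mu, sigma2))
```

For these values $\Re(\tilde{t}^*\tilde{\mu}) = t_1\mu_1 + t_2\mu_2 = 1.1$ and $\frac{\sigma^2}{4}(t_1^2 + t_2^2) = 0.255$, so both numbers should be close to $e^{1.355}$.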


2.1.1. Linear functions of Gaussian variables in the real domain 


If $x_1,\ldots,x_k$ are statistically independently distributed real scalar Gaussian variables
with parameters $(\mu_j, \sigma_j^2)$, $j = 1,\ldots,k$ and if $a_1,\ldots,a_k$ are real scalar constants then the
mgf of a linear function $u = a_1x_1 + \cdots + a_kx_k$ is given by

$$M_u(t) = E[e^{tu}] = E[e^{ta_1x_1 + \cdots + ta_kx_k}] = M_{x_1}(a_1t)\cdots M_{x_k}(a_kt),\ \text{as } M_{ax}(t) = M_x(at),$$

$$= e^{(ta_1\mu_1 + \cdots + ta_k\mu_k) + \frac{1}{2}t^2(a_1^2\sigma_1^2 + \cdots + a_k^2\sigma_k^2)}$$

$$= e^{t(\sum_{j=1}^k a_j\mu_j) + \frac{1}{2}t^2(\sum_{j=1}^k a_j^2\sigma_j^2)},$$

which is the mgf of a real normal random variable whose parameters are $(\sum_{j=1}^k a_j\mu_j,\ \sum_{j=1}^k a_j^2\sigma_j^2)$.
Hence, the following result:


Theorem 2.1.1. Let the real scalar random variable $x_j$ have a real univariate normal
(Gaussian) distribution, that is, $x_j \sim N_1(\mu_j, \sigma_j^2)$, $j = 1,\ldots,k$ and let $x_1,\ldots,x_k$ be
statistically independently distributed. Then, any linear function $u = a_1x_1 + \cdots + a_kx_k$, where
$a_1,\ldots,a_k$ are real constants, has a real normal distribution with mean value $\sum_{j=1}^k a_j\mu_j$
and variance $\sum_{j=1}^k a_j^2\sigma_j^2$, that is, $u \sim N_1(\sum_{j=1}^k a_j\mu_j,\ \sum_{j=1}^k a_j^2\sigma_j^2)$.


Vector/matrix notation enables one to express this result in a more convenient form.
Let

$$L = \begin{bmatrix} a_1\\ \vdots\\ a_k\end{bmatrix},\quad \mu = \begin{bmatrix} \mu_1\\ \vdots\\ \mu_k\end{bmatrix},\quad X = \begin{bmatrix} x_1\\ \vdots\\ x_k\end{bmatrix},\quad \Sigma = \begin{bmatrix} \sigma_1^2 & 0 & \cdots & 0\\ 0 & \sigma_2^2 & \cdots & 0\\ \vdots & \vdots & \ddots & \vdots\\ 0 & 0 & \cdots & \sigma_k^2\end{bmatrix}.$$

Then denoting the transposes by primes, $u = L'X = X'L$, $E(u) = L'\mu = \mu'L$, and

$$\mathrm{Var}(u) = E[(u - E(u))(u - E(u))'] = L'E[(X - E(X))(X - E(X))']L = L'\mathrm{Cov}(X)L = L'\Sigma L$$

where, in this case, $\Sigma$ is the diagonal matrix $\mathrm{diag}(\sigma_1^2,\ldots,\sigma_k^2)$. If $x_1,\ldots,x_k$ is a simple
random sample from $x_1$, that is, from the normal population specified by the density of $x_1$
or, equivalently, if $x_1,\ldots,x_k$ are iid (independently and identically distributed) random
variables having as a common distribution that of $x_1$, then $E(u) = \mu_1L'J = \mu_1J'L$
and $\mathrm{Var}(u) = \sigma_1^2L'L$, where $u$ is the linear function defined in Theorem 2.1.1 and
$J' = (1, 1,\ldots,1)$ is a vector of unities.


Example 2.1.1. Let $x_1 \sim N_1(-1, 1)$ and $x_2 \sim N_1(2, 2)$ be independently distributed
real normal variables. Determine the density of the linear function $u = 5x_1 - 2x_2 + 7$.


Solution 2.1.1. Since $u$ is a linear function of independently distributed real scalar
normal variables, it is real scalar normal whose parameters $E(u)$ and $\mathrm{Var}(u)$ are

$$E(u) = 5E(x_1) - 2E(x_2) + 7 = 5(-1) - 2(2) + 7 = -2$$
$$\mathrm{Var}(u) = 25\,\mathrm{Var}(x_1) + 4\,\mathrm{Var}(x_2) + 0 = 25(1) + 4(2) = 33,$$

the covariance being zero since $x_1$ and $x_2$ are independently distributed. Thus,
$u \sim N_1(-2, 33)$.


2.1a.1. Linear functions in the complex domain 


We can also look into the distribution of linear functions of independently distributed
complex Gaussian variables. Let $a$ be a constant and $\tilde{x}$ a complex random variable, where
$a$ may be real or complex. Then, from the definition of the variance in the complex domain,
one has

$$\mathrm{Var}(a\tilde{x}) = E[(a\tilde{x} - E(a\tilde{x}))(a\tilde{x} - E(a\tilde{x}))^*] = aE[(\tilde{x} - E(\tilde{x}))(\tilde{x} - E(\tilde{x}))^*]a^*$$

$$= a\mathrm{Var}(\tilde{x})a^* = aa^*\mathrm{Var}(\tilde{x}) = |a|^2\mathrm{Var}(\tilde{x}) = |a|^2\sigma^2$$

when the variance of $\tilde{x}$ is $\sigma^2$, where $|a|$ denotes the absolute value of $a$. As well,
$E[a\tilde{x}] = aE[\tilde{x}] = a\tilde{\mu}$. Then, a companion to Theorem 2.1.1 is obtained.


Theorem 2.1a.1. Let $\tilde{x}_1,\ldots,\tilde{x}_k$ be independently distributed scalar complex Gaussian
variables, $\tilde{x}_j \sim \tilde{N}_1(\tilde{\mu}_j, \sigma_j^2)$, $j = 1,\ldots,k$. Let $a_1,\ldots,a_k$ be real or complex constants
and $\tilde{u} = a_1\tilde{x}_1 + \cdots + a_k\tilde{x}_k$ be a linear function. Then, $\tilde{u}$ has a univariate complex Gaussian
distribution given by $\tilde{u} \sim \tilde{N}_1(\sum_{j=1}^k a_j\tilde{\mu}_j,\ \sum_{j=1}^k |a_j|^2\mathrm{Var}(\tilde{x}_j))$.


Example 2.1a.1. Let $\tilde{x}_1, \tilde{x}_2, \tilde{x}_3$ be independently distributed complex Gaussian univariate
random variables with expected values $\tilde{\mu}_1 = -1 + 2i$, $\tilde{\mu}_2 = i$, $\tilde{\mu}_3 = -1 - i$ respectively.
Let $\tilde{x}_j = x_{1j} + ix_{2j}$, $j = 1, 2, 3$. Let $[\mathrm{Var}(x_{1j}), \mathrm{Var}(x_{2j})] = [(1, 1), (1, 2), (2, 3)]$,
respectively. Let $a_1 = 1 + i$, $a_2 = 2 - 3i$, $a_3 = 2 + i$, $a_4 = 3 + 2i$. Determine the density
of the linear function $\tilde{u} = a_1\tilde{x}_1 + a_2\tilde{x}_2 + a_3\tilde{x}_3 + a_4$.

Solution 2.1a.1.

$$E(\tilde{u}) = a_1E(\tilde{x}_1) + a_2E(\tilde{x}_2) + a_3E(\tilde{x}_3) + a_4$$
$$= (1 + i)(-1 + 2i) + (2 - 3i)(i) + (2 + i)(-1 - i) + (3 + 2i)$$
$$= (-3 + i) + (3 + 2i) + (-1 - 3i) + (3 + 2i) = 2 + 2i;$$
$$\mathrm{Var}(\tilde{u}) = |a_1|^2\mathrm{Var}(\tilde{x}_1) + |a_2|^2\mathrm{Var}(\tilde{x}_2) + |a_3|^2\mathrm{Var}(\tilde{x}_3)$$

and the covariances are equal to zero since the variables are independently distributed.
Note that $\tilde{x}_1 = x_{11} + ix_{21}$ and hence, for example,

$$\mathrm{Var}(\tilde{x}_1) = E[(\tilde{x}_1 - E(\tilde{x}_1))(\tilde{x}_1 - E(\tilde{x}_1))^*]$$
$$= E\{[(x_{11} - E(x_{11})) + i(x_{21} - E(x_{21}))][(x_{11} - E(x_{11})) - i(x_{21} - E(x_{21}))]\}$$
$$= E[(x_{11} - E(x_{11}))^2] - (i)^2E[(x_{21} - E(x_{21}))^2]$$
$$= \mathrm{Var}(x_{11}) + \mathrm{Var}(x_{21}) = 1 + 1 = 2.$$

Similarly, $\mathrm{Var}(\tilde{x}_2) = 1 + 2 = 3$, $\mathrm{Var}(\tilde{x}_3) = 2 + 3 = 5$. Moreover, $|a_1|^2 = (1)^2 + (1)^2 = 2$,
$|a_2|^2 = (2)^2 + (3)^2 = 13$, $|a_3|^2 = (2)^2 + (1)^2 = 5$. Accordingly, $\mathrm{Var}(\tilde{u}) = 2(2) + (13)(3) + (5)(5) = 68$.
Thus, $\tilde{u} \sim \tilde{N}_1(2 + 2i, 68)$. Note that the constant $a_4$ only affects
the mean value. Had $a_4$ been absent from $\tilde{u}$, its mean value would have been real and equal
to $-1$.
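
The arithmetic in Solution 2.1a.1 can be reproduced with Python's built-in complex numbers. The snippet below is an added illustration of Theorem 2.1a.1 applied to this example; the variable names are chosen here for convenience.

```python
# Parameters of Example 2.1a.1
a = [1 + 1j, 2 - 3j, 2 + 1j]          # a_1, a_2, a_3
a4 = 3 + 2j                            # additive constant
mu = [-1 + 2j, 1j, -1 - 1j]            # E(x_1), E(x_2), E(x_3)
var_parts = [(1, 1), (1, 2), (2, 3)]   # (Var of real part, Var of imaginary part)

# Var(x_j) = Var(real part) + Var(imaginary part)
var_x = [v1 + v2 for v1, v2 in var_parts]

# Mean and variance of u = a_1 x_1 + a_2 x_2 + a_3 x_3 + a_4
mean_u = sum(aj * mj for aj, mj in zip(a, mu)) + a4
var_u = sum((aj * aj.conjugate()).real * vj for aj, vj in zip(a, var_x))

print(mean_u, var_u)
print(mean_u - a4)   # mean value when a_4 is absent
```

The computed mean and variance are those of $\tilde{N}_1(2 + 2i, 68)$, and removing $a_4$ leaves the real mean value $-1$.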


2.1.2. The chisquare distribution in the real domain 


Suppose that $x_1$ follows a real standard normal distribution, that is, $x_1 \sim N_1(0, 1)$,
whose mean value is zero and variance, 1. What is then the density of $x_1^2$, the square of
a real standard normal variable? Let the distribution function or cumulative distribution
function of $x_1$ be $F_{x_1}(t) = Pr\{x_1 \le t\}$ and that of $y_1 = x_1^2$ be $F_{y_1}(t) = Pr\{y_1 \le t\}$. Note
that since $y_1 > 0$, $t$ must be positive. Then,

$$F_{y_1}(t) = Pr\{y_1 \le t\} = Pr\{x_1^2 \le t\} = Pr\{|x_1| \le \sqrt{t}\} = Pr\{-\sqrt{t} \le x_1 \le \sqrt{t}\}$$
$$= Pr\{x_1 \le \sqrt{t}\} - Pr\{x_1 \le -\sqrt{t}\} = F_{x_1}(\sqrt{t}) - F_{x_1}(-\sqrt{t}). \tag{i}$$

Denoting the density of $y_1$ by $g(y_1)$, this density at $y_1 = t$ is available by differentiating
the distribution function $F_{y_1}(t)$ with respect to $t$. As for the density of $x_1$, which is the
standard normal density, it can be obtained by differentiating $F_{x_1}(\sqrt{t})$ with respect to $\sqrt{t}$.
Thus, differentiating (i) throughout with respect to $t$, we have

$$g(t)\big|_{t=y_1} = \Big[\frac{d}{dt}F_{x_1}(\sqrt{t}) - \frac{d}{dt}F_{x_1}(-\sqrt{t})\Big]\Big|_{t=y_1}$$

$$= \Big[\frac{d}{d\sqrt{t}}F_{x_1}(\sqrt{t})\frac{d\sqrt{t}}{dt} - \frac{d}{d(-\sqrt{t})}F_{x_1}(-\sqrt{t})\frac{d(-\sqrt{t})}{dt}\Big]\Big|_{t=y_1}$$

$$= \frac{1}{2}t^{\frac{1}{2}-1}\frac{e^{-\frac{1}{2}t}}{\sqrt{2\pi}}\Big|_{t=y_1} + \frac{1}{2}t^{\frac{1}{2}-1}\frac{e^{-\frac{1}{2}t}}{\sqrt{2\pi}}\Big|_{t=y_1}$$

$$= \frac{1}{2^{\frac{1}{2}}\Gamma(1/2)}\,y_1^{\frac{1}{2}-1}e^{-\frac{1}{2}y_1},\ 0 \le y_1 < \infty,\ \text{with } \Gamma(1/2) = \sqrt{\pi}. \tag{ii}$$

Accordingly, the density of $y_1 = x_1^2$ or the square of a real standard normal variable, is a
two-parameter real gamma with $\alpha = \frac{1}{2}$ and $\beta = 2$ or a real chisquare with one degree of
freedom. A two-parameter real gamma density with the parameters $(\alpha, \beta)$ is given by

$$f_1(y_1) = \frac{1}{\beta^{\alpha}\Gamma(\alpha)}\,y_1^{\alpha-1}e^{-\frac{y_1}{\beta}},\ 0 \le y_1 < \infty,\ \alpha > 0,\ \beta > 0, \tag{2.1.2}$$

and $f_1(y_1) = 0$ elsewhere. When $\alpha = \frac{n}{2}$ and $\beta = 2$, we have a real chisquare density with
$n$ degrees of freedom. Hence, the following result:


Theorem 2.1.2. The square of a real scalar standard normal random variable is a real
chisquare variable with one degree of freedom. A real chisquare with $n$ degrees of freedom
has the density given in (2.1.2) with $\alpha = \frac{n}{2}$ and $\beta = 2$.


A real scalar chisquare random variable with $m$ degrees of freedom is denoted as $\chi_m^2$.
From (2.1.2), by computing the mgf we can see that the mgf of a real scalar gamma random
variable $y$ is $M_y(t) = (1 - \beta t)^{-\alpha}$ for $1 - \beta t > 0$. Hence, a real chisquare with $m$
degrees of freedom has the mgf $M_{\chi_m^2}(t) = (1 - 2t)^{-\frac{m}{2}}$ for $1 - 2t > 0$. The condition
$1 - \beta t > 0$ is required for the convergence of the integral when evaluating the mgf of a real
gamma random variable. If $y_j \sim \chi_{m_j}^2$, $j = 1,\ldots,k$ and if $y_1,\ldots,y_k$ are independently
distributed, then the sum $y = y_1 + \cdots + y_k \sim \chi_{m_1+\cdots+m_k}^2$, a real chisquare with $m_1 + \cdots + m_k$
degrees of freedom, with mgf $M_y(t) = (1 - 2t)^{-\frac{1}{2}(m_1+\cdots+m_k)}$ for $1 - 2t > 0$.
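
The passage from the distribution function in (i) to the density in (ii) can be illustrated numerically. In the added Python sketch below, $F_{y_1}(t) = F_{x_1}(\sqrt{t}) - F_{x_1}(-\sqrt{t})$ is computed from the standard normal distribution function (expressed through `math.erf`), differentiated by central differences, and compared with the gamma density (2.1.2) for $\alpha = \frac{1}{2}$, $\beta = 2$. The evaluation points, the step size and the function names are arbitrary choices for this example.

```python
import math

def std_normal_cdf(z):
    """Distribution function of N_1(0, 1)."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def cdf_of_square(t):
    """F_{y1}(t) = F_{x1}(sqrt(t)) - F_{x1}(-sqrt(t)) for t > 0, as in (i)."""
    r = math.sqrt(t)
    return std_normal_cdf(r) - std_normal_cdf(-r)

def gamma_density(y, alpha, beta):
    """Two-parameter real gamma density (2.1.2) for y > 0."""
    return y ** (alpha - 1.0) * math.exp(-y / beta) / (beta ** alpha * math.gamma(alpha))

def numeric_density(t, h=1e-5):
    """Central-difference derivative of the distribution function of y1 = x1^2."""
    return (cdf_of_square(t + h) - cdf_of_square(t - h)) / (2.0 * h)

points = [0.5, 1.0, 2.5]
errors = [abs(numeric_density(t) - gamma_density(t, 0.5, 2.0)) for t in points]

for t, err in zip(points, errors):
    print(t, numeric_density(t), gamma_density(t, 0.5, 2.0), err)
```

The same `gamma_density` function with `alpha = n / 2` and `beta = 2` gives the real chisquare density with `n` degrees of freedom of Theorem 2.1.2.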


Example 2.1.2. Let $x_1 \sim N_1(-1, 4)$, $x_2 \sim N_1(2, 2)$ be independently distributed. Let
$u = x_1^2 + 2x_2^2 + 2x_1 - 8x_2 + 5$. Compute the density of $u$.


Solution 2.1.2.

$$u = x_1^2 + 2x_2^2 + 2x_1 - 8x_2 + 5 = (x_1 + 1)^2 + 2(x_2 - 2)^2 - 4$$

$$= 4\Big[\frac{(x_1 + 1)^2}{4} + \frac{(x_2 - 2)^2}{2}\Big] - 4.$$

Since $x_1 \sim N_1(-1, 4)$ and $x_2 \sim N_1(2, 2)$ are independently distributed, so are
$\frac{(x_1+1)^2}{4} \sim \chi_1^2$ and $\frac{(x_2-2)^2}{2} \sim \chi_1^2$, and hence the sum is a real $\chi_2^2$ random variable. Then, $u = 4y - 4$
with $y = \chi_2^2$. But the density of $y$, denoted by $f_y(y)$, is

$$f_y(y) = \frac{1}{2}e^{-\frac{y}{2}},\ 0 \le y < \infty,$$

and $f_y(y) = 0$ elsewhere. Then, $z = 4y$ has the density

$$f_z(z) = \frac{1}{8}e^{-\frac{z}{8}},\ 0 \le z < \infty,$$

and $f_z(z) = 0$ elsewhere. However, since $u = z - 4$, its density is

$$f_u(u) = \frac{1}{8}e^{-\frac{(u+4)}{8}},\ -4 \le u < \infty,$$

and zero elsewhere.
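
A simulation offers an informal check of Solution 2.1.2. The Python sketch below is an added illustration: it draws $x_1 \sim N_1(-1, 4)$ and $x_2 \sim N_1(2, 2)$ with a fixed seed, evaluates $u$ from its original polynomial expression, and compares the empirical distribution function with $1 - e^{-(u+4)/8}$, which is obtained by integrating $f_u$. The sample size, the seed, the comparison points and the function names are arbitrary choices; note that `random.gauss` takes the standard deviation, not the variance.

```python
import math
import random

def simulate_u(n, seed=12345):
    """Draw n values of u = x1^2 + 2*x2^2 + 2*x1 - 8*x2 + 5."""
    rng = random.Random(seed)
    sample = []
    for _ in range(n):
        x1 = rng.gauss(-1.0, 2.0)               # variance 4
        x2 = rng.gauss(2.0, math.sqrt(2.0))     # variance 2
        sample.append(x1 * x1 + 2.0 * x2 * x2 + 2.0 * x1 - 8.0 * x2 + 5.0)
    return sample

def cdf_u(c):
    """Distribution function implied by f_u(u) = (1/8) exp(-(u + 4)/8), u >= -4."""
    return 1.0 - math.exp(-(c + 4.0) / 8.0) if c > -4.0 else 0.0

sample = simulate_u(40000)
checks = [(c, sum(u <= c for u in sample) / len(sample), cdf_u(c))
          for c in (-2.0, 4.0, 12.0)]
sample_mean = sum(sample) / len(sample)

for c, empirical, theoretical in checks:
    print(c, empirical, theoretical)
print(min(sample), sample_mean)
```

The smallest simulated value should not fall below $-4$, and the sample mean should be near $E(u) = 4E(\chi_2^2) - 4 = 4$.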


2.1a.2. The chisquare distribution in the complex domain 


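Let us consider the distribution of $\tilde{z}_1^*\tilde{z}_1$ of a scalar standard complex normal variable
$\tilde{z}_1$. The density of $\tilde{z}_1$ is given by

$$f_{\tilde{z}_1}(\tilde{z}_1) = \frac{1}{\pi}e^{-\tilde{z}_1^*\tilde{z}_1},\quad \tilde{z}_1 = z_{11} + iz_{12},\ -\infty < z_{1j} < \infty,\ j = 1, 2.$$

Let $\tilde{u}_1 = \tilde{z}_1^*\tilde{z}_1$. Note that $\tilde{u}_1$ is real and hence we may associate a real parameter $t$ to
the mgf. Note that $\tilde{z}_1^*\tilde{z}_1$ in the scalar complex case corresponds to $z^2$ in the real scalar case
where $z \sim N_1(0, 1)$. Then, the mgf of $\tilde{u}_1$ is given by

$$M_{\tilde{u}_1}(t) = E[e^{t\tilde{u}_1}] = \frac{1}{\pi}\int_{\tilde{z}_1} e^{-(1-t)\tilde{z}_1^*\tilde{z}_1}\,d\tilde{z}_1.$$

However, $\tilde{z}_1^*\tilde{z}_1 = z_{11}^2 + z_{12}^2$, as $\tilde{z}_1 = z_{11} + iz_{12}$, $i = \sqrt{-1}$, where $z_{11}$ and $z_{12}$ are
real. Thus, the above integral gives $(1 - t)^{-\frac{1}{2}}(1 - t)^{-\frac{1}{2}} = (1 - t)^{-1}$ for $1 - t > 0$,
which is the mgf of a real scalar gamma variable with parameters $\alpha = 1$ and $\beta = 1$.
Let $\tilde{z}_j \sim \tilde{N}_1(\tilde{\mu}_j, \sigma_j^2)$, $j = 1,\ldots,k$, be scalar complex normal random variables that are
independently distributed. Letting

$$\tilde{u} = \sum_{j=1}^k\Big(\frac{\tilde{z}_j - \tilde{\mu}_j}{\sigma_j}\Big)^*\Big(\frac{\tilde{z}_j - \tilde{\mu}_j}{\sigma_j}\Big) \sim \text{real scalar gamma with parameters } \alpha = k,\ \beta = 1,$$

whose density is

$$f_{\tilde{u}}(u) = \frac{1}{\Gamma(k)}u^{k-1}e^{-u},\ 0 \le u < \infty,\ k = 1, 2, \ldots, \tag{2.1a.2}$$

$\tilde{u}$ is referred to as a scalar chisquare in the complex domain having $k$ degrees of freedom,
which is denoted $\tilde{u} \sim \tilde{\chi}_k^2$. Hence, the following result:


Theorem 2.1a.2. Let $\tilde{z}_j \sim \tilde{N}_1(\tilde{\mu}_j, \sigma_j^2)$, $j = 1,\ldots,k$, be independently distributed
and $\tilde{u} = \sum_{j=1}^k\big(\frac{\tilde{z}_j - \tilde{\mu}_j}{\sigma_j}\big)^*\big(\frac{\tilde{z}_j - \tilde{\mu}_j}{\sigma_j}\big)$. Then $\tilde{u}$ is called a scalar chisquare having $k$ degrees of
freedom in the complex domain whose density as given in (2.1a.2) is that of a real scalar
gamma random variable with parameters $\alpha = k$ and $\beta = 1$.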


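Example 2.1a.2. Let $\tilde{x}_1 \sim \tilde{N}_1(i, 2)$, $\tilde{x}_2 \sim \tilde{N}_1(1 - i, 1)$ be independently distributed complex Gaussian univariate random variables. Let $\tilde{u} = \tilde{x}_1^{*}\tilde{x}_1 + 2\tilde{x}_2^{*}\tilde{x}_2 - 2\tilde{x}_2^{*} - 2\tilde{x}_2 + i(\tilde{x}_1 + 2\tilde{x}_2^{*}) - i(\tilde{x}_1^{*} + 2\tilde{x}_2) + 5$. Evaluate the density of $\tilde{u}$.


Solution 2.1a.2. Let us simplify $\tilde{u}$, keeping in mind the parameters in the densities of $\tilde{x}_1$ and $\tilde{x}_2$. Since terms of the type $\tilde{x}_1^{*}\tilde{x}_1$ and $\tilde{x}_2^{*}\tilde{x}_2$ are present in $\tilde{u}$, we may simplify into factors involving $\tilde{x}_j^{*}$ and $\tilde{x}_j$ for $j = 1, 2$. From the density of $\tilde{x}_1$ we have

$$\frac{(\tilde{x}_1 - i)^{*}(\tilde{x}_1 - i)}{2} \sim \tilde{\chi}_1^2$$

where

$$(\tilde{x}_1 - i)^{*}(\tilde{x}_1 - i) = (\tilde{x}_1^{*} + i)(\tilde{x}_1 - i) = \tilde{x}_1^{*}\tilde{x}_1 + i\tilde{x}_1 - i\tilde{x}_1^{*} + 1. \qquad (i)$$

After removing the elements in (i) from $\tilde{u}$, the remainder is

$$2\tilde{x}_2^{*}\tilde{x}_2 - 2\tilde{x}_2^{*} - 2\tilde{x}_2 + 2i\tilde{x}_2^{*} - 2i\tilde{x}_2 + 4 = 2[(\tilde{x}_2 - 1)^{*}(\tilde{x}_2 - 1) - i\tilde{x}_2 + i\tilde{x}_2^{*} + 1] = 2[(\tilde{x}_2 - 1 + i)^{*}(\tilde{x}_2 - 1 + i)].$$

Accordingly,

$$\tilde{u} = 2\Big[\frac{(\tilde{x}_1 - i)^{*}(\tilde{x}_1 - i)}{2} + (\tilde{x}_2 - 1 + i)^{*}(\tilde{x}_2 - 1 + i)\Big] = 2[\tilde{\chi}_1^2 + \tilde{\chi}_1^2] = 2\tilde{\chi}_2^2$$

where $\tilde{\chi}_2^2$ is a scalar chisquare of degree 2 in the complex domain or, equivalently, a real scalar gamma with parameters $(\alpha = 2, \beta = 1)$. Letting $y = \tilde{\chi}_2^2$, the density of $y$, denoted by $f_y(y)$, is

$$f_y(y) = y\,e^{-y},\quad 0 \le y < \infty,$$

and $f_y(y) = 0$ elsewhere. Then, the density of $u = 2y$, denoted by $f_u(u)$, which is given by

$$f_u(u) = \frac{u}{4}\,e^{-\frac{u}{2}},\quad 0 \le u < \infty,$$

and $f_u(u) = 0$ elsewhere, is that of a real scalar gamma with the parameters $(\alpha = 2, \beta = 2)$.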
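
The completion of squares in Solution 2.1a.2 can be checked with Python's built-in complex arithmetic. This is only a sketch: `conjugate()` stands in for the asterisk, and the sampler assumes that a scalar complex normal with variance parameter $\sigma^2$ has independent real and imaginary parts with variance $\sigma^2/2$ each, which is consistent with the standard complex normal density $\frac{1}{\pi}e^{-\tilde{z}^{*}\tilde{z}}$ given in this section. All function names are illustrative.

```python
import random


def u_original(x1, x2):
    """u exactly as written in Example 2.1a.2, with conjugate() in place of *."""
    c1, c2 = x1.conjugate(), x2.conjugate()
    return (c1 * x1 + 2 * c2 * x2 - 2 * c2 - 2 * x2
            + 1j * (x1 + 2 * c2) - 1j * (c1 + 2 * x2) + 5)


def u_completed(x1, x2):
    """u after completing the squares: |x1 - i|^2 + 2 |x2 - 1 + i|^2."""
    return abs(x1 - 1j) ** 2 + 2 * abs(x2 - 1 + 1j) ** 2


def complex_normal(rng, mean, var):
    """Assumed sampler: independent real and imaginary parts, each with variance var/2."""
    s = (var / 2.0) ** 0.5
    return complex(rng.gauss(mean.real, s), rng.gauss(mean.imag, s))


def mean_u(n=20000, seed=2):
    """Monte Carlo estimate of E[u]; a gamma with alpha = 2, beta = 2 has mean 4."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n):
        x1 = complex_normal(rng, 1j, 2.0)
        x2 = complex_normal(rng, 1 - 1j, 1.0)
        total += u_completed(x1, x2)
    return total / n


if __name__ == "__main__":
    a, b = 0.3 - 1.2j, -0.7 + 0.4j
    print(u_original(a, b), u_completed(a, b))
    print(mean_u())
```

The first check is purely algebraic and holds for any pair of complex numbers; the second compares the simulated mean with $\alpha\beta = 4$.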


2.1.3. The type-2 beta and F distributions in the real domain 


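What about the distribution of the ratio of two independently distributed real scalar chisquare random variables? Let $y_1 \sim \chi_m^2$ and $y_2 \sim \chi_n^2$, that is, $y_1$ and $y_2$ are real chisquare random variables with $m$ and $n$ degrees of freedom respectively, and assume that $y_1$ and $y_2$ are independently distributed. Let us determine the density of $u = y_1/y_2$. Let $v = y_2$ and consider the transformation $(y_1, y_2)$ onto $(u, v)$. Noting that

$$\frac{\partial u}{\partial y_1} = \frac{1}{y_2},\quad \frac{\partial v}{\partial y_2} = 1,\quad \frac{\partial v}{\partial y_1} = 0,$$

one has

$${\rm d}u \wedge {\rm d}v = \begin{vmatrix} \frac{\partial u}{\partial y_1} & \frac{\partial u}{\partial y_2} \\ \frac{\partial v}{\partial y_1} & \frac{\partial v}{\partial y_2} \end{vmatrix}\,{\rm d}y_1 \wedge {\rm d}y_2 = \begin{vmatrix} \frac{1}{y_2} & * \\ 0 & 1 \end{vmatrix}\,{\rm d}y_1 \wedge {\rm d}y_2 = \frac{1}{y_2}\,{\rm d}y_1 \wedge {\rm d}y_2 \ \Rightarrow\ {\rm d}y_1 \wedge {\rm d}y_2 = v\,{\rm d}u \wedge {\rm d}v$$

where the asterisk indicates the presence of some element in which we are not interested owing to the triangular pattern for the Jacobian matrix. Letting the joint density of $y_1$ and $y_2$ be denoted by $f_{12}(y_1, y_2)$, one has

$$f_{12}(y_1, y_2) = \frac{1}{2^{\frac{m+n}{2}}\Gamma(\frac{m}{2})\Gamma(\frac{n}{2})}\,y_1^{\frac{m}{2}-1}y_2^{\frac{n}{2}-1}e^{-\frac{y_1+y_2}{2}}$$

for $0 \le y_1 < \infty$, $0 \le y_2 < \infty$, $m, n = 1, 2, \ldots$, and $f_{12}(y_1, y_2) = 0$ elsewhere. Let the joint density of $u$ and $v$ be denoted by $g_{12}(u, v)$ and the marginal density of $u$ be denoted by $g_1(u)$. Then, with $c = \big[2^{\frac{m+n}{2}}\Gamma(\frac{m}{2})\Gamma(\frac{n}{2})\big]^{-1}$,

$$g_{12}(u, v) = c\,v\,(uv)^{\frac{m}{2}-1}v^{\frac{n}{2}-1}e^{-\frac{v}{2}(1+u)},$$

$$g_1(u) = c\,u^{\frac{m}{2}-1}\int_{v=0}^{\infty} v^{\frac{m+n}{2}-1}e^{-v\frac{(1+u)}{2}}\,{\rm d}v = c\,u^{\frac{m}{2}-1}\,\Gamma\Big(\frac{m+n}{2}\Big)\Big(\frac{1+u}{2}\Big)^{-\frac{m+n}{2}} = \frac{\Gamma(\frac{m+n}{2})}{\Gamma(\frac{m}{2})\Gamma(\frac{n}{2})}\,u^{\frac{m}{2}-1}(1+u)^{-\frac{m+n}{2}} \qquad (2.1.3)$$

for $m, n = 1, 2, \ldots$, $0 \le u < \infty$ and $g_1(u) = 0$ elsewhere. Note that $g_1(u)$ is a type-2 real scalar beta density. Hence, we have the following result:


Theorem 2.1.3. Let the real scalar $y_1 \sim \chi_m^2$ and $y_2 \sim \chi_n^2$ be independently distributed, then the ratio $u = \frac{y_1}{y_2}$ is a type-2 real scalar beta random variable with the parameters $\frac{m}{2}$ and $\frac{n}{2}$ where $m, n = 1, 2, \ldots$, whose density is provided in (2.1.3).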


This result also holds for general real scalar gamma random variables $x_1 > 0$ and $x_2 > 0$ with parameters $(\alpha_1, \beta)$ and $(\alpha_2, \beta)$, respectively, where $\beta$ is a common scale parameter and it is assumed that $x_1$ and $x_2$ are independently distributed. Then, $u = \frac{x_1}{x_2}$ is a type-2 beta with parameters $\alpha_1$ and $\alpha_2$.


If $u$ as defined in Theorem 2.1.3 is replaced by $\frac{m}{n}F_{m,n}$, or $F = F_{m,n} = \frac{\chi_m^2/m}{\chi_n^2/n} = \frac{n}{m}u$, then $F$ is known as the F-random variable with $m$ and $n$ degrees of freedom, where the degrees of freedom indicate those of the numerator and denominator chisquare random variables which are independently distributed. Denoting the density of $F$ by $f_F(F)$ we have the following result:


Theorem 2.1.4. Letting $F = F_{m,n} = \frac{\chi_m^2/m}{\chi_n^2/n}$ where the two real scalar chisquares are independently distributed, the real scalar F-density is given by

$$f_F(F) = \frac{\Gamma(\frac{m+n}{2})}{\Gamma(\frac{m}{2})\Gamma(\frac{n}{2})}\Big(\frac{m}{n}\Big)^{\frac{m}{2}}\frac{F^{\frac{m}{2}-1}}{(1 + \frac{m}{n}F)^{\frac{m+n}{2}}} \qquad (2.1.4)$$

for $0 \le F < \infty$, $m, n = 1, 2, \ldots$, and $f_F(F) = 0$ elsewhere.
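
As a numerical illustration of how (2.1.4) follows from (2.1.3), the sketch below evaluates both densities for arguments greater than zero and checks the change-of-variable relation $f_F(F) = \frac{m}{n}\,g_1(\frac{m}{n}F)$ implied by $F = \frac{n}{m}u$. The function names are illustrative only.

```python
import math


def type2_beta_pdf(u, a, b):
    """Type-2 beta density with parameters a, b, as in (2.1.3) with a = m/2, b = n/2 (u > 0)."""
    if u <= 0:
        return 0.0
    const = math.gamma(a + b) / (math.gamma(a) * math.gamma(b))
    return const * u ** (a - 1) * (1 + u) ** (-(a + b))


def f_pdf(F, m, n):
    """F-density (2.1.4) with m and n degrees of freedom (F > 0)."""
    if F <= 0:
        return 0.0
    const = math.gamma((m + n) / 2) / (math.gamma(m / 2) * math.gamma(n / 2))
    return const * (m / n) ** (m / 2) * F ** (m / 2 - 1) / (1 + m * F / n) ** ((m + n) / 2)


if __name__ == "__main__":
    m, n, F = 3, 5, 0.7
    # Both numbers should coincide: density of F versus transformed type-2 beta density
    print(f_pdf(F, m, n), (m / n) * type2_beta_pdf(m * F / n, m / 2, n / 2))
```

For $m = n = 2$ the F-density reduces to $(1+F)^{-2}$, which gives a quick hand check at $F = 1$.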


Example 2.1.3. Let $x_1$ and $x_2$ be independently distributed real scalar gamma random variables with parameters $(\alpha_1, \beta)$ and $(\alpha_2, \beta)$, respectively, $\beta$ being a common parameter, whose densities are as specified in (2.1.2). Let $u_1 = \frac{x_1}{x_1+x_2}$, $u_2 = \frac{x_1}{x_2}$, $u_3 = x_1 + x_2$. Show that (1): $u_3$ has a real scalar gamma density as given in (2.1.2) with the parameters $(\alpha_1 + \alpha_2, \beta)$; (2): $u_1$ and $u_3$ as well as $u_2$ and $u_3$ are independently distributed; (3): $u_2$ is a real scalar type-2 beta with parameters $(\alpha_1, \alpha_2)$ whose density is specified in (2.1.3); (4): $u_1$ has a real scalar type-1 beta density given as

$$f_1(u_1) = \frac{\Gamma(\alpha_1+\alpha_2)}{\Gamma(\alpha_1)\Gamma(\alpha_2)}\,u_1^{\alpha_1-1}(1-u_1)^{\alpha_2-1},\quad 0 \le u_1 \le 1, \qquad (2.1.5)$$

for $\Re(\alpha_1) > 0$, $\Re(\alpha_2) > 0$ and zero elsewhere. [In a statistical density, the parameters are usually real; however, since the integrals exist for complex parameters, the conditions are given for complex parameters as the real parts of $\alpha_1$ and $\alpha_2$, which must be positive. When they are real, the conditions will be simply $\alpha_1 > 0$ and $\alpha_2 > 0$.]


Solution 2.1.3. Since $x_1$ and $x_2$ are independently distributed, their joint density is the product of the marginal densities, which is given by

$$f_{12}(x_1, x_2) = c\,x_1^{\alpha_1-1}x_2^{\alpha_2-1}e^{-\frac{1}{\beta}(x_1+x_2)},\quad 0 \le x_j < \infty,\ j = 1, 2, \qquad (i)$$

for $\Re(\alpha_j) > 0$, $\Re(\beta) > 0$, $j = 1, 2$ and zero elsewhere, where

$$c = \frac{1}{\beta^{\alpha_1+\alpha_2}\,\Gamma(\alpha_1)\Gamma(\alpha_2)}.$$

Since the sum $x_1 + x_2$ is present in the exponent and both $x_1$ and $x_2$ are positive, a convenient transformation is $x_1 = r\cos^2\theta$, $x_2 = r\sin^2\theta$, $0 \le r < \infty$, $0 \le \theta \le \frac{\pi}{2}$. Then, the Jacobian is available from the detailed derivation of Jacobian given in the beginning of Sect. 2.1.3 or from Example 1.6.1. That is,

$${\rm d}x_1 \wedge {\rm d}x_2 = 2r\sin\theta\cos\theta\,{\rm d}r \wedge {\rm d}\theta. \qquad (ii)$$

Then from (i) and (ii), the joint density of $r$ and $\theta$, denoted by $f_{r,\theta}(r, \theta)$, is the following:

$$f_{r,\theta}(r, \theta) = c\,(\cos^2\theta)^{\alpha_1-1}(\sin^2\theta)^{\alpha_2-1}\,2\cos\theta\sin\theta\;r^{\alpha_1+\alpha_2-1}e^{-\frac{1}{\beta}r} \qquad (iii)$$

and zero elsewhere. As $f_{r,\theta}(r, \theta)$ is a product of two positive integrable functions, one involving solely $r$ and the other solely $\theta$, $r$ and $\theta$ are independently distributed. Since $u_3 = x_1 + x_2 = r\cos^2\theta + r\sin^2\theta = r$ is solely a function of $r$ and $u_1 = \frac{x_1}{x_1+x_2} = \cos^2\theta$ and $u_2 = \frac{\cos^2\theta}{\sin^2\theta}$ are solely functions of $\theta$, it follows that $u_1$ and $u_3$ as well as $u_2$ and $u_3$ are independently distributed. From (iii), upon multiplying and dividing by $\Gamma(\alpha_1 + \alpha_2)$, we obtain the density of $u_3$ as

$$f_1(u_3) = \frac{1}{\beta^{\alpha_1+\alpha_2}\,\Gamma(\alpha_1+\alpha_2)}\,u_3^{\alpha_1+\alpha_2-1}e^{-\frac{u_3}{\beta}},\quad 0 \le u_3 < \infty, \qquad (iv)$$

and zero elsewhere, which is a real scalar gamma density with parameters $(\alpha_1 + \alpha_2, \beta)$. From (iii), the density of $\theta$, denoted by $f_2(\theta)$, is

$$f_2(\theta) = c\,(\cos^2\theta)^{\alpha_1-\frac{1}{2}}(\sin^2\theta)^{\alpha_2-\frac{1}{2}},\quad 0 \le \theta \le \frac{\pi}{2}, \qquad (v)$$

and zero elsewhere, for $\Re(\alpha_j) > 0$, $j = 1, 2$, where $c$ now denotes the normalizing constant $\frac{2\,\Gamma(\alpha_1+\alpha_2)}{\Gamma(\alpha_1)\Gamma(\alpha_2)}$. From this result, we can obtain the density of $u_1 = \cos^2\theta$. Then, ${\rm d}u_1 = -2\cos\theta\sin\theta\,{\rm d}\theta$. Moreover, when $\theta \to 0$, $u_1 \to 1$ and when $\theta \to \frac{\pi}{2}$, $u_1 \to 0$. Hence, the minus sign in the Jacobian is needed to obtain the limits in the natural order, $0 \le u_1 \le 1$. Substituting in (v), the density of $u_1$, denoted by $f_3(u_1)$, is as given in (2.1.5), $u_1$ being a real scalar type-1 beta random variable with parameters $(\alpha_1, \alpha_2)$. Now, observe that

$$u_2 = \frac{\cos^2\theta}{\sin^2\theta} = \frac{\cos^2\theta}{1-\cos^2\theta} = \frac{u_1}{1-u_1}. \qquad (vi)$$

Given the density of $u_1$ as specified in (2.1.5), we can obtain the density of $u_2$ as follows. As $u_2 = \frac{u_1}{1-u_1}$, we have $u_1 = \frac{u_2}{1+u_2} \Rightarrow {\rm d}u_1 = \frac{1}{(1+u_2)^2}\,{\rm d}u_2$; then substituting these values in the density of $u_1$, we have the following density for $u_2$:

$$f_4(u_2) = \frac{\Gamma(\alpha_1+\alpha_2)}{\Gamma(\alpha_1)\Gamma(\alpha_2)}\,u_2^{\alpha_1-1}(1+u_2)^{-(\alpha_1+\alpha_2)},\quad 0 \le u_2 < \infty, \qquad (2.1.6)$$

and zero elsewhere, for $\Re(\alpha_j) > 0$, $j = 1, 2$, which is a real scalar type-2 beta density with parameters $(\alpha_1, \alpha_2)$. The results associated with the densities (2.1.5) and (2.1.6) are now stated as a theorem.


Theorem 2.1.5. Let $x_1$ and $x_2$ be independently distributed real scalar gamma random variables with the parameters $(\alpha_1, \beta)$, $(\alpha_2, \beta)$, respectively, $\beta$ being a common scale parameter. [If $x_1 \sim \chi_m^2$ and $x_2 \sim \chi_n^2$, then $\alpha_1 = \frac{m}{2}$, $\alpha_2 = \frac{n}{2}$ and $\beta = 2$.] Then $u_1 = \frac{x_1}{x_1+x_2}$ is a real scalar type-1 beta whose density is as specified in (2.1.5) with the parameters $(\alpha_1, \alpha_2)$, and $u_2 = \frac{x_1}{x_2}$ is a real scalar type-2 beta whose density is as given in (2.1.6) with the parameters $(\alpha_1, \alpha_2)$.
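
The statements of Theorem 2.1.5 and the independence shown in Example 2.1.3 lend themselves to a quick Monte Carlo check. The sketch below relies on the standard-library `random.gammavariate(alpha, beta)`, whose second argument is a scale parameter, matching the role of $\beta$ here; the helper names and the chosen parameter values are illustrative assumptions. Zero sample correlation between $u_1$ and $u_3$ is of course only a necessary consequence of independence, not a proof of it.

```python
import random


def simulate(alpha1, alpha2, beta, n=20000, seed=3):
    """Return simulated values of u1 = x1/(x1 + x2) and u3 = x1 + x2."""
    rng = random.Random(seed)
    u1s, u3s = [], []
    for _ in range(n):
        x1 = rng.gammavariate(alpha1, beta)  # shape alpha1, scale beta
        x2 = rng.gammavariate(alpha2, beta)  # shape alpha2, same scale
        u1s.append(x1 / (x1 + x2))
        u3s.append(x1 + x2)
    return u1s, u3s


def mean(values):
    return sum(values) / len(values)


def correlation(a, b):
    """Sample correlation coefficient of two equally long lists."""
    ma, mb = mean(a), mean(b)
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    return cov / (va * vb) ** 0.5


if __name__ == "__main__":
    u1s, u3s = simulate(2.0, 3.0, 1.5)
    print(mean(u1s))              # type-1 beta mean: alpha1/(alpha1 + alpha2) = 0.4
    print(mean(u3s))              # gamma mean: (alpha1 + alpha2) * beta = 7.5
    print(correlation(u1s, u3s))  # should be close to zero
```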


2.1a.3. The type-2 beta and F distributions in the complex domain 


It follows that in the complex domain, if $\tilde{\chi}_m^2$ and $\tilde{\chi}_n^2$ are independently distributed, then the sum is a chisquare with $m + n$ degrees of freedom, that is, $\tilde{\chi}_m^2 + \tilde{\chi}_n^2 = \tilde{\chi}_{m+n}^2$. We now look into type-2 beta variables and F-variables and their connection to chisquare variables in the complex domain. Since, in the complex domain, the chisquares are actually real variables, the density of the ratio of two independently distributed chisquares with $m$ and $n$ degrees of freedom in the complex domain remains the same as the density given in (2.1.3) with $\frac{m}{2}$ and $\frac{n}{2}$ replaced by $m$ and $n$, respectively. Thus, letting $\tilde{u} = \tilde{\chi}_m^2/\tilde{\chi}_n^2$ where the two chisquares in the complex domain are independently distributed, the density of $\tilde{u}$, denoted by $\tilde{g}_1(u)$, is

$$\tilde{g}_1(u) = \frac{\Gamma(m+n)}{\Gamma(m)\Gamma(n)}\,u^{m-1}(1+u)^{-(m+n)} \qquad (2.1a.3)$$

for $0 \le u < \infty$, $m, n = 1, 2, \ldots$, and $\tilde{g}_1(u) = 0$ elsewhere.


Theorem 2.1a.3. Let $\tilde{y}_1 \sim \tilde{\chi}_m^2$ and $\tilde{y}_2 \sim \tilde{\chi}_n^2$ be independently distributed where $\tilde{y}_1$ and $\tilde{y}_2$ are in the complex domain; then, $\tilde{u} = \frac{\tilde{y}_1}{\tilde{y}_2}$ is a real type-2 beta whose density is given in (2.1a.3).


If the F random variable in the complex domain is defined as $\tilde{F}_{m,n} = \frac{\tilde{\chi}_m^2/m}{\tilde{\chi}_n^2/n}$ where the two chisquares in the complex domain are independently distributed, then the density of $\tilde{F}$ is that of the real F-density with $m$ and $n$ replaced by $2m$ and $2n$ in (2.1.4), respectively.


Theorem 2.1a.4. Let $\tilde{F} = \tilde{F}_{m,n} = \frac{\tilde{\chi}_m^2/m}{\tilde{\chi}_n^2/n}$ where the two chisquares in the complex domain are independently distributed; then, $\tilde{F}$ is referred to as an F random variable in the complex domain and it has a real F-density with the parameters $m$ and $n$, which is given by

$$\tilde{g}_2(F) = \frac{\Gamma(m+n)}{\Gamma(m)\Gamma(n)}\Big(\frac{m}{n}\Big)^{m}F^{m-1}\Big(1 + \frac{m}{n}F\Big)^{-(m+n)} \qquad (2.1a.4)$$

for $0 \le F < \infty$, $m, n = 1, 2, \ldots$, and $\tilde{g}_2(F) = 0$ elsewhere.


A type-1 beta representation in the complex domain can similarly be obtained from Theorem 2.1a.3. This will be stated as a theorem.


Theorem 2.1a.5. Let $\tilde{x}_1 \sim \tilde{\chi}_m^2$ and $\tilde{x}_2 \sim \tilde{\chi}_n^2$ be independently distributed scalar chisquare variables in the complex domain with $m$ and $n$ degrees of freedom, respectively. Let $\tilde{u}_1 = \frac{\tilde{x}_1}{\tilde{x}_1+\tilde{x}_2}$, which is a real variable that we will call $u_1$. Then, $\tilde{u}_1$ is a scalar type-1 beta random variable in the complex domain with the parameters $m, n$, whose real scalar density is

$$\tilde{f}_1(u_1) = \frac{\Gamma(m+n)}{\Gamma(m)\Gamma(n)}\,u_1^{m-1}(1-u_1)^{n-1},\quad 0 \le u_1 \le 1, \qquad (2.1a.5)$$

and zero elsewhere, for $m, n = 1, 2, \ldots$.
2.1.4. Power transformation of type-1 and type-2 beta random variables


Let us make a power transformation of the type $u_1 = a y^{\delta}$, $a > 0$, $\delta > 0$. Then, ${\rm d}u_1 = a\delta y^{\delta-1}{\rm d}y$. For convenience, let the parameters in (2.1.5) be $\alpha_1 = \alpha$ and $\alpha_2 = \beta$. Then, the density given in (2.1.5) becomes

$$f_{11}(y) = \frac{\Gamma(\alpha+\beta)}{\Gamma(\alpha)\Gamma(\beta)}\,\delta a^{\alpha}y^{\alpha\delta-1}(1 - a y^{\delta})^{\beta-1},\quad 0 \le y \le \frac{1}{a^{1/\delta}}, \qquad (2.1.7)$$

and zero elsewhere, for $a > 0$, $\delta > 0$, $\Re(\alpha) > 0$, $\Re(\beta) > 0$. We can extend the support to $-a^{-\frac{1}{\delta}} \le y \le a^{-\frac{1}{\delta}}$ by replacing $y$ by $|y|$ and multiplying the normalizing constant by $\frac{1}{2}$. Such power transformed models are useful in practical applications. Observe that a power transformation has the following effect: for $y < 1$, the density is reduced if $\delta > 1$ or raised if $\delta < 1$, whereas for $y > 1$, the density increases if $\delta > 1$ or diminishes if $\delta < 1$. For instance, the particular case $\alpha = 1$ is highly useful in reliability theory and stress-strength analysis. Thus, letting $\alpha = 1$ in the power-transformed real scalar type-1 beta density (2.1.7) and denoting the resulting density by $f_{12}(y)$, one has

$$f_{12}(y) = a\delta\beta\,y^{\delta-1}(1 - a y^{\delta})^{\beta-1},\quad 0 \le y \le a^{-\frac{1}{\delta}}, \qquad (2.1.8)$$

for $a > 0$, $\delta > 0$, $\Re(\beta) > 0$, and zero elsewhere. In the model in (2.1.8), the reliability, that is, $Pr\{y \ge t\}$, for some $t$, can be easily determined. As well, the hazard function $\frac{f_{12}(y=t)}{Pr\{y \ge t\}}$ is readily available. Actually, the reliability or survival function is

$$Pr\{y \ge t\} = (1 - a t^{\delta})^{\beta},\quad a > 0,\ \delta > 0,\ 1 - a t^{\delta} > 0,\ \beta > 0, \qquad (i)$$

and the hazard function is

$$\frac{f_{12}(y=t)}{Pr\{y \ge t\}} = \frac{a\delta\beta\,t^{\delta-1}}{1 - a t^{\delta}}. \qquad (ii)$$

Observe that the free parameters $a$, $\delta$ and $\beta$ allow for much versatility in model building situations. If $\beta = 1$ in the real scalar type-1 beta model in (2.1.7), taken with $a = 1$ and $\delta = 1$, then the density reduces to $\alpha y^{\alpha-1}$, $0 \le y \le 1$, $\alpha > 0$, which is a simple power function. The most popular power function model in the statistical literature is the Weibull model, which is a power transformed exponential density. Consider the real scalar exponential density

$$g(x) = \theta e^{-\theta x},\quad \theta > 0,\ x \ge 0, \qquad (iii)$$

and zero elsewhere, and let $x = y^{\delta}$, $\delta > 0$. Then the model in (iii) becomes the real scalar Weibull density, denoted by $g_1(y)$:

$$g_1(y) = \theta\delta\,y^{\delta-1}e^{-\theta y^{\delta}},\quad \theta > 0,\ \delta > 0,\ y \ge 0, \qquad (iv)$$

and zero elsewhere.
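
The reliability and hazard expressions (i) and (ii) attached to the model (2.1.8) can be verified numerically: the derivative of the survival function must equal minus the density, and the hazard must equal the ratio of the two. The sketch below does this with a central finite difference; the function names and the parameter values are arbitrary illustrations.

```python
def f12(y, a, delta, beta):
    """Density (2.1.8): power-transformed type-1 beta with alpha = 1 (evaluated for y > 0)."""
    if y <= 0 or a * y ** delta > 1:
        return 0.0
    return a * delta * beta * y ** (delta - 1) * (1 - a * y ** delta) ** (beta - 1)


def reliability(t, a, delta, beta):
    """Survival function (i): Pr{y >= t} = (1 - a t^delta)^beta on the support."""
    return (1 - a * t ** delta) ** beta


def hazard(t, a, delta, beta):
    """Hazard function (ii): a delta beta t^(delta - 1) / (1 - a t^delta)."""
    return a * delta * beta * t ** (delta - 1) / (1 - a * t ** delta)


if __name__ == "__main__":
    a, delta, beta, t, h = 0.5, 2.0, 3.0, 0.8, 1e-6
    slope = (reliability(t + h, a, delta, beta) - reliability(t - h, a, delta, beta)) / (2 * h)
    print(-slope, f12(t, a, delta, beta))  # the two values should agree
    print(hazard(t, a, delta, beta), f12(t, a, delta, beta) / reliability(t, a, delta, beta))
```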
Now, let us consider power transformations in a real scalar type-2 beta density given in (2.1.6). For convenience let $\alpha_1 = \alpha$ and $\alpha_2 = \beta$. Letting $u_2 = a y^{\delta}$, $a > 0$, $\delta > 0$, the model specified by (2.1.6) then becomes

$$f_{21}(y) = \frac{\Gamma(\alpha+\beta)}{\Gamma(\alpha)\Gamma(\beta)}\,a^{\alpha}\delta\,y^{\alpha\delta-1}(1 + a y^{\delta})^{-(\alpha+\beta)},\quad 0 \le y < \infty, \qquad (v)$$

for $a > 0$, $\delta > 0$, $\Re(\alpha) > 0$, $\Re(\beta) > 0$, and zero elsewhere. As in the type-1 beta case, the most interesting special case occurs when $\alpha = 1$. Denoting the resulting density by $f_{22}(y)$, we have

$$f_{22}(y) = a\delta\beta\,y^{\delta-1}(1 + a y^{\delta})^{-(\beta+1)},\quad 0 \le y < \infty, \qquad (2.1.9)$$

for $a > 0$, $\delta > 0$, $\Re(\beta) > 0$, $\alpha = 1$, and zero elsewhere. In this case as well, the reliability and hazard functions can easily be determined:

$$\text{Reliability function} = Pr\{y \ge t\} = (1 + a t^{\delta})^{-\beta}, \qquad (vi)$$

$$\text{Hazard function} = \frac{f_{22}(y=t)}{Pr\{y \ge t\}} = \frac{a\delta\beta\,t^{\delta-1}}{1 + a t^{\delta}}. \qquad (vii)$$

Again, for application purposes, the forms in (vi) and (vii) are seen to be very versatile due to the presence of the free parameters $a$, $\delta$ and $\beta$.


2.1.5. Exponentiation of real scalar type-1 and type-2 beta variables 


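Let us consider the real scalar type-1 beta model in (2.1.5) where, for convenience, we let $\alpha_1 = \alpha$ and $\alpha_2 = \beta$. Letting $u_1 = a e^{-by}$, we denote the resulting density by $f_{13}(y)$ where

$$f_{13}(y) = \frac{\Gamma(\alpha+\beta)}{\Gamma(\alpha)\Gamma(\beta)}\,a^{\alpha}b\,e^{-b\alpha y}(1 - a e^{-by})^{\beta-1},\quad y \ge \ln a^{\frac{1}{b}}, \qquad (2.1.10)$$

for $a > 0$, $b > 0$, $\Re(\alpha) > 0$, $\Re(\beta) > 0$, and zero elsewhere. Again, for practical applications, the special case $\alpha = 1$ is the most useful one. Let the density corresponding to this special case be denoted by $f_{14}(y)$. Then,

$$f_{14}(y) = ab\beta\,e^{-by}(1 - a e^{-by})^{\beta-1},\quad y \ge \ln a^{\frac{1}{b}}, \qquad (i)$$

for $a > 0$, $b > 0$, $\beta > 0$, and zero elsewhere. In this case, since $y$ is a decreasing function of $u_1$, integrating (i) from $\ln a^{\frac{1}{b}}$ to $t$ gives $Pr\{y \le t\} = (1 - a e^{-bt})^{\beta}$, which vanishes at the lower end of the support, so that

$$\text{Reliability function} = Pr\{y \ge t\} = 1 - (1 - a e^{-bt})^{\beta}, \qquad (ii)$$

$$\text{Hazard function} = \frac{f_{14}(y=t)}{Pr\{y \ge t\}} = \frac{ab\beta\,e^{-bt}(1 - a e^{-bt})^{\beta-1}}{1 - (1 - a e^{-bt})^{\beta}}. \qquad (iii)$$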
Now, consider exponentiating a real scalar type-2 beta random variable whose density is given in (2.1.6). For convenience, we will let the parameters be $(\alpha_1 = \alpha$ and $\alpha_2 = \beta)$. Letting $u_2 = a e^{-by}$ in (2.1.6), we obtain the following density:

$$f_{21}(y) = \frac{\Gamma(\alpha+\beta)}{\Gamma(\alpha)\Gamma(\beta)}\,a^{\alpha}b\,e^{-b\alpha y}(1 + a e^{-by})^{-(\alpha+\beta)},\quad -\infty < y < \infty, \qquad (2.1.11)$$

for $a > 0$, $b > 0$, $\Re(\alpha) > 0$, $\Re(\beta) > 0$, and zero elsewhere. The model in (2.1.11) is in fact the generalized logistic model introduced by Mathai and Provost (2006). For the special case $\alpha = 1$, $\beta = 1$, $a = 1$, $b = 1$ in (2.1.11), we have the following density:

$$f_{22}(y) = \frac{e^{-y}}{(1 + e^{-y})^2} = \frac{e^{y}}{(1 + e^{y})^2},\quad -\infty < y < \infty. \qquad (iv)$$

This is the famous logistic model which is utilized in industrial applications.
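
The passage from a type-2 beta variable to the logistic model can be illustrated as follows. The sketch assumes the construction $u_2 = x_1/x_2$ with $x_1$, $x_2$ independent standard exponential variables (that is, real scalar gammas with $\alpha_1 = \alpha_2 = 1$ and a common scale), so that $u_2$ is type-2 beta with parameters $(1, 1)$, and then sets $y = -\ln u_2$. The function names are illustrative.

```python
import math
import random


def logistic_pdf(y):
    """Logistic density exp(-y)/(1 + exp(-y))^2, written in a symmetric, overflow-safe form."""
    e = math.exp(-abs(y))
    return e / (1 + e) ** 2


def logistic_cdf(t):
    """Distribution function 1/(1 + exp(-t)) obtained by integrating the logistic density."""
    return 1.0 / (1.0 + math.exp(-t))


def sample_logistic(rng):
    """y = -ln(u2), where u2 = x1/x2 is type-2 beta(1, 1) built from two exponentials."""
    x1 = rng.expovariate(1.0)
    x2 = rng.expovariate(1.0)
    return -math.log(x1 / x2)


def empirical_cdf(t, n=20000, seed=4):
    rng = random.Random(seed)
    return sum(sample_logistic(rng) <= t for _ in range(n)) / n


if __name__ == "__main__":
    y = 1.3
    # The two algebraic forms of the density coincide
    print(logistic_pdf(y), math.exp(y) / (1 + math.exp(y)) ** 2)
    print(empirical_cdf(1.0), logistic_cdf(1.0))
```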


2.1.6. The Student-t distribution in the real domain


A real Student-t variable with $\nu$ degrees of freedom, denoted by $t_{\nu}$, is defined as $t_{\nu} = \frac{z}{\sqrt{\chi_{\nu}^2/\nu}}$ where $z \sim N_1(0, 1)$ and $\chi_{\nu}^2$ is a real scalar chisquare with $\nu$ degrees of freedom, $z$ and $\chi_{\nu}^2$ being independently distributed. It follows from the definition of a real $F_{m,n}$ random variable that $t_{\nu}^2 = \frac{z^2}{\chi_{\nu}^2/\nu} = F_{1,\nu}$, an F random variable with 1 and $\nu$ degrees of freedom. Thus, the density of $t_{\nu}^2$ is available from that of an $F_{1,\nu}$. On substituting the values $m = 1$, $n = \nu$ in the F-density appearing in (2.1.4), we obtain the density of $t^2 = w$, denoted by $f_w(w)$, as

$$f_w(w) = \frac{\Gamma(\frac{\nu+1}{2})}{\sqrt{\pi}\,\Gamma(\frac{\nu}{2})}\Big(\frac{1}{\nu}\Big)^{\frac{1}{2}}\frac{w^{\frac{1}{2}-1}}{(1 + \frac{w}{\nu})^{\frac{\nu+1}{2}}},\quad 0 \le w < \infty, \qquad (2.1.12)$$

for $w = t^2$, $\nu = 1, 2, \ldots$ and $f_w(w) = 0$ elsewhere. Since $w = t^2$, the part of the density for $t > 0$ is available from (2.1.12) by observing that $\frac{1}{2}w^{\frac{1}{2}-1}{\rm d}w = {\rm d}t$ for $t > 0$. Hence for $t > 0$ that part of the Student-t density is available from (2.1.12) as

$$f_{1t}(t) = 2\,\frac{\Gamma(\frac{\nu+1}{2})}{\sqrt{\pi\nu}\,\Gamma(\frac{\nu}{2})}\Big(1 + \frac{t^2}{\nu}\Big)^{-\frac{(\nu+1)}{2}},\quad 0 \le t < \infty, \qquad (2.1.13)$$

and zero elsewhere. Since (2.1.13) is symmetric, we extend it over $(-\infty, \infty)$ and so, obtain the real Student-t density, denoted by $f_t(t)$. This is stated in the next theorem.

Theorem 2.1.6. Consider a real scalar standard normal variable $z$, which is divided by the square root of a real chisquare variable with $\nu$ degrees of freedom divided by its number of degrees of freedom $\nu$, that is, $t = \frac{z}{\sqrt{\chi_{\nu}^2/\nu}}$, where $z$ and $\chi_{\nu}^2$ are independently distributed; then $t$ is known as the real scalar Student-t variable and its density is given by

$$f_t(t) = \frac{\Gamma(\frac{\nu+1}{2})}{\sqrt{\pi\nu}\,\Gamma(\frac{\nu}{2})}\Big(1 + \frac{t^2}{\nu}\Big)^{-\frac{(\nu+1)}{2}},\quad -\infty < t < \infty, \qquad (2.1.14)$$

for $\nu = 1, 2, \ldots$.
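
A short numerical sanity check of the Student-t density in Theorem 2.1.6 is sketched below: the density should integrate to one over the real line, and for $\nu = 1$ it should reduce to $\frac{1}{\pi(1+t^2)}$. The midpoint-rule helper `integrate` and the truncation of the real line to $[-100, 100]$ are illustrative choices, not part of the text.

```python
import math


def student_t_pdf(t, nu):
    """Real Student-t density with nu degrees of freedom."""
    const = math.gamma((nu + 1) / 2) / (math.sqrt(math.pi * nu) * math.gamma(nu / 2))
    return const * (1 + t * t / nu) ** (-(nu + 1) / 2)


def integrate(f, lo, hi, steps=100000):
    """Composite midpoint rule on [lo, hi]."""
    h = (hi - lo) / steps
    return h * sum(f(lo + (k + 0.5) * h) for k in range(steps))


if __name__ == "__main__":
    # Total probability for nu = 5, truncating the real line to [-100, 100]
    print(integrate(lambda t: student_t_pdf(t, 5), -100.0, 100.0))
    # For nu = 1 the density is 1/(pi (1 + t^2))
    print(student_t_pdf(0.7, 1), 1 / (math.pi * (1 + 0.7 ** 2)))
```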
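2.1a.4. The Student-t distribution in the complex domain


Let $\tilde{z} \sim \tilde{N}_1(0, 1)$ and $\tilde{y} \sim \tilde{\chi}_{\nu}^2$ in the complex domain or, equivalently, $\tilde{y}$ is distributed as a real gamma with the parameters $(\alpha = \nu, \beta = 1)$, and let these random variables be independently distributed. Then, we will define Student-t with $\nu$ degrees of freedom in the complex domain as follows:

$$\tilde{t}_{\nu} = \frac{|\tilde{z}|}{\sqrt{\tilde{y}/\nu}},\quad |\tilde{z}| = (z_1^2 + z_2^2)^{\frac{1}{2}},\quad \tilde{z} = z_1 + i z_2$$

with $z_1, z_2$ real and $i = \sqrt{(-1)}$. What is then the density of $\tilde{t}_{\nu}$? The joint density of $\tilde{z}$ and $\tilde{y}$, denoted by $\tilde{f}(\tilde{y}, \tilde{z})$, is

$$\tilde{f}(\tilde{y}, \tilde{z})\,{\rm d}\tilde{y} \wedge {\rm d}\tilde{z} = \frac{1}{\pi\,\Gamma(\nu)}\,y^{\nu-1}e^{-y-|\tilde{z}|^2}\,{\rm d}\tilde{y} \wedge {\rm d}\tilde{z}.$$

Let $\tilde{z} = z_1 + i z_2$, $i = \sqrt{(-1)}$, where $z_1 = r\cos\theta$ and $z_2 = r\sin\theta$, $0 \le r < \infty$, $0 \le \theta \le 2\pi$. Then, ${\rm d}z_1 \wedge {\rm d}z_2 = r\,{\rm d}r \wedge {\rm d}\theta$, and the joint density of $r$ and $\tilde{y}$, denoted by $f_1(r, y)$, is the following after integrating out $\theta$, observing that $y$ has a real gamma density:

$$f_1(r, y)\,{\rm d}r \wedge {\rm d}y = \frac{2}{\Gamma(\nu)}\,y^{\nu-1}e^{-y-r^2}\,r\,{\rm d}r \wedge {\rm d}y.$$

Let $u = t^2 = \frac{\nu r^2}{y}$ and $y = w$. Then, ${\rm d}u \wedge {\rm d}w = \frac{2\nu r}{w}\,{\rm d}r \wedge {\rm d}y$ and so, $r\,{\rm d}r \wedge {\rm d}y = \frac{w}{2\nu}\,{\rm d}u \wedge {\rm d}w$. Letting the joint density of $u$ and $w$ be denoted by $f_2(u, w)$, we have

$$f_2(u, w) = \frac{1}{\nu\,\Gamma(\nu)}\,w^{\nu}e^{-(w + \frac{uw}{\nu})}$$

and the marginal density of $u$, denoted by $g(u)$, is as follows:

$$g(u) = \int_{w=0}^{\infty} f_2(u, w)\,{\rm d}w = \int_0^{\infty}\frac{w^{\nu}}{\Gamma(\nu+1)}\,e^{-w(1+\frac{u}{\nu})}\,{\rm d}w = \Big(1 + \frac{u}{\nu}\Big)^{-(\nu+1)}$$

for $0 \le u < \infty$, $\nu = 1, 2, \ldots$, $u = t^2$ and zero elsewhere. Thus the part of the density of $t$, for $t > 0$, denoted by $f_{1t}(t)$, is as follows, observing that ${\rm d}u = 2t\,{\rm d}t$ for $t > 0$:

$$f_{1t}(t) = 2t\Big(1 + \frac{t^2}{\nu}\Big)^{-(\nu+1)},\quad 0 \le t < \infty,\ \nu = 1, 2, \ldots \qquad (2.1a.6)$$

Extending this density over the real line, we obtain the following density of $\tilde{t}$ in the complex case:

$$\tilde{f}_{\nu}(t) = |t|\Big(1 + \frac{t^2}{\nu}\Big)^{-(\nu+1)},\quad -\infty < t < \infty,\ \nu = 1, 2, \ldots \qquad (2.1a.7)$$

Thus, the following result:


Theorem 2.1a.6. Let $\tilde{z} \sim \tilde{N}_1(0, 1)$, $\tilde{y} \sim \tilde{\chi}_{\nu}^2$, a scalar chisquare in the complex domain, and let $\tilde{z}$ and $\tilde{y}$ in the complex domain be independently distributed. Consider the real variable $t = t_{\nu} = \frac{|\tilde{z}|}{\sqrt{\tilde{y}/\nu}}$. Then this $t$ will be called a Student-t with $\nu$ degrees of freedom in the complex domain and its density is given by (2.1a.7).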
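
The one-sided density $2t(1 + t^2/\nu)^{-(\nu+1)}$, $t \ge 0$, integrates in closed form to the distribution function $1 - (1 + T^2/\nu)^{-\nu}$, which makes a simulation check straightforward. The sketch below assumes that the standard complex normal has independent real and imaginary parts with variance $\frac{1}{2}$ each and that the complex chisquare with $\nu$ degrees of freedom is a real gamma with shape $\nu$ and unit scale, as stated in this section; the function names are illustrative.

```python
import math
import random


def sample_t(rng, nu):
    """Draw |z| / sqrt(y/nu) with z standard complex normal and y ~ gamma(nu, 1)."""
    s = math.sqrt(0.5)
    z = complex(rng.gauss(0.0, s), rng.gauss(0.0, s))
    y = rng.gammavariate(nu, 1.0)
    return abs(z) / math.sqrt(y / nu)


def cdf_t(T, nu):
    """Integral of 2t (1 + t^2/nu)^(-(nu + 1)) from 0 to T."""
    return 1.0 - (1.0 + T * T / nu) ** (-nu)


def empirical_cdf(T, nu, n=20000, seed=6):
    rng = random.Random(seed)
    return sum(sample_t(rng, nu) <= T for _ in range(n)) / n


if __name__ == "__main__":
    print(empirical_cdf(1.0, 3), cdf_t(1.0, 3))  # theoretical value: 1 - 27/64
```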


2.1.7. The Cauchy distribution in the real domain 


We have already seen a ratio distribution in Sect. 2.1.3, namely the real type-2 beta distribution and, as particular cases, the real F-distribution and the real $t^2$ distribution. We now consider a ratio of two independently distributed real standard normal variables. Let $z_1 \sim N_1(0, 1)$ and $z_2 \sim N_1(0, 1)$ be independently distributed. The joint density of $z_1$ and $z_2$, denoted by $f(z_1, z_2)$, is given by

$$f(z_1, z_2) = \frac{1}{2\pi}\,e^{-\frac{1}{2}(z_1^2+z_2^2)},\quad -\infty < z_j < \infty,\ j = 1, 2.$$

Consider the quadrant $z_1 > 0$, $z_2 > 0$ and the transformation $u = \frac{z_1}{z_2}$, $v = z_2$. Then ${\rm d}z_1 \wedge {\rm d}z_2 = v\,{\rm d}u \wedge {\rm d}v$, see Sect. 2.1.3. Note that $u > 0$ covers the quadrants $z_1 > 0$, $z_2 > 0$ and $z_1 < 0$, $z_2 < 0$. The part of the density of $u$ in the quadrant $u > 0$, $v > 0$, denoted as $g(u, v)$, is given by

$$g(u, v) = \frac{v}{2\pi}\,e^{-\frac{v^2}{2}(1+u^2)}$$

and that part of the marginal density of $u$, denoted by $g_1(u)$, is

$$g_1(u) = \frac{1}{2\pi}\int_0^{\infty} v\,e^{-\frac{v^2}{2}(1+u^2)}\,{\rm d}v = \frac{1}{2\pi(1+u^2)}.$$

The other two quadrants $z_1 > 0$, $z_2 < 0$ and $z_1 < 0$, $z_2 > 0$, which correspond to $u < 0$, will yield the same form as above. Accordingly, the density of the ratio $u = \frac{z_1}{z_2}$, known as the real Cauchy density, is as specified in the next theorem.


Theorem 2.1.7. Consider the independently distributed real standard normal variables $z_1 \sim N_1(0, 1)$ and $z_2 \sim N_1(0, 1)$. Then the ratio $u = \frac{z_1}{z_2}$ has the real Cauchy distribution having the following density:

$$g_u(u) = \frac{1}{\pi(1+u^2)},\quad -\infty < u < \infty. \qquad (2.1.15)$$

By integrating out in each interval $(-\infty, 0)$ and $(0, \infty)$, with the help of a type-2 beta integral, it can be established that (2.1.15) is indeed a density. Since $g_u(u)$ is symmetric, $Pr\{u \le 0\} = Pr\{u \ge 0\} = \frac{1}{2}$, and one could posit that the mean value of $u$ may be zero. However, observe that

$$\int_0^{\infty}\frac{u}{1+u^2}\,{\rm d}u = \frac{1}{2}\ln(1+u^2)\Big|_0^{\infty} = \infty.$$

Thus, $E(u)$, the mean value of a real Cauchy random variable, does not exist, which implies that the higher moments do not exist either.
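
The Cauchy density (2.1.15) integrates to the distribution function $\frac{1}{2} + \frac{1}{\pi}\arctan u$, which can be compared with the empirical distribution of a simulated ratio of two independent standard normal variables. This is a minimal sketch with illustrative function names.

```python
import math
import random


def cauchy_cdf(u):
    """Distribution function obtained by integrating 1/(pi (1 + u^2))."""
    return 0.5 + math.atan(u) / math.pi


def empirical_cdf(u, n=20000, seed=7):
    """Fraction of simulated ratios z1/z2 not exceeding u."""
    rng = random.Random(seed)
    count = 0
    for _ in range(n):
        z1 = rng.gauss(0.0, 1.0)
        z2 = rng.gauss(0.0, 1.0)
        count += (z1 / z2) <= u
    return count / n


if __name__ == "__main__":
    for point in (-1.0, 0.0, 1.0):
        print(point, empirical_cdf(point), cauchy_cdf(point))
```

Although the distribution function is matched closely, sample averages of such ratios do not settle down as the sample grows, in line with the non-existence of $E(u)$.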


Exercises 2.1 


2.1.1. Consider someone throwing a dart at a board to hit a point on the board. Taking this target point as the origin, consider a rectangular coordinate system. If $(x, y)$ is a point of hit, then compute the densities of $x$ and $y$ under the following assumptions: (1): There is no bias in the horizontal and vertical directions or $x$ and $y$ are independently distributed; (2): The joint density is a function of the distance from the origin $\sqrt{x^2 + y^2}$. That is, if $f_1(x)$ and $f_2(y)$ are the densities of $x$ and $y$ then it is given that $f_1(x)f_2(y) = g(\sqrt{x^2 + y^2})$ where $f_1, f_2, g$ are unknown functions. Show that $f_1$ and $f_2$ are identical and real normal densities.


2.1.2. Generalize Exercise 2.1.1 to 3-dimensional Euclidean space or

$$g(\sqrt{x^2 + y^2 + z^2}) = f_1(x)f_2(y)f_3(z).$$


2.1.3. Generalize Exercise 2.1.2 to $k$-space, $k \ge 3$.


2.1.4. Let $f(x)$ be an arbitrary density. Then Shannon's measure of entropy or uncertainty is $S = -k\int_x f(x)\ln f(x)\,{\rm d}x$ where $k$ is a constant. Optimize $S$, subject to the conditions (a): $\int_{-\infty}^{\infty} f(x)\,{\rm d}x = 1$; (b): Condition in (a) plus $\int_{-\infty}^{\infty} x f(x)\,{\rm d}x$ = a given quantity; (c): The conditions in (b) plus $\int_{-\infty}^{\infty} x^2 f(x)\,{\rm d}x$ = a given quantity. Show that under (a), $f$ is a uniform density; under (b), $f$ is an exponential density and under (c), $f$ is a Gaussian density. Hint: Use Calculus of Variation.


2.1.5. Let the error of measurement $\epsilon$ satisfy the following conditions: (1): $\epsilon = \epsilon_1 + \epsilon_2 + \cdots$ or it is a sum of infinitely many infinitesimal contributions $\epsilon_j$'s where the $\epsilon_j$'s are independently distributed. (2): Suppose that $\epsilon_j$ can only take two values, $\delta$ with probability $\frac{1}{2}$ and $-\delta$ with probability $\frac{1}{2}$, for all $j$. (3): ${\rm Var}(\epsilon) = \sigma^2 < \infty$. Then show that this error density is real Gaussian. Hint: Use mgf. [This is Gauss' derivation of the normal law and hence it is called the error curve or Gaussian density also.]


2.1.6. The pathway model of Mathai (2005) has the following form in the case of real positive scalar variable $x$:

$$f_1(x) = c_1\,x^{\gamma}[1 - a(1-q)x^{\delta}]^{\frac{1}{1-q}},\quad q < 1,\ 0 \le x \le [a(1-q)]^{-\frac{1}{\delta}},$$

for $\delta > 0$, $a > 0$, $\gamma > -1$ and $f_1(x) = 0$ elsewhere. Show that this generalized type-1 beta form changes to generalized type-2 beta form for $q > 1$,

$$f_2(x) = c_2\,x^{\gamma}[1 + a(q-1)x^{\delta}]^{-\frac{1}{q-1}},\quad q > 1,\ x \ge 0,\ \delta > 0,\ a > 0$$

and $f_2(x) = 0$ elsewhere, and for $q \to 1$, the model goes into a generalized gamma form given by

$$f_3(x) = c_3\,x^{\gamma}e^{-a x^{\delta}},\quad a > 0,\ \delta > 0,\ x \ge 0$$

and zero elsewhere. Evaluate the normalizing constants $c_1, c_2, c_3$. All models are available either from $f_1(x)$ or from $f_2(x)$ where $q$ is the pathway parameter.


2.1.7. Make a transformation $x = e^{-t}$ in the generalized gamma model of $f_3(x)$ of Exercise 2.1.6. Show that an extreme-value density for $t$ is available.


2.1.8. Consider the type-2 beta model

$$f(x) = \frac{\Gamma(\alpha+\beta)}{\Gamma(\alpha)\Gamma(\beta)}\,x^{\alpha-1}(1+x)^{-(\alpha+\beta)},\quad x \ge 0,\ \Re(\alpha) > 0,\ \Re(\beta) > 0$$

and zero elsewhere. Make the transformation $x = e^{-y}$ and then show that $y$ has a generalized logistic distribution and as a particular case there one gets the logistic density.


2.1.9. Show that for $0 \le x < \infty$, $\beta > 0$, $f(x) = c[1 + e^{\alpha+\beta x}]^{-1}$ is a density, which is known as Fermi-Dirac density. Evaluate the normalizing constant $c$.


2.1.10. Let $f(x) = c[e^{\alpha+\beta x} - 1]^{-1}$ for $0 \le x < \infty$, $\beta > 0$. Show that $f(x)$ is a density, known as Bose-Einstein density. Evaluate the normalizing constant $c$.


2.1.11. Evaluate the incomplete gamma integral $\gamma(\alpha; b) = \int_0^b x^{\alpha-1}e^{-x}\,{\rm d}x$ and show that it can be written in terms of the confluent hypergeometric series

$${}_1F_1(\beta; \delta; y) = \sum_{k=0}^{\infty}\frac{(\beta)_k}{(\delta)_k}\frac{y^k}{k!},$$

where $(\alpha)_k = \alpha(\alpha+1)\cdots(\alpha+k-1)$, $\alpha \ne 0$, $(\alpha)_0 = 1$ is the Pochhammer symbol. Evaluate the normalizing constant $c$ if $f(x) = c\,x^{\alpha-1}e^{-x}$, $0 \le x \le a$, $\alpha > 0$ and zero elsewhere, is a density.


2.1.12. Evaluate the incomplete beta integral $b(\alpha; \beta; b) = \int_0^b x^{\alpha-1}(1-x)^{\beta-1}\,{\rm d}x$, $\alpha > 0$, $\beta > 0$, $0 \le b \le 1$. Show that it is available in terms of a Gauss' hypergeometric series of the form ${}_2F_1(a, b; c; z) = \sum_{k=0}^{\infty}\frac{(a)_k(b)_k}{(c)_k}\frac{z^k}{k!}$, $|z| < 1$.


2.1.13. For the pathway model in Exercise 2.1.6 compute the reliability function $Pr\{x \ge t\}$ when $\gamma = 0$ for all the cases $q < 1$, $q > 1$, $q \to 1$.


2.1.14. Weibull density: In the generalized gamma density $f(x) = c\,x^{\gamma-1}e^{-a x^{\delta}}$, $x \ge 0$, $\gamma > 0$, $a > 0$, $\delta > 0$ and zero elsewhere, if $\delta = \gamma$ then $f(x)$ is called a Weibull density. For a Weibull density, evaluate the hazard function $h(t) = f(t)/Pr\{x \ge t\}$.


2.1.15. Consider a type-1 beta density $f(x) = \frac{\Gamma(\alpha+\beta)}{\Gamma(\alpha)\Gamma(\beta)}\,x^{\alpha-1}(1-x)^{\beta-1}$, $0 \le x \le 1$, $\alpha > 0$, $\beta > 0$ and zero elsewhere. Let $\alpha = 1$. Consider a power transformation $x = y^{\delta}$, $\delta > 0$. Let this model be $g(y)$. Compute the reliability function $Pr\{y \ge t\}$ and the hazard function $h(t) = g(t)/Pr\{y \ge t\}$.


2.1.16. Verify that if $z$ is a real standard normal variable, $E(e^{tz^2}) = (1-2t)^{-1/2}$, $t < 1/2$, which is the mgf of a chi-square random variable having one degree of freedom. Owing to the uniqueness of the mgf, this result establishes that $z^2 \sim \chi_1^2$.


2.2. Quadratic Forms, Chisquaredness and Independence in the Real Domain 


Let $x_1, \ldots, x_p$ be iid (independently and identically distributed) real scalar random variables distributed as $N_1(0, 1)$ and $X$ be a $p \times 1$ vector whose components are $x_1, \ldots, x_p$, that is, $X' = (x_1, \ldots, x_p)$. Consider the real quadratic form $u_1 = X'AX$ for some $p \times p$ real constant symmetric matrix $A = A'$. Then, we have the following result:


Theorem 2.2.1. The quadratic form $u_1 = X'AX$, $A = A'$, where the components of $X$ are iid $N_1(0, 1)$, is distributed as a real chisquare with $r$, $r \le p$, degrees of freedom if and only if $A$ is idempotent, that is, $A = A^2$, and $A$ is of rank $r$.


Proof: When $A = A'$ is real, there exists an orthonormal matrix $P$, $PP' = I$, $P'P = I$, such that $P'AP = {\rm diag}(\lambda_1, \ldots, \lambda_p)$, where the $\lambda_j$'s are the eigenvalues of $A$. Consider the transformation $X = PY$ or $Y = P'X$. Then

$$X'AX = Y'P'APY = \lambda_1 y_1^2 + \lambda_2 y_2^2 + \cdots + \lambda_p y_p^2 \qquad (i)$$

where $y_1, \ldots, y_p$ are the components of $Y$ and $\lambda_1, \ldots, \lambda_p$ are the eigenvalues of $A$. We have already shown in Theorem 2.1.1 that all linear functions of independent real normal variables are also real normal and hence, all the $y_j$'s are normally distributed. The expectation of $Y$ is $E[Y] = E[P'X] = P'E(X) = P'O = O$ and the covariance matrix associated with $Y$ is

$${\rm Cov}(Y) = E[Y - E(Y)][Y - E(Y)]' = E[YY'] = P'{\rm Cov}(X)P = P'IP = P'P = I$$

which means that the $y_j$'s are real standard normal variables that are mutually independently distributed. Hence, $y_j^2 \sim \chi_1^2$ or each $y_j^2$ is a real chisquare with one degree of freedom and the $y_j$'s are all mutually independently distributed. If $A = A^2$ and the rank of $A$ is $r$, then $r$ of the eigenvalues of $A$ are unities and the remaining ones are equal to zero as the eigenvalues of an idempotent matrix can only be equal to zero or one, the number of ones being equal to the rank of the idempotent matrix. Then the representation in (i) becomes a sum of $r$ independently distributed real chisquares of one degree of freedom each and hence the sum is a real chisquare of $r$ degrees of freedom. Hence, the sufficiency of the result is proved. For the necessity, we assume that $X'AX \sim \chi_r^2$ and we must prove that $A = A^2$ and $A$ is of rank $r$. Note that it is assumed throughout that $A = A'$. If $X'AX$ is a real chisquare having $r$ degrees of freedom, then the mgf of $u_1 = X'AX$ is given by $M_{u_1}(t) = (1-2t)^{-\frac{r}{2}}$. From the representation given in (i), the mgf's are as follows: $M_{y_j^2}(t) = (1-2t)^{-\frac{1}{2}} \Rightarrow M_{\lambda_j y_j^2}(t) = (1-2\lambda_j t)^{-\frac{1}{2}}$, $j = 1, \ldots, p$, the $y_j$'s being independently distributed. Thus, the mgf of the right-hand side of (i) is $M_{u_1}(t) = \prod_{j=1}^{p}(1-2\lambda_j t)^{-\frac{1}{2}}$. Hence, we have

$$(1-2t)^{-\frac{r}{2}} = \prod_{j=1}^{p}(1-2\lambda_j t)^{-\frac{1}{2}},\quad 1-2t > 0,\ 1-2\lambda_j t > 0,\ j = 1, \ldots, p. \qquad (ii)$$

Taking the natural logarithm of each side of (ii), expanding the terms and then comparing the coefficients of $\frac{(2t)^n}{n}$ on both sides for $n = 1, 2, \ldots$, we obtain equations of the type

$$r = \sum_{j=1}^{p}\lambda_j = \sum_{j=1}^{p}\lambda_j^2 = \sum_{j=1}^{p}\lambda_j^3 = \cdots \qquad (iii)$$

The only solution resulting from (iii) is that $r$ of the $\lambda_j$'s are unities and the remaining ones are zeros. This result, combined with the property that $A = A'$, guarantees that $A$ is idempotent of rank $r$.


Observe that the eigenvalues of a matrix being ones and zeros need not imply that the matrix is idempotent; take for instance triangular matrices whose diagonal elements are unities and zeros. However, this property combined with the symmetry assumption will guarantee that the matrix is idempotent.


Corollary 2.2.1. If the simple random sample or the iid variables came from a real $N_1(0, \sigma^2)$ distribution, then the modification needed in Theorem 2.2.1 is that $\frac{1}{\sigma^2}X'AX \sim \chi_r^2$, $A = A'$, if and only if $A = A^2$ and $A$ is of rank $r$.


The above result, Theorem 2.2.1, coupled with another result on the independence of quadratic forms, is quite useful in the areas of Design of Experiments, Analysis of Variance and Regression Analysis, as well as in model building and hypothesis testing situations. This result on the independence of quadratic forms is stated next.


Theorem 2.2.2. Let $x_1, \ldots, x_p$ be iid variables from a real $N_1(0, 1)$ population. Consider two real quadratic forms $u_1 = X'AX$, $A = A'$ and $u_2 = X'BX$, $B = B'$, where the components of the $p \times 1$ vector $X$ are the $x_1, \ldots, x_p$. Then, $u_1$ and $u_2$ are independently distributed if and only if $AB = O$.

Proof: Let us assume that $AB = O$. Then $AB = O = O' = (AB)' = B'A' = BA$. When $AB = BA$, there exists a single orthonormal matrix $P$, $PP' = I$, $P'P = I$, such that both the quadratic forms are reduced to their canonical forms by the same $P$. Let

$$u_1 = X'AX = \lambda_1 y_1^2 + \cdots + \lambda_p y_p^2 \qquad (i)$$

and

$$u_2 = X'BX = \nu_1 y_1^2 + \cdots + \nu_p y_p^2, \qquad (ii)$$

where $\lambda_1, \ldots, \lambda_p$ are the eigenvalues of $A$ and $\nu_1, \ldots, \nu_p$ are the eigenvalues of $B$. Since $A = A'$, the eigenvalues $\lambda_j$'s are all real. Moreover,

$$AB = O \ \Rightarrow\ P'ABP = P'APP'BP = D_1D_2 = \begin{bmatrix} \lambda_1 & 0 & \cdots & 0 \\ 0 & \lambda_2 & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & \lambda_p \end{bmatrix}\begin{bmatrix} \nu_1 & 0 & \cdots & 0 \\ 0 & \nu_2 & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & \nu_p \end{bmatrix} = O, \qquad (iii)$$

which means that $\lambda_j\nu_j = 0$ for all $j = 1, \ldots, p$. Thus, whenever a $\lambda_j$ is not zero, the corresponding $\nu_j$ is zero and vice versa. Accordingly, the $\lambda_j$'s and $\nu_j$'s are separated in (i) and (ii), that is, the independent components are mathematically separated and hence $u_1$ and $u_2$ are statistically independently distributed. The converse, which can be stated as follows: if $u_1$ and $u_2$ are independently distributed, $A = A'$, $B = B'$ and the $x_j$'s are real iid $N_1(0, 1)$, then $AB = O$, is more difficult to establish. The proof, which requires additional properties of matrices, will not be presented herein. Note that there are several incorrect or incomplete "proofs" in the literature. A correct derivation may be found in Mathai and Provost (1992).


When $x_1, \ldots, x_p$ are iid $N_1(0, \sigma^2)$, the above result on the independence of quadratic forms still holds since the independence is not altered by multiplying the quadratic forms by $\frac{1}{\sigma^2}$.


Example 2.2.1. Construct two 3 x 3 matrices A and B such that A = A’, В = B’ [both 
are symmetric], A — A? [A is idempotent], AB — O [A and B are orthogonal to each 
other], and A has rank 2. Then (1): verify Theorem 2.2.1; (2): verify Theorem 2.2.2. 


Solution 2.2.1. Consider the following matrices:

$$A = \begin{bmatrix} \frac{1}{2} & 0 & -\frac{1}{2} \\ 0 & 1 & 0 \\ -\frac{1}{2} & 0 & \frac{1}{2} \end{bmatrix}, \quad B = \begin{bmatrix} 1 & 0 & 1 \\ 0 & 0 & 0 \\ 1 & 0 & 1 \end{bmatrix}.$$

Note that both $A$ and $B$ are symmetric, that is, $A = A'$, $B = B'$. Further, the rank of $A$
is 2 since the first and second row vectors are linearly independent and the third row is a
multiple of the first one. Note that $A^2 = A$ and $AB = O$. Now, consider the quadratic
forms $u = X'AX$ and $v = X'BX$. Then $u = \frac{1}{2}x_1^2 + x_2^2 + \frac{1}{2}x_3^2 - x_1x_3 = x_2^2 + [\frac{1}{\sqrt{2}}(x_1 - x_3)]^2$.
Our initial assumption is that $x_j \sim N_1(0, 1)$, $j = 1, 2, 3$, and the $x_j$'s are independently
distributed. Let $y_1 = \frac{1}{\sqrt{2}}(x_1 - x_3)$. Then, $E[y_1] = 0$, $\mathrm{Var}(y_1) = \frac{1}{2}[\mathrm{Var}(x_1) + \mathrm{Var}(x_3)] =
\frac{1}{2}[1 + 1] = 1$. Since $y_1$ is a linear function of normal variables, $y_1$ is normal with the
parameters $E[y_1] = 0$ and $\mathrm{Var}(y_1) = 1$, that is, $y_1 \sim N_1(0, 1)$, and hence $y_1^2 \sim \chi^2_1$;
as well, $x_2^2 \sim \chi^2_1$. Thus, $u \sim \chi^2_2$ since $x_2$ and $y_1$ are independently distributed given
that the variables are separated, noting that $y_1$ does not involve $x_2$. This verifies Theorem 2.2.1.
Now, having already determined that $AB = O$, it remains to show that $u$ and
$v$ are independently distributed where $v = X'BX = x_1^2 + x_3^2 + 2x_1x_3 = (x_1 + x_3)^2$. Let
$y_2 = \frac{1}{\sqrt{2}}(x_1 + x_3) \Rightarrow y_2 \sim N_1(0, 1)$ as $y_2$ is a linear function of normal variables and
hence normal with parameters $E[y_2] = 0$ and $\mathrm{Var}(y_2) = 1$. On noting that $v$ does not contain
$x_2$, we need only consider the parts of $u$ and $v$ containing $x_1$ and $x_3$. Thus, our question
reduces to: are $y_1$ and $y_2$ independently distributed? Since both $y_1$ and $y_2$ are linear functions
of normal variables, both $y_1$ and $y_2$ are normal. Since the covariance between $y_1$ and
$y_2$, that is, $\mathrm{Cov}(y_1, y_2) = \frac{1}{2}\mathrm{Cov}(x_1 - x_3, x_1 + x_3) = \frac{1}{2}[\mathrm{Var}(x_1) - \mathrm{Var}(x_3)] = \frac{1}{2}[1 - 1] = 0$,
the two normal variables are uncorrelated and hence independently distributed. That is, $y_1$
and $y_2$ are independently distributed, thereby implying that $u$ and $v$ are also independently
distributed, which verifies Theorem 2.2.2.
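The algebraic claims in Solution 2.2.1 can be checked mechanically. The following is a minimal sketch in exact rational arithmetic; the helper names (`matmul`, `transpose`, `quad_form`) and the test point `x` are illustrative assumptions, not part of the text. It confirms that $A$ is symmetric and idempotent, that $AB = O$, and that the quadratic forms split into the squares used in the solution.

```python
from fractions import Fraction as F

def matmul(P, Q):
    """Product of two matrices stored as lists of rows."""
    return [[sum(P[i][k] * Q[k][j] for k in range(len(Q)))
             for j in range(len(Q[0]))] for i in range(len(P))]

def transpose(P):
    return [list(row) for row in zip(*P)]

def quad_form(M, x):
    """Value of x' M x for a vector x given as a list."""
    n = len(x)
    return sum(x[i] * M[i][j] * x[j] for i in range(n) for j in range(n))

# Matrices of Example 2.2.1 with exact rational entries.
A = [[F(1, 2), 0, F(-1, 2)],
     [0, 1, 0],
     [F(-1, 2), 0, F(1, 2)]]
B = [[1, 0, 1],
     [0, 0, 0],
     [1, 0, 1]]
ZERO = [[0] * 3 for _ in range(3)]

x = [F(3), F(-2), F(5)]                                # arbitrary test point (assumption)
u_direct = quad_form(A, x)
u_squares = x[1] ** 2 + F(1, 2) * (x[0] - x[2]) ** 2   # x2^2 + y1^2
v_direct = quad_form(B, x)
v_square = (x[0] + x[2]) ** 2                          # (x1 + x3)^2

print(matmul(A, A) == A, matmul(A, B) == ZERO)
print(u_direct == u_squares, v_direct == v_square)
```

The check is purely algebraic: it does not simulate the distributions, it only confirms the matrix identities on which Theorems 2.2.1 and 2.2.2 rely.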


2.2a. Hermitian Forms, Chisquaredness and Independence in the Complex Domain 


Let $\tilde{x}_1, \tilde{x}_2, \ldots, \tilde{x}_k$ be independently and identically distributed standard univariate
Gaussian variables in the complex domain and let $\tilde{X}$ be a $k \times 1$ vector whose components
are $\tilde{x}_1, \ldots, \tilde{x}_k$. Consider the Hermitian form $\tilde{X}^*A\tilde{X}$, $A = A^*$ (Hermitian),
where $A$ is a $k \times k$ constant Hermitian matrix. Then, there exists a unitary matrix $Q$,
$QQ^* = I$, $Q^*Q = I$, such that $Q^*AQ = \mathrm{diag}(\lambda_1, \ldots, \lambda_k)$. Note that the $\lambda_j$'s are real
since $A$ is Hermitian. Consider the transformation $\tilde{X} = Q\tilde{Y}$. Then,

$$\tilde{X}^*A\tilde{X} = \lambda_1|\tilde{y}_1|^2 + \cdots + \lambda_k|\tilde{y}_k|^2$$

where the $\tilde{y}_j$'s are iid standard normal in the complex domain, $\tilde{y}_j \sim \tilde{N}_1(0, 1)$, $j =
1, \ldots, k$. Then, $\tilde{y}_j^*\tilde{y}_j = |\tilde{y}_j|^2 \sim \tilde{\chi}^2_1$, a chisquare having one degree of freedom in the
complex domain or, equivalently, a real gamma random variable with the parameters
$(\alpha = 1, \beta = 1)$, the $|\tilde{y}_j|^2$'s being independently distributed for $j = 1, \ldots, k$. Thus,
we can state the following result whose proof parallels that in the real case.


Theorem 2.2a.1. Let $\tilde{x}_1, \ldots, \tilde{x}_k$ be iid $\tilde{N}_1(0, 1)$ variables in the complex domain. Consider
the Hermitian form $u = \tilde{X}^*A\tilde{X}$, $A = A^*$, where $\tilde{X}$ is a $k \times 1$ vector whose components
are $\tilde{x}_1, \ldots, \tilde{x}_k$. Then, $u$ is distributed as a chisquare in the complex domain with $r$
degrees of freedom or a real gamma with the parameters $(\alpha = r, \beta = 1)$, if and only if $A$
is of rank $r$ and $A = A^2$.

Theorem 2.2a.2. Let the $\tilde{x}_j$'s and $\tilde{X}$ be as in Theorem 2.2a.1. Consider two Hermitian
forms $u_1 = \tilde{X}^*A\tilde{X}$, $A = A^*$ and $u_2 = \tilde{X}^*B\tilde{X}$, $B = B^*$. Then, $u_1$ and $u_2$ are independently
distributed if and only if $AB = O$ (null matrix).


Example 2.2a.1. Construct two $3 \times 3$ Hermitian matrices $A$ and $B$, that is $A = A^*$, $B =
B^*$, such that $A = A^2$ [idempotent] and is of rank 2 with $AB = O$. Then (1): verify
Theorem 2.2a.1, (2): verify Theorem 2.2a.2.


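Solution 2.2a.1. Consider the following matrices

$$A = \begin{bmatrix} \frac{1}{2} & 0 & -\frac{(1+i)}{\sqrt{8}} \\ 0 & 1 & 0 \\ -\frac{(1-i)}{\sqrt{8}} & 0 & \frac{1}{2} \end{bmatrix}, \quad B = \begin{bmatrix} \frac{1}{2} & 0 & \frac{(1+i)}{\sqrt{8}} \\ 0 & 0 & 0 \\ \frac{(1-i)}{\sqrt{8}} & 0 & \frac{1}{2} \end{bmatrix}.$$

It can be readily verified that $A = A^*$, $B = B^*$, $A = A^2$, $AB = O$. Further, on multiplying
the first row of $A$ by $-\frac{2(1-i)}{\sqrt{8}}$, we obtain the third row, and since the third row is a
multiple of the first one and the first and second rows are linearly independent, the rank of
$A$ is 2. Our initial assumption is that $\tilde{x}_j \sim \tilde{N}_1(0, 1)$, $j = 1, 2, 3$, that is, they are univariate
complex Gaussian, and they are independently distributed. Then, $\tilde{x}_j^*\tilde{x}_j \sim \tilde{\chi}^2_1$, a chisquare
with one degree of freedom in the complex domain or a real gamma random variable with
the parameters $(\alpha = 1, \beta = 1)$ for each $j = 1, 2, 3$. Let us consider the Hermitian forms
$u = \tilde{X}^*A\tilde{X}$ and $v = \tilde{X}^*B\tilde{X}$, $\tilde{X}' = (\tilde{x}_1, \tilde{x}_2, \tilde{x}_3)$. Then

$$\begin{aligned} u &= \tfrac{1}{2}\tilde{x}_1^*\tilde{x}_1 - \tfrac{(1+i)}{\sqrt{8}}\tilde{x}_1^*\tilde{x}_3 - \tfrac{(1-i)}{\sqrt{8}}\tilde{x}_3^*\tilde{x}_1 + \tfrac{1}{2}\tilde{x}_3^*\tilde{x}_3 + \tilde{x}_2^*\tilde{x}_2 \\ &= \tilde{x}_2^*\tilde{x}_2 + \tfrac{1}{2}\Big[\tilde{x}_1^*\tilde{x}_1 - \tfrac{2(1+i)}{\sqrt{8}}\tilde{x}_1^*\tilde{x}_3 - \tfrac{2(1-i)}{\sqrt{8}}\tilde{x}_3^*\tilde{x}_1 + \tilde{x}_3^*\tilde{x}_3\Big] \\ &= \tilde{\chi}^2_1 + \tfrac{1}{2}\big[(\tilde{y}_1 - \tilde{x}_3)^*(\tilde{y}_1 - \tilde{x}_3)\big] \end{aligned} \qquad \text{(i)}$$

where

$$\tilde{y}_1 = \tfrac{2(1-i)}{\sqrt{8}}\,\tilde{x}_1 \;\Rightarrow\; E[\tilde{y}_1] = 0, \quad \mathrm{Var}(\tilde{y}_1) = \Big|\tfrac{2(1-i)}{\sqrt{8}}\Big|^2\,\mathrm{Var}(\tilde{x}_1) = \mathrm{Var}(\tilde{x}_1) = 1. \qquad \text{(ii)}$$

Since $\tilde{y}_1$ is a linear function of $\tilde{x}_1$, it is a univariate normal in the complex domain with
parameters 0 and 1 or $\tilde{y}_1 \sim \tilde{N}_1(0, 1)$. The part not containing the $\tilde{\chi}^2_1$ in (i) can be written
as follows:

$$\tfrac{1}{2}\big[(\tilde{y}_1 - \tilde{x}_3)^*(\tilde{y}_1 - \tilde{x}_3)\big] = \Big[\tfrac{1}{\sqrt{2}}(\tilde{y}_1 - \tilde{x}_3)\Big]^*\Big[\tfrac{1}{\sqrt{2}}(\tilde{y}_1 - \tilde{x}_3)\Big] \sim \tilde{\chi}^2_1 \qquad \text{(iii)}$$

since $\tilde{y}_1 - \tilde{x}_3 \sim \tilde{N}_1(0, 2)$ as $\tilde{y}_1 - \tilde{x}_3$ is a linear function of the normal variables $\tilde{y}_1$ and $\tilde{x}_3$.
Therefore $u = \tilde{\chi}^2_1 + \tilde{\chi}^2_1 = \tilde{\chi}^2_2$, that is, a chisquare having two degrees of freedom in the
complex domain or a real gamma with the parameters $(\alpha = 2, \beta = 1)$. Observe that the
two chisquares are independently distributed because one of them contains only $\tilde{x}_2$ and the
other, $\tilde{x}_1$ and $\tilde{x}_3$. This establishes (1). In order to verify (2), we first note that the Hermitian
form $v$ can be expressed as follows:

$$v = \tfrac{1}{2}\tilde{x}_1^*\tilde{x}_1 + \tfrac{(1+i)}{\sqrt{8}}\tilde{x}_1^*\tilde{x}_3 + \tfrac{(1-i)}{\sqrt{8}}\tilde{x}_3^*\tilde{x}_1 + \tfrac{1}{2}\tilde{x}_3^*\tilde{x}_3,$$

which can be written in the following form by making use of steps similar to those leading
to (iii):

$$v = \Big[\tfrac{1}{\sqrt{2}}(\tilde{y}_1 + \tilde{x}_3)\Big]^*\Big[\tfrac{1}{\sqrt{2}}(\tilde{y}_1 + \tilde{x}_3)\Big] \sim \tilde{\chi}^2_1 \qquad \text{(iv)}$$

or $v$ is a chisquare with one degree of freedom in the complex domain. Observe that $\tilde{x}_2$ is
absent in (iv), so that we need only compare the terms containing $\tilde{x}_1$ and $\tilde{x}_3$ in (iii) and (iv).
Let $\tilde{y}_2 = \frac{1}{\sqrt{2}}(\tilde{y}_1 - \tilde{x}_3)$ and $\tilde{y}_3 = \frac{1}{\sqrt{2}}(\tilde{y}_1 + \tilde{x}_3)$. Noting that the covariance
between $\tilde{y}_2$ and $\tilde{y}_3$ is zero:

$$\mathrm{Cov}(\tilde{y}_2, \tilde{y}_3) = \tfrac{1}{2}\big[\mathrm{Var}(\tilde{y}_1) - \mathrm{Var}(\tilde{x}_3)\big] = \tfrac{1}{2}[1 - 1] = 0,$$

and that $\tilde{y}_2$ and $\tilde{y}_3$ are linear functions of normal variables and hence normal, the fact that
they are uncorrelated implies that they are independently distributed. Thus, $u$ and $v$ are
indeed independently distributed, which establishes (2).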
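A numerical check of the matrix facts behind this solution may be useful. The sketch below assumes the matrices are $A$ with rows $(\frac12, 0, -\frac{1+i}{\sqrt8})$, $(0,1,0)$, $(-\frac{1-i}{\sqrt8}, 0, \frac12)$ and $B$ with rows $(\frac12, 0, \frac{1+i}{\sqrt8})$, $(0,0,0)$, $(\frac{1-i}{\sqrt8}, 0, \frac12)$, and uses the convention $\tilde{X}^*A\tilde{X} = \sum_{j,k}\bar{x}_j a_{jk} x_k$; the helper names and the test vector are illustrative assumptions.

```python
import math

s8 = math.sqrt(8)
A = [[0.5, 0, -(1 + 1j) / s8],
     [0, 1, 0],
     [-(1 - 1j) / s8, 0, 0.5]]
B = [[0.5, 0, (1 + 1j) / s8],
     [0, 0, 0],
     [(1 - 1j) / s8, 0, 0.5]]
ZERO = [[0] * 3 for _ in range(3)]

def matmul(P, Q):
    return [[sum(P[i][k] * Q[k][j] for k in range(3)) for j in range(3)]
            for i in range(3)]

def conj_transpose(P):
    return [[complex(P[j][i]).conjugate() for j in range(3)] for i in range(3)]

def max_abs_diff(P, Q):
    return max(abs(P[i][j] - Q[i][j]) for i in range(3) for j in range(3))

def herm_form(M, x):
    """Value of x* M x with x* the conjugate transpose of x."""
    return sum(x[i].conjugate() * M[i][j] * x[j]
               for i in range(3) for j in range(3))

x = [1 + 2j, -0.5 + 1j, 2 - 1j]             # arbitrary test vector (assumption)
y1 = (1 - 1j) / math.sqrt(2) * x[0]         # unit-modulus multiple of x1
u_direct = herm_form(A, x)
u_split = abs(x[1]) ** 2 + 0.5 * abs(y1 - x[2]) ** 2
v_direct = herm_form(B, x)
v_split = 0.5 * abs(y1 + x[2]) ** 2

print(max_abs_diff(matmul(A, A), A), max_abs_diff(matmul(A, B), ZERO))
print(abs(u_direct - u_split), abs(v_direct - v_split))
```

All four printed quantities should be at rounding-error level, which confirms idempotency, orthogonality, and the two sum-of-squares representations of the Hermitian forms.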


2.2.1. Extensions of the results in the real domain 


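Let $x_j \sim N_1(\mu_j, \sigma_j^2)$, $j = 1, \ldots, k$, be independently distributed. Then, $\frac{x_j}{\sigma_j} \sim
N_1(\frac{\mu_j}{\sigma_j}, 1)$, $\sigma_j > 0$, $j = 1, \ldots, k$. Let

$$X = \begin{bmatrix} x_1 \\ \vdots \\ x_k \end{bmatrix}, \quad \mu = \begin{bmatrix} \mu_1 \\ \vdots \\ \mu_k \end{bmatrix}, \quad \Sigma = \begin{bmatrix} \sigma_1^2 & 0 & \cdots & 0 \\ 0 & \sigma_2^2 & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & \sigma_k^2 \end{bmatrix}, \quad \Sigma^{\frac{1}{2}} = \begin{bmatrix} \sigma_1 & 0 & \cdots & 0 \\ 0 & \sigma_2 & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & \sigma_k \end{bmatrix}.$$

Then, let

$$Y = \Sigma^{-\frac{1}{2}}X = \begin{bmatrix} y_1 \\ \vdots \\ y_k \end{bmatrix}, \quad E[Y] = \Sigma^{-\frac{1}{2}}E[X] = \Sigma^{-\frac{1}{2}}\mu.$$

If $\mu = O$, it has already been shown that $Y'Y \sim \chi^2_k$. If $\mu \ne O$, then $Y'Y \sim \chi^2_k(\lambda)$, $\lambda =
\frac{1}{2}\mu'\Sigma^{-1}\mu$. It is assumed that the noncentral chisquare distribution has already been discussed
in a basic course in Statistics. It is defined for instance in Mathai and Haubold
(2017a, 2017b) and will be briefly discussed in Sect. 2.3.1. If $\mu = O$, then for any $k \times k$
symmetric matrix $A = A'$, $Y'AY \sim \chi^2_r$ if and only if $A = A^2$ and $A$ is of rank $r$. This
result has already been established. Now, if $\mu = O$, then $X'AX = Y'\Sigma^{\frac{1}{2}}A\Sigma^{\frac{1}{2}}Y \sim \chi^2_r$ if
and only if $\Sigma^{\frac{1}{2}}A\Sigma^{\frac{1}{2}} = \Sigma^{\frac{1}{2}}A\Sigma A\Sigma^{\frac{1}{2}} \Rightarrow A = A\Sigma A$ and $\Sigma^{\frac{1}{2}}A\Sigma^{\frac{1}{2}}$ is of rank $r$ or $A$ is of
rank $r$ since $\Sigma > O$. Hence, we have the following result: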


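Theorem 2.2.3. Let the real scalars $x_j \sim N_1(\mu_j, \sigma_j^2)$, $j = 1, \ldots, k$, be independently
distributed. Let

$$X = \begin{bmatrix} x_1 \\ \vdots \\ x_k \end{bmatrix}, \quad Y = \Sigma^{-\frac{1}{2}}X = \begin{bmatrix} y_1 \\ \vdots \\ y_k \end{bmatrix}, \quad \mu = \begin{bmatrix} \mu_1 \\ \vdots \\ \mu_k \end{bmatrix}, \quad \Sigma = \mathrm{diag}(\sigma_1^2, \ldots, \sigma_k^2).$$

Then for any $k \times k$ symmetric matrix $A = A'$,

$$X'AX = Y'\Sigma^{\frac{1}{2}}A\Sigma^{\frac{1}{2}}Y \sim \begin{cases} \chi^2_r & \text{if } \mu = O \\ \chi^2_r(\lambda) & \text{if } \mu \ne O, \ \lambda = \frac{1}{2}E[Y]'\,\Sigma^{\frac{1}{2}}A\Sigma^{\frac{1}{2}}\,E[Y] = \frac{1}{2}\mu'A\mu \end{cases}$$

if and only if $A = A\Sigma A$ and $A$ is of rank $r$.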


Independence is not altered if the variables are relocated. Consider two quadratic forms
$X'AX$ and $X'BX$, $A = A'$, $B = B'$. Then, $X'AX = Y'\Sigma^{\frac{1}{2}}A\Sigma^{\frac{1}{2}}Y$ and $X'BX =
Y'\Sigma^{\frac{1}{2}}B\Sigma^{\frac{1}{2}}Y$, and we have the following result:


Theorem 2.2.4. Let $x_j$, $X$, $Y$, $\Sigma$ be as defined in Theorem 2.2.3. Then, the quadratic
forms $X'AX = Y'\Sigma^{\frac{1}{2}}A\Sigma^{\frac{1}{2}}Y$ and $X'BX = Y'\Sigma^{\frac{1}{2}}B\Sigma^{\frac{1}{2}}Y$, $A = A'$, $B = B'$, are independently
distributed if and only if $A\Sigma B = O$.


Let $X$, $A$ and $\Sigma$ be as defined in Theorem 2.2.3, $Z$ be a standard normal vector whose
components $z_i$, $i = 1, \ldots, k$, are iid $N_1(0, 1)$, and $P$ be an orthonormal matrix such that
$P'\Sigma^{\frac{1}{2}}A\Sigma^{\frac{1}{2}}P = \mathrm{diag}(\lambda_1, \ldots, \lambda_k)$; then, a general quadratic form $X'AX$ can be expressed
as follows:

$$X'AX = (Z'\Sigma^{\frac{1}{2}} + \mu')A(\Sigma^{\frac{1}{2}}Z + \mu) = (Z + \Sigma^{-\frac{1}{2}}\mu)'PP'\Sigma^{\frac{1}{2}}A\Sigma^{\frac{1}{2}}PP'(Z + \Sigma^{-\frac{1}{2}}\mu)$$

where $\lambda_1, \ldots, \lambda_k$ are the eigenvalues of $\Sigma^{\frac{1}{2}}A\Sigma^{\frac{1}{2}}$. Hence, the following decomposition of
the quadratic form:

$$X'AX = \lambda_1(u_1 + b_1)^2 + \cdots + \lambda_k(u_k + b_k)^2, \qquad (2.2.1)$$

where $\begin{bmatrix} b_1 \\ \vdots \\ b_k \end{bmatrix} = P'\Sigma^{-\frac{1}{2}}\mu$, $\begin{bmatrix} u_1 \\ \vdots \\ u_k \end{bmatrix} = P'\begin{bmatrix} z_1 \\ \vdots \\ z_k \end{bmatrix}$, and hence the $u_i$'s are iid $N_1(0, 1)$.

Thus, $X'AX$ can be expressed as a linear combination of independently distributed
non-central chisquare random variables, each having one degree of freedom, whose non-centrality
parameters are respectively $b_j^2/2$, $j = 1, \ldots, k$. Of course, the $k$ chisquares will
be central when $\mu = O$.
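The decomposition (2.2.1) can be illustrated on a small case where the eigen-decomposition is available in closed form. In the sketch below all numerical inputs are hypothetical choices made for illustration: $\Sigma = \mathrm{diag}(4, 1)$, $\mu' = (1, -1)$ and $A$ with rows $(\frac12, \frac12)$, $(\frac12, 2)$, for which $\Sigma^{\frac12}A\Sigma^{\frac12}$ has rows $(2, 1)$, $(1, 2)$, eigenvalues 3 and 1, and orthonormal eigenvectors $(1, 1)/\sqrt2$ and $(1, -1)/\sqrt2$.

```python
import math

sigma2 = [4.0, 1.0]              # hypothetical variances, Sigma = diag(4, 1)
mu = [1.0, -1.0]                 # hypothetical mean vector
A = [[0.5, 0.5], [0.5, 2.0]]     # hypothetical symmetric matrix A

# Eigenvalues of Sigma^{1/2} A Sigma^{1/2} = [[2, 1], [1, 2]] and the matrix P
# whose columns are the corresponding orthonormal eigenvectors.
lam = [3.0, 1.0]
r = 1.0 / math.sqrt(2.0)
P = [[r, r], [r, -r]]
sd = [math.sqrt(v) for v in sigma2]

def direct(z):
    """X'AX computed directly with X = Sigma^{1/2} Z + mu."""
    x = [sd[i] * z[i] + mu[i] for i in range(2)]
    return sum(x[i] * A[i][j] * x[j] for i in range(2) for j in range(2))

def decomposed(z):
    """Right-hand side of (2.2.1): sum of lambda_j (u_j + b_j)^2."""
    b = [sum(P[i][j] * mu[i] / sd[i] for i in range(2)) for j in range(2)]  # P' Sigma^{-1/2} mu
    u = [sum(P[i][j] * z[i] for i in range(2)) for j in range(2)]           # P' Z
    return sum(lam[j] * (u[j] + b[j]) ** 2 for j in range(2))

z = [0.3, -1.2]                  # one arbitrary realization of Z
print(direct(z), decomposed(z))
```

Both expressions agree for any `z`, which is the content of (2.2.1); the distributional statement then follows because $P'Z$ is again a standard normal vector.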


2.2a.1. Extensions of the results in the complex domain 


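Let the complex scalar variables $\tilde{x}_j \sim \tilde{N}_1(\tilde{\mu}_j, \sigma_j^2)$, $j = 1, \ldots, k$, be independently
distributed and $\Sigma = \mathrm{diag}(\sigma_1^2, \ldots, \sigma_k^2)$. As well, let

$$\tilde{X} = \begin{bmatrix} \tilde{x}_1 \\ \vdots \\ \tilde{x}_k \end{bmatrix}, \quad \Sigma = \begin{bmatrix} \sigma_1^2 & 0 & \cdots & 0 \\ 0 & \sigma_2^2 & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & \sigma_k^2 \end{bmatrix}, \quad \Sigma^{-\frac{1}{2}}\tilde{X} = \tilde{Y} = \begin{bmatrix} \tilde{y}_1 \\ \vdots \\ \tilde{y}_k \end{bmatrix}, \quad \tilde{\mu} = \begin{bmatrix} \tilde{\mu}_1 \\ \vdots \\ \tilde{\mu}_k \end{bmatrix},$$

where $\Sigma^{\frac{1}{2}}$ is the Hermitian positive definite square root of $\Sigma$. In this case, $\tilde{y}_j \sim
\tilde{N}_1(\frac{\tilde{\mu}_j}{\sigma_j}, 1)$, $j = 1, \ldots, k$, and the $\tilde{y}_j$'s are assumed to be independently distributed. Hence,
for any Hermitian form $\tilde{X}^*A\tilde{X}$, $A = A^*$, we have $\tilde{X}^*A\tilde{X} = \tilde{Y}^*\Sigma^{\frac{1}{2}}A\Sigma^{\frac{1}{2}}\tilde{Y}$. Hence if
$\tilde{\mu} = O$ (null vector), then from the previous result on chisquaredness, we have:


Theorem 2.2a.3. Let $\tilde{X}$, $\Sigma$, $\tilde{Y}$, $\tilde{\mu}$ be as defined above. Let $u = \tilde{X}^*A\tilde{X}$, $A = A^*$, be a
Hermitian form. Then $u \sim \tilde{\chi}^2_r$ in the complex domain if and only if $A$ is of rank $r$, $\tilde{\mu} = O$
and $A = A\Sigma A$. [A chisquare with $r$ degrees of freedom in the complex domain is a real
gamma with parameters $(\alpha = r, \beta = 1)$.]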


If $\tilde{\mu} \ne O$, then we have a noncentral chisquare in the complex domain. A result on the
independence of Hermitian forms can be obtained as well.


Theorem 2.2a.4. Let $\tilde{X}$, $\tilde{Y}$, $\Sigma$ be as defined above. Consider the Hermitian forms $u_1 =
\tilde{X}^*A\tilde{X}$, $A = A^*$ and $u_2 = \tilde{X}^*B\tilde{X}$, $B = B^*$. Then $u_1$ and $u_2$ are independently distributed
if and only if $A\Sigma B = O$.

The proofs of Theorems 2.2a.3 and 2.2a.4 parallel those presented in the real case and
are hence omitted.


Exercises 2.2 


2.2.1. Give a proof to the second part of Theorem 2.2.2, namely, given that $X'AX$, $A =
A'$ and $X'BX$, $B = B'$ are independently distributed where the components of the $p \times 1$
vector $X$ are mutually independently distributed as real standard normal variables, then
show that $AB = O$.


2.2.2. Let the real scalars $x_j \sim N_1(0, \sigma^2)$, $\sigma^2 > 0$, $j = 1, 2, \ldots, k$, be independently
distributed. Let $X' = (x_1, \ldots, x_k)$ or $X$ is the $k \times 1$ vector where the elements are
$x_1, \ldots, x_k$. Then the joint density of the real scalar variables $x_1, \ldots, x_k$, denoted by $f(X)$,
is

$$f(X) = \frac{1}{(\sigma\sqrt{2\pi})^k}\, e^{-\frac{1}{2\sigma^2}X'X}, \quad -\infty < x_j < \infty, \ j = 1, \ldots, k.$$

Consider the quadratic form $u = X'AX$, $A = A'$ and $X$ is as defined above. (1): Compute
the mgf of $u$; (2): Compute the density of $u$ if $A$ is of rank $r$ and all eigenvalues of $A$
are equal to $\lambda > 0$; (3): If the eigenvalues are $\lambda > 0$ for $m$ of the eigenvalues and the
remaining $n$ of them are $\lambda < 0$, $m + n = r$, compute the density of $u$.


2.2.3. In Exercise 2.2.2 compute the density of $u$ if (1): $r_1$ of the eigenvalues are $\lambda_1$ each
and $r_2$ of the eigenvalues are $\lambda_2$ each, $r_1 + r_2 = r$. Consider all situations $\lambda_1 > 0$, $\lambda_2 > 0$
etc.


2.2.4. In Exercise 2.2.2 compute the density of $u$ for the general case with no restrictions
on the eigenvalues.


2.2.5. Let $x_j \sim N_1(0, \sigma^2)$, $j = 1, 2$, be independently distributed. Let $X' = (x_1, x_2)$.
Let $u = X'AX$ where $A = A'$. Compute the density of $u$ if the eigenvalues of $A$ are (1): 2
and 1, (2): 2 and $-1$; (3): Construct a real $2 \times 2$ matrix $A = A'$ where the eigenvalues are
2 and 1.


2.2.6. Show that the results on chisquaredness and independence in the real or complex
domain need not hold if $A \ne A^*$, $B \ne B^*$.


2.2.7. Construct a $2 \times 2$ Hermitian matrix $A = A^*$ such that $A = A^2$ and verify Theorem 2.2a.3.
Construct $2 \times 2$ Hermitian matrices $A$ and $B$ such that $AB = O$, and then
verify Theorem 2.2a.4.
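2.2.8. Let $\tilde{x}_1, \ldots, \tilde{x}_m$ be a simple random sample of size $m$ from a complex normal population
$\tilde{N}_1(\tilde{\mu}_1, \sigma_1^2)$. Let $\tilde{y}_1, \ldots, \tilde{y}_n$ be iid $\tilde{N}_1(\tilde{\mu}_2, \sigma_2^2)$. Let the two complex normal populations
be independent. Let

$$s_1^2 = \sum_{j=1}^m (\tilde{x}_j - \bar{\tilde{x}})^*(\tilde{x}_j - \bar{\tilde{x}})/\sigma_1^2, \quad s_2^2 = \sum_{j=1}^n (\tilde{y}_j - \bar{\tilde{y}})^*(\tilde{y}_j - \bar{\tilde{y}})/\sigma_2^2,$$

$$s_{11}^2 = \frac{1}{\sigma_1^2}\sum_{j=1}^m (\tilde{x}_j - \tilde{\mu}_1)^*(\tilde{x}_j - \tilde{\mu}_1), \quad s_{21}^2 = \frac{1}{\sigma_2^2}\sum_{j=1}^n (\tilde{y}_j - \tilde{\mu}_2)^*(\tilde{y}_j - \tilde{\mu}_2).$$

Then, show that

$$\frac{s_{11}^2/m}{s_{21}^2/n} \sim \tilde{F}_{m,n}, \quad \frac{s_1^2/(m-1)}{s_2^2/(n-1)} \sim \tilde{F}_{m-1,n-1}$$

for $\sigma_1^2 = \sigma_2^2$.


2.2.9. In Exercise 2.2.8 show that $\frac{s_{11}^2}{s_{21}^2}$ is a type-2 beta with the parameters $m$ and $n$, and
$\frac{s_1^2}{s_2^2}$ is a type-2 beta with the parameters $m - 1$ and $n - 1$ for $\sigma_1^2 = \sigma_2^2$.


2.2.10. In Exercise 2.2.8 if $\sigma_1^2 = \sigma_2^2 = \sigma^2$ then show that

$$\frac{1}{\sigma^2}\Big[\sum_{j=1}^m (\tilde{x}_j - \bar{\tilde{x}})^*(\tilde{x}_j - \bar{\tilde{x}}) + \sum_{j=1}^n (\tilde{y}_j - \bar{\tilde{y}})^*(\tilde{y}_j - \bar{\tilde{y}})\Big] \sim \tilde{\chi}^2_{m+n-2}.$$


2.2.11. In Exercise 2.2.10 if $\bar{\tilde{x}}$ and $\bar{\tilde{y}}$ are replaced by $\tilde{\mu}_1$ and $\tilde{\mu}_2$ respectively then show
that the degrees of freedom of the chisquare is $m + n$.


2.2.12. Derive the representation of the general quadratic form $X'AX$ given in (2.2.1).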
2.3. Simple Random Samples from Real Populations and Sampling Distributions 


For practical applications, an important result is that on the independence of the sample
mean and sample variance when the sample comes from a normal (Gaussian) population.
Let $x_1, \ldots, x_n$ be a simple random sample of size $n$ from a real $N_1(\mu_1, \sigma_1^2)$ or,
equivalently, $x_1, \ldots, x_n$ are iid $N_1(\mu_1, \sigma_1^2)$. Recall that we have established that any linear
function $L'X = X'L$, $L' = (a_1, \ldots, a_n)$, $X' = (x_1, \ldots, x_n)$ remains normally distributed
(Theorem 2.1.1). Now, consider two linear forms $y_1 = L_1'X$, $y_2 = L_2'X$, with
$L_1' = (a_1, \ldots, a_n)$, $L_2' = (b_1, \ldots, b_n)$ where $a_1, \ldots, a_n, b_1, \ldots, b_n$ are real scalar constants.
Let us examine the conditions that are required for assessing the independence of
the linear forms $y_1$ and $y_2$. Since $x_1, \ldots, x_n$ are iid, we can determine the joint mgf of
$x_1, \ldots, x_n$. We take an $n \times 1$ parameter vector $T$, $T' = (t_1, \ldots, t_n)$ where the $t_j$'s are scalar
parameters. Then, by definition, the joint mgf is given by

$$E[e^{T'X}] = \prod_{j=1}^n M_{x_j}(t_j) = \prod_{j=1}^n e^{t_j\mu_1 + \frac{1}{2}t_j^2\sigma_1^2} = e^{\mu_1 T'J + \frac{\sigma_1^2}{2}T'T} \qquad (2.3.1)$$

since the $x_j$'s are iid, $J' = (1, \ldots, 1)$. Since every linear function of $x_1, \ldots, x_n$ is a
univariate normal, we have $y_1 \sim N_1(\mu_1 L_1'J, \sigma_1^2 L_1'L_1)$ and hence the mgf of $y_1$, taking
$t_1$ as the parameter for the mgf, is $M_{y_1}(t_1) = e^{t_1\mu_1 L_1'J + \frac{\sigma_1^2}{2}t_1^2 L_1'L_1}$. Now, let us consider the
joint mgf of $y_1$ and $y_2$ taking $t_1$ and $t_2$ as the respective parameters. Let the joint mgf be
denoted by $M_{y_1,y_2}(t_1, t_2)$. Then,

$$\begin{aligned} M_{y_1,y_2}(t_1, t_2) &= E[e^{t_1y_1 + t_2y_2}] = E[e^{(t_1L_1' + t_2L_2')X}] \\ &= e^{\mu_1(t_1L_1' + t_2L_2')J + \frac{\sigma_1^2}{2}(t_1L_1 + t_2L_2)'(t_1L_1 + t_2L_2)} \\ &= e^{\mu_1(t_1L_1' + t_2L_2')J + \frac{\sigma_1^2}{2}(t_1^2L_1'L_1 + t_2^2L_2'L_2 + 2t_1t_2L_1'L_2)} \\ &= M_{y_1}(t_1)\,M_{y_2}(t_2)\,e^{\sigma_1^2 t_1t_2L_1'L_2}. \end{aligned}$$

Hence, the exponent of the last factor on the right-hand side has to vanish for $y_1$ and $y_2$ to be
independently distributed, and this can happen if and only if $L_1'L_2 = L_2'L_1 = 0$ since $t_1$ and $t_2$ are
arbitrary. Thus, we have the following result:

Theorem 2.3.1. Let $x_1, \ldots, x_n$ be iid $N_1(\mu_1, \sigma_1^2)$. Let $y_1 = L_1'X$ and $y_2 = L_2'X$ where
$X' = (x_1, \ldots, x_n)$, $L_1' = (a_1, \ldots, a_n)$ and $L_2' = (b_1, \ldots, b_n)$, the $a_j$'s and $b_j$'s being
scalar constants. Then, $y_1$ and $y_2$ are independently distributed if and only if $L_1'L_2 =
L_2'L_1 = 0$.

Example 2.3.1. Let $x_1, x_2, x_3, x_4$ be a simple random sample of size 4 from a real normal
population $N_1(\mu_1 = 1, \sigma_1^2 = 2)$. Consider the following statistics: (1): $u_1, v_1, w_1$, (2):
$u_2, v_2, w_2$. Check for independence of the various statistics in (1) and (2) where

$$u_1 = \bar{x} = \tfrac{1}{4}(x_1 + x_2 + x_3 + x_4), \quad v_1 = 2x_1 - 3x_2 + x_3 + x_4, \quad w_1 = x_1 - x_2 + x_3 - x_4;$$

$$u_2 = \bar{x} = \tfrac{1}{4}(x_1 + x_2 + x_3 + x_4), \quad v_2 = x_1 - x_2 + x_3 - x_4, \quad w_2 = x_1 - x_2 - x_3 + x_4.$$


Solution 2.3.1. Let $X' = (x_1, x_2, x_3, x_4)$ and let the coefficient vectors in (1) be denoted
by $L_1, L_2, L_3$ and those in (2) be denoted by $M_1, M_2, M_3$. Thus they are as follows:

$$L_1 = \frac{1}{4}\begin{bmatrix} 1 \\ 1 \\ 1 \\ 1 \end{bmatrix}, \quad L_2 = \begin{bmatrix} 2 \\ -3 \\ 1 \\ 1 \end{bmatrix}, \quad L_3 = \begin{bmatrix} 1 \\ -1 \\ 1 \\ -1 \end{bmatrix} \;\Rightarrow\; L_1'L_2 = \tfrac{1}{4}, \ L_1'L_3 = 0, \ L_2'L_3 = 5.$$

This means that $u_1$ and $w_1$ are independently distributed and that the other pairs are not
independently distributed. The coefficient vectors in (2) are

$$M_1 = \frac{1}{4}\begin{bmatrix} 1 \\ 1 \\ 1 \\ 1 \end{bmatrix}, \quad M_2 = \begin{bmatrix} 1 \\ -1 \\ 1 \\ -1 \end{bmatrix}, \quad M_3 = \begin{bmatrix} 1 \\ -1 \\ -1 \\ 1 \end{bmatrix} \;\Rightarrow\; M_1'M_2 = 0, \ M_1'M_3 = 0, \ M_2'M_3 = 0.$$

This means that all the pairs are independently distributed, that is, $u_2$, $v_2$ and $w_2$ are mutually
independently distributed.
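The inner-product criterion of Theorem 2.3.1 reduces the example to a few dot products, which the following minimal sketch carries out in exact arithmetic. The dictionary layout and the helper names (`dot`, `independent_pairs`) are illustrative assumptions; the coefficient vectors are those of the sample mean and of the linear forms stated in Example 2.3.1.

```python
from fractions import Fraction as F
from itertools import combinations

def dot(a, b):
    """Inner product L_a' L_b of two coefficient vectors."""
    return sum(p * q for p, q in zip(a, b))

# Coefficient vectors of the statistics in sets (1) and (2).
set1 = {"u1": [F(1, 4)] * 4, "v1": [2, -3, 1, 1], "w1": [1, -1, 1, -1]}
set2 = {"u2": [F(1, 4)] * 4, "v2": [1, -1, 1, -1], "w2": [1, -1, -1, 1]}

def independent_pairs(forms):
    """Pairs of linear forms whose coefficient vectors are orthogonal."""
    return [(a, b) for a, b in combinations(forms, 2)
            if dot(forms[a], forms[b]) == 0]

pairs1 = independent_pairs(set1)
pairs2 = independent_pairs(set2)
print(pairs1)
print(pairs2)
```

Only orthogonality of the coefficient vectors matters here; the values of $\mu_1$ and $\sigma_1^2$ play no role in the independence check.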


We can extend Theorem 2.3.1 to sets of linear functions. Let $Y_1 = A_1X$ and $Y_2 = A_2X$
where $A_1$ of dimension $m_1 \times n$, $m_1 \le n$, and $A_2$ of dimension $m_2 \times n$, $m_2 \le n$, are constant
matrices and $X' = (x_1, \ldots, x_n)$ where the $x_j$'s are iid $N_1(\mu_1, \sigma_1^2)$. Let the parameter
vectors $T_1$ and $T_2$ be of dimensions $m_1 \times 1$ and $m_2 \times 1$, respectively. Then, the mgf of $Y_1$
is $M_{Y_1}(T_1) = E[e^{T_1'Y_1}] = E[e^{T_1'A_1X}]$, which can be evaluated by integration over the joint
density of $x_1, \ldots, x_n$, individually, or over the vector $X' = (x_1, \ldots, x_n)$ with $E[X'] =
[\mu_1, \mu_1, \ldots, \mu_1] = \mu_1[1, 1, \ldots, 1] = \mu_1J'$, $J' = [1, \ldots, 1] \Rightarrow E[Y_1] = \mu_1A_1J$. The
mgf of $Y_1$ is then

$$M_{Y_1}(T_1) = E[e^{\mu_1T_1'A_1J + T_1'A_1(X - E(X))}] = E[e^{\mu_1T_1'A_1J + T_1'A_1Z}], \quad Z = X - E(X), \qquad \text{(i)}$$

and the exponent in the expected value, not containing $\mu_1$, simplifies to

$$-\frac{1}{2\sigma_1^2}\{Z'Z - 2\sigma_1^2T_1'A_1Z\} = -\frac{1}{2\sigma_1^2}\{(Z' - \sigma_1^2T_1'A_1)(Z - \sigma_1^2A_1'T_1) - \sigma_1^4T_1'A_1A_1'T_1\}.$$

Integration over $Z$ or individually over the elements of $Z$, that is, $z_1, \ldots, z_n$, yields 1 since
the total probability is 1, which leaves the factor not containing $Z$. Thus,

$$M_{Y_1}(T_1) = e^{\mu_1T_1'A_1J + \frac{\sigma_1^2}{2}T_1'A_1A_1'T_1} \qquad \text{(ii)}$$

and similarly,

$$M_{Y_2}(T_2) = e^{\mu_1T_2'A_2J + \frac{\sigma_1^2}{2}T_2'A_2A_2'T_2}. \qquad \text{(iii)}$$

The joint mgf of $Y_1$ and $Y_2$ is then

$$\begin{aligned} M_{Y_1,Y_2}(T_1, T_2) &= e^{\mu_1(T_1'A_1J + T_2'A_2J) + \frac{\sigma_1^2}{2}(T_1'A_1 + T_2'A_2)(T_1'A_1 + T_2'A_2)'} \\ &= M_{Y_1}(T_1)\,M_{Y_2}(T_2)\,e^{\sigma_1^2T_1'A_1A_2'T_2}. \end{aligned} \qquad \text{(iv)}$$

Accordingly, $Y_1$ and $Y_2$ will be independent if and only if $A_1A_2' = O \Rightarrow A_2A_1' = O$ since
$T_1$ and $T_2$ are arbitrary parameter vectors, the two null matrices having different orders.
Then, we have


Theorem 2.3.2. Let $Y_1 = A_1X$ and $Y_2 = A_2X$, with $X' = (x_1, \ldots, x_n)$, the $x_j$'s being
iid $N_1(\mu_1, \sigma_1^2)$, $j = 1, \ldots, n$, be two sets of linear forms where $A_1$ is $m_1 \times n$ and $A_2$
is $m_2 \times n$, $m_1 \le n$, $m_2 \le n$, are constant matrices. Then, $Y_1$ and $Y_2$ are independently
distributed if and only if $A_1A_2' = O$ or $A_2A_1' = O$.


Example 2.3.2. Consider a simple random sample of size 4 from a real scalar normal
population $N_1(\mu_1 = 0, \sigma_1^2 = 4)$. Let $X' = (x_1, x_2, x_3, x_4)$. Verify whether the sets of
linear functions $U = A_1X$, $V = A_2X$, $W = A_3X$ are pairwise independent, where


1 2 3 4 
А = |. E a] ^ 2-1 1 3 ‚з= [1 mE il 
1 2-1-2 


Solution 2.3.2. Taking the products, we have $A_1A_2' \ne O$, $A_1A_3' = O$, $A_2A_3' \ne O$.
Hence, the pair $U$ and $W$ are independently distributed and the other pairs are not.


We can apply Theorems 2.3.1 and 2.3.2 to prove several results involving sample statistics.
For instance, let $x_1, \ldots, x_n$ be iid $N_1(\mu_1, \sigma_1^2)$ or a simple random sample of size $n$
from a real $N_1(\mu_1, \sigma_1^2)$ and $\bar{x} = \frac{1}{n}(x_1 + \cdots + x_n)$. Consider the vectors

$$X = \begin{bmatrix} x_1 \\ x_2 \\ \vdots \\ x_n \end{bmatrix}, \quad \mu = \begin{bmatrix} \mu_1 \\ \mu_1 \\ \vdots \\ \mu_1 \end{bmatrix}, \quad \bar{X} = \begin{bmatrix} \bar{x} \\ \bar{x} \\ \vdots \\ \bar{x} \end{bmatrix}.$$

Note that when the $x_j$'s are iid $N_1(\mu_1, \sigma_1^2)$, $x_j - \mu_1 \sim N_1(0, \sigma_1^2)$, and that since $X - \bar{X} =
(X - \mu) - (\bar{X} - \mu)$, we may take the $x_j$'s as coming from $N_1(0, \sigma_1^2)$ for all operations involving
$(X, \bar{X})$. Moreover, $\bar{x} = \frac{1}{n}J'X$, $J' = (1, 1, \ldots, 1)$ where $J$ is an $n \times 1$ vector of unities. Then,

$$X - \bar{X} = \begin{bmatrix} x_1 - \frac{1}{n}J'X \\ x_2 - \frac{1}{n}J'X \\ \vdots \\ x_n - \frac{1}{n}J'X \end{bmatrix} = \Big(I - \frac{1}{n}JJ'\Big)X, \qquad \text{(i)}$$

and on letting $A = \frac{1}{n}JJ'$, we have

$$A = A^2, \quad I - A = (I - A)^2, \quad A(I - A) = O. \qquad \text{(ii)}$$

Also note that

$$(X - \bar{X})'(X - \bar{X}) = \sum_{j=1}^n (x_j - \bar{x})^2 \ \text{ and } \ s^2 = \frac{1}{n}\sum_{j=1}^n (x_j - \bar{x})^2 \qquad \text{(iii)}$$

where $s^2$ is the sample variance and $\frac{1}{n}J'X = \bar{x}$ is the sample mean. Now, observe that in
light of Theorem 2.3.2, $Y_1 = (I - A)X$ and $Y_2 = AX$ are independently distributed, which
implies that $X - \bar{X}$ and $\bar{X}$ are independently distributed. But $\bar{X}$ contains only $\bar{x} = \frac{1}{n}(x_1 + \cdots +
x_n)$ and hence $X - \bar{X}$ and $\bar{x}$ are independently distributed. We now will make use of the
following result: If $w_1$ and $w_2$ are independently distributed real scalar random variables,
then the pairs $(w_1, w_2^2)$, $(w_1^2, w_2)$, $(w_1^2, w_2^2)$ are independently distributed;
the converses need not be true. For example, $w_1^2$ and
$w_2^2$ being independently distributed need not imply the independence of $w_1$ and $w_2$. If $w_1$
and $w_2$ are real vectors or matrices and if $w_1$ and $w_2$ are independently distributed then
the following pairs are also independently distributed wherever the quantities are defined:
$(w_1, w_2w_2')$, $(w_1, w_2'w_2)$, $(w_1w_1', w_2)$, $(w_1'w_1, w_2)$, $(w_1'w_1, w_2'w_2)$. It then follows from
(iii) that $\bar{x}$ and $(X - \bar{X})'(X - \bar{X}) = \sum_{j=1}^n (x_j - \bar{x})^2$ are independently distributed. Hence,
the following result:


Theorem 2.3.3. Let $x_1, \ldots, x_n$ be iid $N_1(\mu_1, \sigma_1^2)$ or a simple random sample of size $n$
from a univariate real normal population $N_1(\mu_1, \sigma_1^2)$. Let $\bar{x} = \frac{1}{n}(x_1 + \cdots + x_n)$ be the
sample mean and $s^2 = \frac{1}{n}\sum_{j=1}^n (x_j - \bar{x})^2$ be the sample variance. Then $\bar{x}$ and $s^2$ are
independently distributed.
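The matrix identities that drive this result are easy to confirm numerically. The sketch below is illustrative only: the sample size `n = 5`, the data vector `x`, and the helper `matmul` are assumptions chosen for the check. It verifies that $A = \frac{1}{n}JJ'$ and $I - A$ are idempotent with $A(I - A) = O$, and that $X'(I - A)X$ equals the sum of squared deviations from the sample mean.

```python
from fractions import Fraction as F

n = 5                                           # illustrative sample size
I = [[F(int(i == j)) for j in range(n)] for i in range(n)]
A = [[F(1, n)] * n for _ in range(n)]           # A = (1/n) J J'
C = [[I[i][j] - A[i][j] for j in range(n)] for i in range(n)]   # I - A
ZERO = [[F(0)] * n for _ in range(n)]

def matmul(P, Q):
    return [[sum(P[i][k] * Q[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

x = [F(2), F(-1), F(4), F(0), F(5)]             # arbitrary data (assumption)
xbar = sum(x) / n
ss_direct = sum((xi - xbar) ** 2 for xi in x)   # sum of squared deviations
ss_quadratic = sum(x[i] * C[i][j] * x[j] for i in range(n) for j in range(n))

trace_A = sum(A[i][i] for i in range(n))
trace_C = sum(C[i][i] for i in range(n))
print(matmul(A, A) == A, matmul(C, C) == C, matmul(A, C) == ZERO)
print(ss_direct, ss_quadratic, trace_A, trace_C)
```

Since the trace of an idempotent matrix equals its rank, the two traces give the ranks 1 and $n - 1$ of $A$ and $I - A$.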


This result has several corollaries. When $x_1, \ldots, x_n$ are iid $N_1(\mu_1, \sigma_1^2)$, then the sample
sum of products, which is also referred to as the corrected sample sum of products
(corrected in the sense that $\bar{x}$ is subtracted), is given by

$$\sum_{j=1}^n (x_j - \bar{x})^2 = X'(I - A)X, \quad A = \frac{1}{n}\begin{bmatrix} 1 & 1 & \cdots & 1 \\ 1 & 1 & \cdots & 1 \\ \vdots & \vdots & \ddots & \vdots \\ 1 & 1 & \cdots & 1 \end{bmatrix}$$

where both $A$ and $I - A$ are idempotent. In this case, $\mathrm{tr}(A) = \frac{1}{n}(1 + \cdots + 1) = 1$ and
$\mathrm{tr}(I - A) = n - 1$ and hence, the ranks of $A$ and $I - A$ are 1 and $n - 1$, respectively.
When a matrix is idempotent, its eigenvalues are either zero or one, the number of ones
corresponding to its rank. As has already been pointed out, when $X$ and $\bar{X}$ are involved,
it can be equivalently assumed that the sample is coming from a $N_1(0, \sigma_1^2)$ population.
Hence

$$\frac{ns^2}{\sigma_1^2} = \frac{\sum_{j=1}^n (x_j - \bar{x})^2}{\sigma_1^2} \sim \chi^2_{n-1} \qquad (2.3.2)$$

is a real chisquare with $n - 1$ (the rank of the idempotent matrix of the quadratic form)
degrees of freedom as per Theorem 2.2.1. Observe that when the sample comes from a
real $N_1(\mu_1, \sigma_1^2)$ distribution, we have $\bar{x} \sim N_1(\mu_1, \frac{\sigma_1^2}{n})$ so that $z = \frac{\sqrt{n}(\bar{x} - \mu_1)}{\sigma_1} \sim N_1(0, 1)$ or
$z$ is a real standard normal, and that $\frac{(n-1)s_1^2}{\sigma_1^2} \sim \chi^2_{n-1}$ where $s_1^2 = \frac{\sum_{j=1}^n (x_j - \bar{x})^2}{n-1}$. Recall that
$\bar{x}$ and $s^2$ are independently distributed. Hence, the ratio

$$\frac{z}{s_1/\sigma_1} \sim t_{n-1}$$

has a real Student-t distribution with $n - 1$ degrees of freedom, where $z = \frac{\sqrt{n}(\bar{x} - \mu_1)}{\sigma_1}$ and
$\frac{z}{s_1/\sigma_1} = \frac{\sqrt{n}(\bar{x} - \mu_1)}{s_1}$. Hence, we have the following result:


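Theorem 2.3.4. Let $x_1, \ldots, x_n$ be iid $N_1(\mu_1, \sigma_1^2)$. Let $\bar{x} = \frac{1}{n}(x_1 + \cdots + x_n)$ and $s_1^2 =
\frac{\sum_{j=1}^n (x_j - \bar{x})^2}{n-1}$. Then,

$$\frac{\sqrt{n}(\bar{x} - \mu_1)}{s_1} \sim t_{n-1} \qquad (2.3.3)$$

where $t_{n-1}$ is a real Student-t with $n - 1$ degrees of freedom.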


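It should also be noted that when $x_1, \ldots, x_n$ are iid $N_1(\mu_1, \sigma_1^2)$, then

$$\frac{\sum_{j=1}^n (x_j - \mu_1)^2}{\sigma_1^2} \sim \chi^2_n, \quad \frac{\sum_{j=1}^n (x_j - \bar{x})^2}{\sigma_1^2} \sim \chi^2_{n-1}, \quad \frac{n(\bar{x} - \mu_1)^2}{\sigma_1^2} \sim \chi^2_1,$$

wherefrom the following decomposition is obtained:

$$\frac{1}{\sigma_1^2}\sum_{j=1}^n (x_j - \mu_1)^2 = \frac{1}{\sigma_1^2}\Big[\sum_{j=1}^n (x_j - \bar{x})^2 + n(\bar{x} - \mu_1)^2\Big] \;\Rightarrow\; \chi^2_n = \chi^2_{n-1} + \chi^2_1, \qquad (2.3.4)$$

the two chisquare random variables on the right-hand side of the last equation being independently
distributed.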
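The algebraic identity underlying (2.3.4) holds for every sample, not only in distribution. A minimal sketch in exact arithmetic follows; the sample values and the population mean `mu1` are hypothetical numbers chosen for illustration.

```python
from fractions import Fraction as F

x = [F(3), F(7), F(4), F(10)]      # hypothetical sample
mu1 = F(5)                         # hypothetical population mean
n = len(x)
xbar = sum(x) / n

total = sum((xi - mu1) ** 2 for xi in x)       # sum of (x_j - mu_1)^2
within = sum((xi - xbar) ** 2 for xi in x)     # sum of (x_j - xbar)^2
between = n * (xbar - mu1) ** 2                # n (xbar - mu_1)^2

print(total, within, between, total == within + between)
```

Dividing each of the three terms by $\sigma_1^2$ gives the quantities that are distributed as $\chi^2_n$, $\chi^2_{n-1}$ and $\chi^2_1$, respectively.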
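2.3a. Simple Random Samples from a Complex Gaussian Population


The definition of a simple random sample from any population remains the same as
in the real case. A set of complex scalar random variables $\tilde{x}_1, \ldots, \tilde{x}_n$, which are iid as
$\tilde{N}_1(\tilde{\mu}_1, \sigma_1^2)$, is called a simple random sample from this complex Gaussian population. Let
$\tilde{X}$ be the $n \times 1$ vector whose components are these sample variables, $\bar{\tilde{x}} = \frac{1}{n}(\tilde{x}_1 + \cdots + \tilde{x}_n)$
denote the sample average, and $\bar{\tilde{X}}' = (\bar{\tilde{x}}, \ldots, \bar{\tilde{x}})$ be the $1 \times n$ vector of sample means; then
$\tilde{X}$, $\bar{\tilde{X}}$ and the sample sum of products matrix $\tilde{s}$ are respectively,

$$\tilde{X} = \begin{bmatrix} \tilde{x}_1 \\ \vdots \\ \tilde{x}_n \end{bmatrix}, \quad \bar{\tilde{X}} = \begin{bmatrix} \bar{\tilde{x}} \\ \vdots \\ \bar{\tilde{x}} \end{bmatrix} \ \text{ and } \ \tilde{s} = (\tilde{X} - \bar{\tilde{X}})^*(\tilde{X} - \bar{\tilde{X}}).$$

These quantities can be simplified as follows with the help of the vector of unities $J' =
(1, 1, \ldots, 1)$: $\bar{\tilde{x}} = \frac{1}{n}J'\tilde{X}$, $\tilde{X} - \bar{\tilde{X}} = [I - \frac{1}{n}JJ']\tilde{X}$, $\bar{\tilde{X}} = \frac{1}{n}JJ'\tilde{X}$, $\tilde{s} = \tilde{X}^*[I - \frac{1}{n}JJ']\tilde{X}$.
Consider the Hermitian form

$$\tilde{X}^*\Big[I - \frac{1}{n}JJ'\Big]\tilde{X} = \sum_{j=1}^n (\tilde{x}_j - \bar{\tilde{x}})^*(\tilde{x}_j - \bar{\tilde{x}}) = n\tilde{s}^2$$

where $\tilde{s}^2$ is the sample variance in the complex scalar case, given a simple random sample
of size $n$.

Consider the linear forms $\tilde{u}_1 = L_1^*\tilde{X} = \bar{a}_1\tilde{x}_1 + \cdots + \bar{a}_n\tilde{x}_n$ and $\tilde{u}_2 = L_2^*\tilde{X} = \bar{b}_1\tilde{x}_1 +
\cdots + \bar{b}_n\tilde{x}_n$ where the $a_j$'s and $b_j$'s are scalar constants that may be real or complex, $\bar{a}_j$
and $\bar{b}_j$ denoting the complex conjugates of $a_j$ and $b_j$, respectively.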

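$$E[\tilde{X}'] = \tilde{\mu}_1[1, 1, \ldots, 1] = \tilde{\mu}_1J', \quad J' = [1, \ldots, 1],$$

since the $\tilde{x}_j$'s are iid $\tilde{N}_1(\tilde{\mu}_1, \sigma_1^2)$, $j = 1, \ldots, n$. The mgf of $\tilde{u}_1$ and $\tilde{u}_2$, denoted by
$M_{\tilde{u}_j}(\tilde{t}_j)$, $j = 1, 2$, and the joint mgf of $\tilde{u}_1$ and $\tilde{u}_2$, denoted by $M_{\tilde{u}_1,\tilde{u}_2}(\tilde{t}_1, \tilde{t}_2)$, are the following,
where $\Re(\cdot)$ denotes the real part of $(\cdot)$:

$$\begin{aligned} M_{\tilde{u}_1}(\tilde{t}_1) &= E[e^{\Re(\tilde{t}_1^*\tilde{u}_1)}] = E[e^{\Re(\tilde{t}_1^*L_1^*\tilde{X})}] \\ &= e^{\Re(\tilde{\mu}_1\tilde{t}_1^*L_1^*J)}\,E[e^{\Re(\tilde{t}_1^*L_1^*(\tilde{X} - E(\tilde{X})))}] = e^{\Re(\tilde{\mu}_1\tilde{t}_1^*L_1^*J) + \frac{\sigma_1^2}{4}\tilde{t}_1^*L_1^*L_1\tilde{t}_1} \end{aligned} \qquad \text{(i)}$$

$$M_{\tilde{u}_2}(\tilde{t}_2) = e^{\Re(\tilde{\mu}_1\tilde{t}_2^*L_2^*J) + \frac{\sigma_1^2}{4}\tilde{t}_2^*L_2^*L_2\tilde{t}_2} \qquad \text{(ii)}$$

$$M_{\tilde{u}_1,\tilde{u}_2}(\tilde{t}_1, \tilde{t}_2) = M_{\tilde{u}_1}(\tilde{t}_1)\,M_{\tilde{u}_2}(\tilde{t}_2)\,e^{\frac{\sigma_1^2}{2}\Re(\tilde{t}_1^*L_1^*L_2\tilde{t}_2)}. \qquad \text{(iii)}$$

Consequently, $\tilde{u}_1$ and $\tilde{u}_2$ are independently distributed if and only if the exponential part
is 1 or equivalently $\Re(\tilde{t}_1^*L_1^*L_2\tilde{t}_2) = 0$. Since $\tilde{t}_1$ and $\tilde{t}_2$ are arbitrary, this means $L_1^*L_2 = 0 \Rightarrow
L_2^*L_1 = 0$. Then we have the following result:


Theorem 2.3a.1. Let $\tilde{x}_1, \ldots, \tilde{x}_n$ be a simple random sample of size $n$ from a univariate
complex Gaussian population $\tilde{N}_1(\tilde{\mu}_1, \sigma_1^2)$. Consider the linear forms $\tilde{u}_1 = L_1^*\tilde{X}$ and $\tilde{u}_2 =
L_2^*\tilde{X}$ where $L_1$, $L_2$ and $\tilde{X}$ are the previously defined $n \times 1$ vectors, and a star denotes the
conjugate transpose. Then, $\tilde{u}_1$ and $\tilde{u}_2$ are independently distributed if and only if $L_1^*L_2 =
0$.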


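Example 2.3a.1. Let $\tilde{x}_j$, $j = 1, 2, 3, 4$, be iid univariate complex normal $\tilde{N}_1(\tilde{\mu}_1, \sigma_1^2)$.
Consider the linear forms

$$\begin{aligned} \tilde{u}_1 &= L_1^*\tilde{X} = (1 + i)\tilde{x}_1 + 2i\,\tilde{x}_2 - (1 - i)\tilde{x}_3 + 2\tilde{x}_4 \\ \tilde{u}_2 &= L_2^*\tilde{X} = (1 + i)\tilde{x}_1 + (2 + 3i)\tilde{x}_2 + (1 - i)\tilde{x}_3 - i\,\tilde{x}_4 \\ \tilde{u}_3 &= L_3^*\tilde{X} = -(1 + i)\tilde{x}_1 + i\,\tilde{x}_2 + (1 - i)\tilde{x}_3 + \tilde{x}_4. \end{aligned}$$

Verify whether the three linear forms are pairwise independent.

Solution 2.3a.1. With the usual notations, the coefficient vectors are as follows:

$$\begin{aligned} L_1^* &= [1 + i,\ 2i,\ -1 + i,\ 2] \;\Rightarrow\; L_1' = [1 - i,\ -2i,\ -1 - i,\ 2] \\ L_2^* &= [1 + i,\ 2 + 3i,\ 1 - i,\ -i] \;\Rightarrow\; L_2' = [1 - i,\ 2 - 3i,\ 1 + i,\ i] \\ L_3^* &= [-(1 + i),\ i,\ 1 - i,\ 1] \;\Rightarrow\; L_3' = [-(1 - i),\ -i,\ 1 + i,\ 1]. \end{aligned}$$

Taking the products we have $L_1^*L_2 = 6 + 6i \ne 0$, $L_1^*L_3 = 0$, $L_2^*L_3 = 3 - 3i \ne 0$. Hence,
only $\tilde{u}_1$ and $\tilde{u}_3$ are independently distributed.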
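The three products in this solution can be recomputed with ordinary complex arithmetic. In the sketch below, each row holds the coefficients $L_k^*$ as listed in the solution (third entry of $L_2^*$ taken as $1 - i$), and $L_a^*L_b$ is formed by pairing the entries of $L_a^*$ with the conjugated entries of $L_b^*$; the names `coeffs` and `star_product` are illustrative assumptions.

```python
from itertools import combinations

# Rows are L1*, L2*, L3*: the coefficients multiplying x1, ..., x4.
coeffs = {
    "u1": [1 + 1j, 2j, -(1 - 1j), 2],
    "u2": [1 + 1j, 2 + 3j, 1 - 1j, -1j],
    "u3": [-(1 + 1j), 1j, 1 - 1j, 1],
}

def star_product(a_star, b_star):
    """L_a* L_b, where both arguments are given as the rows L_a*, L_b*."""
    return sum(p * complex(q).conjugate() for p, q in zip(a_star, b_star))

products = {(a, b): star_product(coeffs[a], coeffs[b])
            for a, b in combinations(coeffs, 2)}
independent = [pair for pair, value in products.items() if value == 0]

for pair, value in products.items():
    print(pair, value)
print(independent)
```

All entries are small Gaussian integers, so the floating-point complex arithmetic is exact here and the comparison with zero is safe.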


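We can extend the result stated in Theorem 2.3a.1 to sets of linear forms. Let $\tilde{U}_1 =
A_1\tilde{X}$ and $\tilde{U}_2 = A_2\tilde{X}$ where $A_1$ and $A_2$ are constant matrices that may or may not be in
the complex domain, $A_1$ is $m_1 \times n$ and $A_2$ is $m_2 \times n$, with $m_1 \le n$, $m_2 \le n$. As was
previously the case, $\tilde{X}' = (\tilde{x}_1, \ldots, \tilde{x}_n)$, where the $\tilde{x}_j$, $j = 1, \ldots, n$, are iid $\tilde{N}_1(\tilde{\mu}_1, \sigma_1^2)$. Let $\tilde{T}_1$ and
$\tilde{T}_2$ be parameter vectors of orders $m_1 \times 1$ and $m_2 \times 1$, respectively. Then, on following the
steps leading to (iii), the mgf of $\tilde{U}_1$ and $\tilde{U}_2$ and their joint mgf are obtained as follows:

$$M_{\tilde{U}_1}(\tilde{T}_1) = E[e^{\Re(\tilde{T}_1^*A_1\tilde{X})}] = e^{\Re(\tilde{\mu}_1\tilde{T}_1^*A_1J) + \frac{\sigma_1^2}{4}\tilde{T}_1^*A_1A_1^*\tilde{T}_1} \qquad \text{(iv)}$$

$$M_{\tilde{U}_2}(\tilde{T}_2) = e^{\Re(\tilde{\mu}_1\tilde{T}_2^*A_2J) + \frac{\sigma_1^2}{4}\tilde{T}_2^*A_2A_2^*\tilde{T}_2} \qquad \text{(v)}$$

$$M_{\tilde{U}_1,\tilde{U}_2}(\tilde{T}_1, \tilde{T}_2) = M_{\tilde{U}_1}(\tilde{T}_1)\,M_{\tilde{U}_2}(\tilde{T}_2)\,e^{\frac{\sigma_1^2}{2}\Re(\tilde{T}_1^*A_1A_2^*\tilde{T}_2)}. \qquad \text{(vi)}$$

Since $\tilde{T}_1$ and $\tilde{T}_2$ are arbitrary, the exponential part in (vi) is 1 if and only if $A_1A_2^* = O$ or
$A_2A_1^* = O$, the two null matrices having different orders. Then, we have:


Theorem 2.3a.2. Let the $\tilde{x}_j$'s and $\tilde{X}$ be as defined in Theorem 2.3a.1. Let $A_1$ be an $m_1 \times n$
constant matrix and $A_2$ be an $m_2 \times n$ constant matrix, $m_1 \le n$, $m_2 \le n$, and the constant
matrices may or may not be in the complex domain. Consider the general linear forms
$\tilde{U}_1 = A_1\tilde{X}$ and $\tilde{U}_2 = A_2\tilde{X}$. Then $\tilde{U}_1$ and $\tilde{U}_2$ are independently distributed if and only if
$A_1A_2^* = O$ or, equivalently, $A_2A_1^* = O$.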


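Example 2.3a.2. Let $\tilde{x}_j$, $j = 1, 2, 3, 4$, be iid univariate complex Gaussian $\tilde{N}_1(\tilde{\mu}_1, \sigma_1^2)$.
Consider the following sets of linear forms $\tilde{U}_1 = A_1\tilde{X}$, $\tilde{U}_2 = A_2\tilde{X}$, $\tilde{U}_3 = A_3\tilde{X}$ with
$\tilde{X}' = (\tilde{x}_1, \tilde{x}_2, \tilde{x}_3, \tilde{x}_4)$, where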


A, = |2+3i 2+3i 2+3 243i 
D | ee Se -A+ 1+ 


A» = 1—1 1—1 —l+i -l1+i 9 дү —147 2 


е TE p б 1-3 | 
Аз = 
atan 142i -(0142i) 142i 


Verify whether the pairs in $\tilde{U}_1$, $\tilde{U}_2$, $\tilde{U}_3$ are independently distributed.


Solution 2.3a.2. Since the products $A_1A_2^* = O$, $A_1A_3^* \ne O$, $A_2A_3^* \ne O$, only $\tilde{U}_1$ and
$\tilde{U}_2$ are independently distributed.


As a corollary of Theorem 2.3a.2, one has that the sample mean $\bar{\tilde{x}}$ and the sample
sum of products $\tilde{s}$ are also independently distributed in the complex Gaussian case, a
result parallel to the corresponding one in the real case. This can be seen by taking $A_1 =
\frac{1}{n}JJ'$ and $A_2 = I - \frac{1}{n}JJ'$. Then, since $A_1 = A_1^2$, $A_2 = A_2^2$, and $A_1A_2 = O$, we have
$\frac{1}{\sigma_1^2}\tilde{X}^*A_1\tilde{X} \sim \tilde{\chi}^2_1$ for $\tilde{\mu}_1 = 0$ and $\frac{1}{\sigma_1^2}\tilde{X}^*A_2\tilde{X} \sim \tilde{\chi}^2_{n-1}$, and both of these chisquares in the
complex domain are independently distributed. Then,

$$\frac{1}{\sigma_1^2}\,n\tilde{s}^2 = \frac{1}{\sigma_1^2}\,\tilde{s} = \frac{1}{\sigma_1^2}\sum_{j=1}^n (\tilde{x}_j - \bar{\tilde{x}})^*(\tilde{x}_j - \bar{\tilde{x}}) \sim \tilde{\chi}^2_{n-1}. \qquad (2.3a.1)$$

The Student-t with $n - 1$ degrees of freedom can be defined in terms of the standardized
sample mean and sample variance in the complex case.
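2.3.1. Noncentral chisquare having n degrees of freedom in the real domain

Let $x_j \sim N_1(\mu_j, \sigma_j^2)$, $j = 1, \ldots, n$, and the $x_j$'s be independently distributed. Then,
$\frac{x_j - \mu_j}{\sigma_j} \sim N_1(0, 1)$ and $\sum_{j=1}^n \frac{(x_j - \mu_j)^2}{\sigma_j^2} \sim \chi^2_n$ where $\chi^2_n$ is a real chisquare with $n$ degrees of
freedom. Then, when at least one of the $\mu_j$'s is nonzero, $\sum_{j=1}^n \frac{x_j^2}{\sigma_j^2}$ is referred to as a real
non-central chisquare with $n$ degrees of freedom and non-centrality parameter $\lambda$, which is
denoted $\chi^2_n(\lambda)$, where

$$\lambda = \frac{1}{2}\mu'\Sigma^{-1}\mu, \quad \mu = \begin{bmatrix} \mu_1 \\ \vdots \\ \mu_n \end{bmatrix}, \ \text{ and } \ \Sigma = \begin{bmatrix} \sigma_1^2 & 0 & \cdots & 0 \\ 0 & \sigma_2^2 & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & \sigma_n^2 \end{bmatrix}.$$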
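Let $u = \sum_{j=1}^n \frac{x_j^2}{\sigma_j^2}$. In order to derive the distribution of $u$, let us determine its mgf. Since $u$
is a function of the $x_j$'s where $x_j \sim N_1(\mu_j, \sigma_j^2)$, $j = 1, \ldots, n$, we can integrate out over
the joint density of the $x_j$'s. Then, with $t$ as the mgf parameter,

$$M_u(t) = E[e^{tu}] = \int_{-\infty}^{\infty}\cdots\int_{-\infty}^{\infty} \frac{1}{(2\pi)^{\frac{n}{2}}|\Sigma|^{\frac{1}{2}}}\, e^{t\sum_{j=1}^n \frac{x_j^2}{\sigma_j^2} - \frac{1}{2}\sum_{j=1}^n \frac{(x_j - \mu_j)^2}{\sigma_j^2}}\, dx_1 \wedge \ldots \wedge dx_n.$$

The exponent, excluding $-\frac{1}{2}$, can be simplified as follows:

$$-2t\sum_{j=1}^n \frac{x_j^2}{\sigma_j^2} + \sum_{j=1}^n \frac{(x_j - \mu_j)^2}{\sigma_j^2} = \sum_{j=1}^n (1 - 2t)\frac{x_j^2}{\sigma_j^2} - 2\sum_{j=1}^n \frac{\mu_jx_j}{\sigma_j^2} + \sum_{j=1}^n \frac{\mu_j^2}{\sigma_j^2}.$$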
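Let $y_j = \sqrt{(1 - 2t)}\,x_j$. Then, $(1 - 2t)^{-\frac{n}{2}}\,dy_1 \wedge \ldots \wedge dy_n = dx_1 \wedge \ldots \wedge dx_n$ and

$$\begin{aligned} \sum_{j=1}^n (1 - 2t)\frac{x_j^2}{\sigma_j^2} - 2\sum_{j=1}^n \frac{\mu_jx_j}{\sigma_j^2} + \sum_{j=1}^n \frac{\mu_j^2}{\sigma_j^2} &= \sum_{j=1}^n \frac{y_j^2}{\sigma_j^2} - 2\sum_{j=1}^n \frac{\mu_jy_j}{\sigma_j^2\sqrt{(1 - 2t)}} + \sum_{j=1}^n \frac{\mu_j^2}{\sigma_j^2} \\ &= \sum_{j=1}^n \frac{\big(y_j - \frac{\mu_j}{\sqrt{(1 - 2t)}}\big)^2}{\sigma_j^2} + \sum_{j=1}^n \frac{\mu_j^2}{\sigma_j^2} - \sum_{j=1}^n \frac{\mu_j^2}{\sigma_j^2(1 - 2t)}. \end{aligned}$$

But

$$\int_{-\infty}^{\infty}\cdots\int_{-\infty}^{\infty} \frac{1}{(2\pi)^{\frac{n}{2}}|\Sigma|^{\frac{1}{2}}}\, e^{-\frac{1}{2}\sum_{j=1}^n \frac{1}{\sigma_j^2}\big(y_j - \frac{\mu_j}{\sqrt{(1 - 2t)}}\big)^2}\, dy_1 \wedge \ldots \wedge dy_n = 1.$$

Hence, for $\lambda = \frac{1}{2}\sum_{j=1}^n \frac{\mu_j^2}{\sigma_j^2} = \frac{1}{2}\mu'\Sigma^{-1}\mu$,

$$\begin{aligned} M_u(t) &= \frac{1}{(1 - 2t)^{\frac{n}{2}}}\Big[e^{-\lambda + \frac{\lambda}{(1 - 2t)}}\Big] \qquad (2.3.5) \\ &= \sum_{k=0}^{\infty} \frac{\lambda^k e^{-\lambda}}{k!}\,\frac{1}{(1 - 2t)^{\frac{n}{2} + k}}. \end{aligned}$$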

However, $(1 - 2t)^{-(\frac{n}{2} + k)}$ is the mgf of a real scalar gamma with parameters $(\alpha = \frac{n}{2} + k, \beta =
2)$ or a real chisquare with $n + 2k$ degrees of freedom or $\chi^2_{n+2k}$. Hence, the density of a
non-central chisquare with $n$ degrees of freedom and non-centrality parameter $\lambda$, denoted
by $g_{u,\lambda}(u)$, is obtained by term by term inversion as follows:

$$g_{u,\lambda}(u) = \sum_{k=0}^{\infty} \frac{\lambda^k e^{-\lambda}}{k!}\,\frac{u^{\frac{n+2k}{2} - 1}e^{-\frac{u}{2}}}{2^{\frac{n+2k}{2}}\,\Gamma(\frac{n}{2} + k)} \qquad (2.3.6)$$

$$\phantom{g_{u,\lambda}(u)} = \frac{u^{\frac{n}{2} - 1}e^{-\frac{u}{2}}}{2^{\frac{n}{2}}\,\Gamma(\frac{n}{2})}\sum_{k=0}^{\infty} \frac{\lambda^k e^{-\lambda}}{k!}\,\frac{u^k}{2^k(\frac{n}{2})_k} \qquad (2.3.7)$$

where $(\frac{n}{2})_k$ is the Pochhammer symbol given by

$$(a)_k = a(a + 1)\cdots(a + k - 1), \ a \ne 0, \ (a)_0 \text{ being equal to } 1, \qquad (2.3.8)$$

and, in general, $\Gamma(\alpha + k) = \Gamma(\alpha)(\alpha)_k$ for $k = 1, 2, \ldots$, whenever the gamma functions
are defined. Hence, provided $(1 - 2t) > 0$, (2.3.6) can be looked upon as a weighted sum
of chisquare densities whose weights are Poisson distributed, that is, (2.3.6) is a Poisson
mixture of chisquare densities. As well, we can view (2.3.7) as a chisquare density having
$n$ degrees of freedom appended with a Bessel series. In general, a Bessel series is of the
form

$${}_0F_1(\,;b;x) = \sum_{k=0}^{\infty} \frac{1}{(b)_k}\,\frac{x^k}{k!}, \quad b \ne 0, -1, -2, \ldots, \qquad (2.3.9)$$

which is convergent for all $x$.
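The Poisson-mixture form of the noncentral chisquare density lends itself to direct numerical evaluation. The sketch below is a minimal illustration under stated assumptions: the series is truncated after `terms` Poisson weights, the integral is approximated by a midpoint rule on $[0, 80]$, and the values $n = 3$, $\lambda = 17/12$ are illustrative. Here $\lambda$ follows the convention of this section, $\lambda = \frac{1}{2}\mu'\Sigma^{-1}\mu$, which is half the noncentrality used by some software packages.

```python
import math

def chisq_pdf(u, df):
    """Density of a real central chisquare with df degrees of freedom."""
    if u <= 0:
        return 0.0
    half = df / 2.0
    return math.exp((half - 1.0) * math.log(u) - u / 2.0
                    - half * math.log(2.0) - math.lgamma(half))

def noncentral_chisq_pdf(u, n, lam, terms=60):
    """Poisson mixture of chisquare densities, truncated after `terms` terms."""
    if lam == 0:
        return chisq_pdf(u, n)
    total = 0.0
    for k in range(terms):
        log_weight = k * math.log(lam) - lam - math.lgamma(k + 1)  # Poisson weight
        total += math.exp(log_weight) * chisq_pdf(u, n + 2 * k)
    return total

def midpoint_integral(f, upper, steps):
    """Midpoint-rule approximation of the integral of f over [0, upper]."""
    h = upper / steps
    return h * sum(f((i + 0.5) * h) for i in range(steps))

n, lam = 3, 17.0 / 12.0     # illustrative values (assumption)
mass = midpoint_integral(lambda u: noncentral_chisq_pdf(u, n, lam), 80.0, 4000)
mean = midpoint_integral(lambda u: u * noncentral_chisq_pdf(u, n, lam), 80.0, 4000)
print(mass, mean, n + 2 * lam)
```

The total mass should be close to 1 and the numerical mean close to $n + 2\lambda$; the truncation point and the integration grid are the only approximations involved.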


2.3.1.1. Mean value and variance, real central and non-central chisquare 


The mgf of a real x is (1 — 21)72, 1 — 2t > 0. Thus, 
2 d y Vv e esl 
Elxg] = d 7207512 = (- ;) C720 720757 2o = v 


d y 
EUG? = 01 20) 010 = (- 7) -2(7 5 - 1) 72 = vo +d». 


That is, 
E[x2] = v and Var(x2) = v(v + 2) – V? = 2v. (2.3.10) 


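What are then the mean and the variance of a real non-central $\chi^2_\nu(\lambda)$? They can be derived
either from the mgf or from the density. Making use of the density, we have

$$E[\chi^2_\nu(\lambda)] = \sum_{k=0}^{\infty}\frac{\lambda^k e^{-\lambda}}{k!}\int_0^{\infty} u\,\frac{u^{\frac{\nu}{2}+k-1}e^{-\frac{u}{2}}}{2^{\frac{\nu}{2}+k}\,\Gamma(\frac{\nu}{2}+k)}\,du,$$

the integral part being equal to

$$\frac{\Gamma(\frac{\nu}{2}+k+1)\,2^{\frac{\nu}{2}+k+1}}{\Gamma(\frac{\nu}{2}+k)\,2^{\frac{\nu}{2}+k}} = 2\Big(\frac{\nu}{2}+k\Big) = \nu + 2k.$$

Now, the remaining summation over k can be looked upon as the expected value of $\nu + 2k$
in a Poisson distribution. In this case, we can write the expected value as the expected
value of a conditional expectation: $E[u] = E[E(u|k)]$, $u = \chi^2_\nu(\lambda)$. Thus,

$$E[\chi^2_\nu(\lambda)] = \nu + 2\sum_{k=0}^{\infty} k\,\frac{\lambda^k e^{-\lambda}}{k!} = \nu + 2E[k] = \nu + 2\lambda.$$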
k=0 
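Moreover,

$$E[\chi^2_\nu(\lambda)]^2 = \sum_{k=0}^{\infty}\frac{\lambda^k e^{-\lambda}}{k!}\int_0^{\infty} u^2\,\frac{u^{\frac{\nu}{2}+k-1}e^{-\frac{u}{2}}}{2^{\frac{\nu}{2}+k}\,\Gamma(\frac{\nu}{2}+k)}\,du,$$

the integral part being

$$\frac{\Gamma(\frac{\nu}{2}+k+2)\,2^{\frac{\nu}{2}+k+2}}{\Gamma(\frac{\nu}{2}+k)\,2^{\frac{\nu}{2}+k}} = 2^2\Big(\frac{\nu}{2}+k+1\Big)\Big(\frac{\nu}{2}+k\Big)$$
$$= (\nu+2k+2)(\nu+2k) = \nu^2 + 2\nu k + 2\nu(k+1) + 4k(k+1).$$

Since $E[k] = \lambda$, $E[k^2] = \lambda^2 + \lambda$ for a Poisson distribution,
$$E[\chi^2_\nu(\lambda)]^2 = \nu^2 + 2\nu + 4\nu\lambda + 4(\lambda^2 + 2\lambda).$$
Thus,
$$\mathrm{Var}(\chi^2_\nu(\lambda)) = E[\chi^2_\nu(\lambda)]^2 - [E(\chi^2_\nu(\lambda))]^2$$
$$= \nu^2 + 2\nu + 4\nu\lambda + 4(\lambda^2 + 2\lambda) - (\nu + 2\lambda)^2$$
$$= 2\nu + 8\lambda.$$

To summarize,
$$E[\chi^2_\nu(\lambda)] = \nu + 2\lambda \ \text{ and } \ \mathrm{Var}(\chi^2_\nu(\lambda)) = 2\nu + 8\lambda. \tag{2.3.11}$$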
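Example 2.3.3. Let $x_1 \sim N_1(-1, 2)$, $x_2 \sim N_1(1, 3)$ and $x_3 \sim N_1(-2, 2)$ be independently
distributed and $u = \frac{x_1^2}{2} + \frac{x_2^2}{3} + \frac{x_3^2}{2}$. Provide explicit expressions for the density of $u$,
$E[u]$ and $\mathrm{Var}(u)$.

Solution 2.3.3. This $u$ has a noncentral chisquare distribution with non-centrality parameter $\lambda$ where

$$\lambda = \frac{1}{2}\Big[\frac{\mu_1^2}{\sigma_1^2} + \frac{\mu_2^2}{\sigma_2^2} + \frac{\mu_3^2}{\sigma_3^2}\Big] = \frac{1}{2}\Big[\frac{(-1)^2}{2} + \frac{(1)^2}{3} + \frac{(-2)^2}{2}\Big] = \frac{17}{12},$$

and the number of degrees of freedom is $n = 3 = \nu$. Thus, $u \sim \chi^2_3(\lambda)$ or a real noncentral
chisquare with $\nu = 3$ degrees of freedom and non-centrality parameter $\frac{17}{12}$. Then $E[u] =
E[\chi^2_3(\lambda)] = \nu + 2\lambda = 3 + 2(\frac{17}{12}) = \frac{35}{6}$ and $\mathrm{Var}(u) = \mathrm{Var}(\chi^2_3(\lambda)) = 2\nu + 8\lambda = (2)(3) + 8(\frac{17}{12}) =
\frac{52}{3}$. Let the density of $u$ be denoted by $g(u)$. Then

$$g(u) = \sum_{k=0}^{\infty}\frac{(17/12)^k e^{-17/12}}{k!}\,\frac{u^{\frac{3}{2}+k-1}e^{-\frac{u}{2}}}{2^{\frac{3}{2}+k}\,\Gamma(\frac{3}{2}+k)}$$
$$= \frac{u^{\frac{1}{2}}e^{-\frac{u}{2}}}{\sqrt{2\pi}}\sum_{k=0}^{\infty}\frac{(17/12)^k e^{-17/12}}{k!}\,\frac{u^k}{2^k\,(\frac{3}{2})_k},\ 0 \le u < \infty,$$

and zero elsewhere.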
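The values $E[u] = 35/6$ and $\mathrm{Var}(u) = 52/3$ obtained in Solution 2.3.3 can be checked by simulation. The sketch below is illustrative only: the function name, sample size and seed are assumptions, and the second parameter of $N_1(\cdot,\cdot)$ is taken to be the variance, as in the text.

```python
import random

def simulate_example_2_3_3(num_samples=100_000, seed=12345):
    """Monte Carlo estimate of E[u] and Var(u) for u = x1^2/2 + x2^2/3 + x3^2/2."""
    rng = random.Random(seed)
    # (mean, variance) of x1, x2, x3 as stated in Example 2.3.3
    params = [(-1.0, 2.0), (1.0, 3.0), (-2.0, 2.0)]
    total = 0.0
    total_sq = 0.0
    for _ in range(num_samples):
        u = 0.0
        for mu, var in params:
            x = rng.gauss(mu, var ** 0.5)  # gauss() expects a standard deviation
            u += x * x / var
        total += u
        total_sq += u * u
    mean = total / num_samples
    variance = total_sq / num_samples - mean * mean
    return mean, variance

if __name__ == "__main__":
    mean, variance = simulate_example_2_3_3()
    print(mean, 35 / 6)
    print(variance, 52 / 3)
```

With this many samples the estimates should agree with (2.3.11) to roughly two decimal places for the mean and one for the variance.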

2.3a.1. Noncentral chisquare having n degrees of freedom in the complex domain

Let us now consider independently Gaussian distributed variables in the complex domain.
Let the complex scalar variables $\tilde{x}_j \sim \tilde{N}_1(\tilde{\mu}_j, \sigma_j^2)$, $j = 1, \ldots, n$, be independently
distributed. Then, we have already established that $\sum_{j=1}^{n}\big(\frac{\tilde{x}_j - \tilde{\mu}_j}{\sigma_j}\big)^{*}\big(\frac{\tilde{x}_j - \tilde{\mu}_j}{\sigma_j}\big) \sim \tilde{\chi}^2_n$, which
is a chisquare variable having $n$ degrees of freedom in the complex domain. If we let
$\tilde{u} = \sum_{j=1}^{n}\frac{\tilde{x}_j^{*}\tilde{x}_j}{\sigma_j^2}$, then this $\tilde{u}$ will be said to have a noncentral chisquare distribution with
$n$ degrees of freedom and non-centrality parameter $\lambda$ in the complex domain where $\tilde{x}_j^{*}$ is
only the conjugate since it is a scalar quantity. Since, in this case, $\tilde{u}$ is real, we may associate
the mgf of $\tilde{u}$ with a real parameter $t$. Now, proceeding as in the real case, we obtain
the mgf of $\tilde{u}$, denoted by $M_{\tilde{u}}(t)$, as follows:

$$M_{\tilde{u}}(t) = E[e^{t\tilde{u}}] = (1-t)^{-n}e^{-\lambda + \frac{\lambda}{1-t}},\ 1-t > 0,\ \lambda = \sum_{j=1}^{n}\frac{\tilde{\mu}_j^{*}\tilde{\mu}_j}{\sigma_j^2}$$
$$= \sum_{k=0}^{\infty}\frac{\lambda^k}{k!}e^{-\lambda}(1-t)^{-(n+k)}. \tag{2.3a.2}$$

Note that the inverse corresponding to $(1-t)^{-(n+k)}$ is a chisquare density in the complex
domain with $n+k$ degrees of freedom, and that part of the density, denoted by $f_1(u)$, is

$$f_1(u) = \frac{1}{\Gamma(n+k)}u^{n+k-1}e^{-u} = \frac{1}{\Gamma(n)(n)_k}u^{n-1}u^k e^{-u}.$$

Thus, the noncentral chisquare density with $n$ degrees of freedom in the complex domain,
that is, $\tilde{u} = \tilde{\chi}^2_n(\lambda)$, denoted by $f_{\tilde{u}}(u)$, is

$$f_{\tilde{u}}(u) = \frac{u^{n-1}}{\Gamma(n)}e^{-u}\sum_{k=0}^{\infty}\frac{\lambda^k}{k!}e^{-\lambda}\frac{u^k}{(n)_k}, \tag{2.3a.3}$$

which, referring to Eqs. (2.3.5)-(2.3.9) in connection with a non-central chisquare in the
real domain, can also be represented in various ways.
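Example 2.3a.3. Let $\tilde{x}_1 \sim \tilde{N}_1(1+i, 2)$, $\tilde{x}_2 \sim \tilde{N}_1(2+i, 4)$ and $\tilde{x}_3 \sim \tilde{N}_1(1-i, 2)$ be
independently distributed univariate complex Gaussian random variables and

$$\tilde{u} = \frac{\tilde{x}_1^{*}\tilde{x}_1}{\sigma_1^2} + \frac{\tilde{x}_2^{*}\tilde{x}_2}{\sigma_2^2} + \frac{\tilde{x}_3^{*}\tilde{x}_3}{\sigma_3^2} = \frac{\tilde{x}_1^{*}\tilde{x}_1}{2} + \frac{\tilde{x}_2^{*}\tilde{x}_2}{4} + \frac{\tilde{x}_3^{*}\tilde{x}_3}{2}.$$

Compute $E[\tilde{u}]$ and $\mathrm{Var}(\tilde{u})$ and provide an explicit representation of the density of $\tilde{u}$.

Solution 2.3a.3. In this case, $\tilde{u}$ has a noncentral chisquare distribution with degrees of
freedom $\nu = n = 3$ and non-centrality parameter $\lambda$ given by

$$\lambda = \frac{\tilde{\mu}_1^{*}\tilde{\mu}_1}{\sigma_1^2} + \frac{\tilde{\mu}_2^{*}\tilde{\mu}_2}{\sigma_2^2} + \frac{\tilde{\mu}_3^{*}\tilde{\mu}_3}{\sigma_3^2} = \frac{(1)^2 + (1)^2}{2} + \frac{(2)^2 + (1)^2}{4} + \frac{(1)^2 + (-1)^2}{2} = 1 + \frac{5}{4} + 1 = \frac{13}{4}.$$

The density, denoted by $g_1(u)$, is given in (i). In this case $\tilde{u}$ will be a real gamma with the
parameters $(\alpha = n, \beta = 1)$ to which a Poisson series is appended:

$$g_1(u) = \sum_{k=0}^{\infty}\frac{\lambda^k}{k!}e^{-\lambda}\frac{u^{n+k-1}e^{-u}}{\Gamma(n+k)},\ 0 \le u < \infty, \tag{i}$$

and zero elsewhere. Then, the expected value of $\tilde{u}$ and the variance of $\tilde{u}$ are available from
(i) by direct integration.

$$E[\tilde{u}] = \int_0^{\infty} u\,g_1(u)\,du = \sum_{k=0}^{\infty}\frac{\lambda^k}{k!}e^{-\lambda}\frac{\Gamma(n+k+1)}{\Gamma(n+k)}.$$

But $\frac{\Gamma(n+k+1)}{\Gamma(n+k)} = n + k$ and the summation over $k$ can be taken as the expected value of
$n + k$ in a Poisson distribution. Thus, $E[\tilde{\chi}^2_n(\lambda)] = n + E[k] = n + \lambda$. Now, in the expected
value of

$$[\tilde{\chi}^2_n(\lambda)][\tilde{\chi}^2_n(\lambda)]^{*} = [\tilde{\chi}^2_n(\lambda)]^2 = [\tilde{u}]^2,$$

which is real in this case, the integral part over $u$ gives

$$\frac{\Gamma(n+k+2)}{\Gamma(n+k)} = (n+k+1)(n+k) = n^2 + 2nk + k^2 + n + k$$

with expected value $n^2 + 2n\lambda + n + \lambda + (\lambda^2 + \lambda)$. Hence,

$$\mathrm{Var}(\tilde{\chi}^2_n(\lambda)) = E[\tilde{u} - E(\tilde{u})][\tilde{u} - E(\tilde{u})]^{*} = \mathrm{Var}(\tilde{u}) = n^2 + 2n\lambda + n + \lambda + (\lambda^2 + \lambda) - (n + \lambda)^2,$$

which simplifies to $n + 2\lambda$. Accordingly,

$$E[\tilde{\chi}^2_n(\lambda)] = n + \lambda \ \text{ and } \ \mathrm{Var}(\tilde{\chi}^2_n(\lambda)) = n + 2\lambda, \tag{ii}$$

so that

$$E[\tilde{u}] = 3 + \frac{13}{4} = \frac{25}{4} \ \text{ and } \ \mathrm{Var}(\tilde{u}) = 3 + \frac{13}{2} = \frac{19}{2}. \tag{iii}$$

The explicit form of the density is then

$$g_1(u) = \frac{u^2 e^{-u}}{\Gamma(3)}\sum_{k=0}^{\infty}\frac{(13/4)^k e^{-13/4}}{k!}\,\frac{u^k}{(3)_k},\ 0 \le u < \infty, \tag{iv}$$

and zero elsewhere.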
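A simulation offers a quick sanity check of (iii). The sketch below is hedged: it assumes the convention, consistent with $E[\tilde{\chi}^2_n(\lambda)] = n + \lambda$ above, that for $\tilde{x} \sim \tilde{N}_1(\tilde{\mu}, \sigma^2)$ the real and imaginary parts of $\tilde{x} - \tilde{\mu}$ are independent real Gaussians with variance $\sigma^2/2$ each. The function name, sample size and seed are illustrative.

```python
import random

def simulate_example_2_3a_3(num_samples=100_000, seed=2024):
    """Monte Carlo estimate of E[u] and Var(u) for the complex-domain example."""
    rng = random.Random(seed)
    # (complex mean, sigma^2) of x1, x2, x3 as stated in Example 2.3a.3
    params = [(1 + 1j, 2.0), (2 + 1j, 4.0), (1 - 1j, 2.0)]
    total = 0.0
    total_sq = 0.0
    for _ in range(num_samples):
        u = 0.0
        for mu, var in params:
            s = (var / 2) ** 0.5  # assumed: each part has variance sigma^2 / 2
            x = mu + complex(rng.gauss(0.0, s), rng.gauss(0.0, s))
            u += abs(x) ** 2 / var  # conj(x) * x / sigma^2
        total += u
        total_sq += u * u
    mean = total / num_samples
    variance = total_sq / num_samples - mean * mean
    return mean, variance

if __name__ == "__main__":
    mean, variance = simulate_example_2_3a_3()
    print(mean, 25 / 4)
    print(variance, 19 / 2)
```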


Exercises 2.3 


2.3.1. Let $x_1, \ldots, x_n$ be iid variables with common density a real gamma density with
the parameters $\alpha$ and $\beta$ or with the mgf $(1 - \beta t)^{-\alpha}$, $1 - \beta t > 0$, $\alpha > 0$, $\beta > 0$. Let

$$u_1 = x_1 + \cdots + x_n,\ u_2 = \frac{1}{n}(x_1 + \cdots + x_n),\ u_3 = u_2 - \alpha\beta,\ u_4 = \frac{\sqrt{n}\,u_3}{\beta\sqrt{\alpha}}.$$

Evaluate the mgfs and thereby the densities of $u_1, u_2, u_3, u_4$. Show that they are all gamma
densities for all finite $n$, possibly relocated. Show that when $n \to \infty$, $u_4 \to N_1(0, 1)$ or $u_4$ goes
to a real standard normal when $n$ goes to infinity.

2.3.2. Let $x_1, \ldots, x_n$ be a simple random sample of size $n$ from a real population with
mean value $\mu$ and variance $\sigma^2 < \infty$, $\sigma > 0$. Then the central limit theorem says that
$\frac{\sqrt{n}(\bar{x} - \mu)}{\sigma} \to N_1(0, 1)$ as $n \to \infty$, where $\bar{x} = \frac{1}{n}(x_1 + \cdots + x_n)$. Translate this statement for
(1): binomial probability function $f_1(x) = \binom{n}{x}p^x(1-p)^{n-x}$, $x = 0, 1, \ldots, n$, $0 < p < 1$
and $f_1(x) = 0$ elsewhere; (2): negative binomial probability law $f_2(x) = \binom{x-1}{k-1}p^k(1-p)^{x-k}$,
$x = k, k+1, \ldots$, $0 < p < 1$ and zero elsewhere; (3): geometric probability law
$f_3(x) = p(1-p)^{x-1}$, $x = 1, 2, \ldots$, $0 < p < 1$ and $f_3(x) = 0$ elsewhere; (4): Poisson
probability law $f_4(x) = \frac{\lambda^x}{x!}e^{-\lambda}$, $x = 0, 1, \ldots$, $\lambda > 0$ and $f_4(x) = 0$ elsewhere.

2.3.3. Repeat Exercise 2.3.2 if the population is (1): $g_1(x) = c_1 x^{\gamma-1}e^{-ax^{\delta}}$, $x \ge 0$, $\delta > 0$,
$a > 0$, $\gamma > 0$ and $g_1(x) = 0$ elsewhere; (2): The real pathway model $g_2(x) = c_q x^{\gamma}[1 -
a(1-q)x^{\delta}]^{\frac{1}{1-q}}$, $a > 0$, $\delta > 0$, $1 - a(1-q)x^{\delta} > 0$ and for the cases $q < 1$, $q > 1$, $q \to 1$,
and $g_2(x) = 0$ elsewhere.


2.3.4. Let $x \sim N_1(\mu_1, \sigma_1^2)$, $y \sim N_1(\mu_2, \sigma_2^2)$ be real Gaussian and be independently distributed.
Let $x_1, \ldots, x_{n_1}$, $y_1, \ldots, y_{n_2}$ be simple random samples from $x$ and $y$ respectively.
Let ui = У — u1), из = У 10у - ua), из = 2x1 — 3x2 + yi — уз + 253, 
у бу — fai 2 hber 
U4 = о — SO 
У Oj uo; У 10у 392/07 
Compute the densities of u1, U2, из, U4, us. 


2.3.5. In Exercise 2.3.4 if $\sigma_1^2 = \sigma_2^2 = \sigma^2$ compute the densities of $u_3, u_4, u_5$ there, and
$u_6 = \sum_{j=1}^{n_1}(x_j - \bar{x})^2 + \sum_{j=1}^{n_2}(y_j - \bar{y})^2$ if (1): $n_1 = n_2$, (2): $n_1 \ne n_2$.

2.3.6. For the noncentral chisquare in the complex case, discussed in (2.3a.5) evaluate the 
mean value and the variance. 


2.3.7. For the complex case, starting with the mgf, derive the noncentral chisquare density 
and show that it agrees with that given in (2.3a.3). 


2.3.8. Give the detailed proofs of the independence of linear forms and sets of linear 
forms in the complex Gaussian case. 


2.4. Distributions of Products and Ratios and Connection to Fractional Calculus 


Distributions of products and ratios of real scalar random variables are connected to
numerous topics including Krätzel integrals and transforms, reaction-rate probability integrals
in nuclear reaction-rate theory, the inverse Gaussian distribution, integrals occurring
in fractional calculus, Kobayashi integrals and Bayesian structures. Let $x_1 > 0$ and $x_2 > 0$
be real scalar positive random variables that are independently distributed with density
functions $f_1(x_1)$ and $f_2(x_2)$, respectively. We respectively denote the product and ratio of
these variables by $u_2 = x_1x_2$ and $u_1 = \frac{x_2}{x_1}$. What are then the densities of $u_1$ and $u_2$? We
first consider the density of the product. Let $u_2 = x_1x_2$ and $v = x_2$. Then $x_1 = \frac{u_2}{v}$ and
$x_2 = v$, $dx_1 \wedge dx_2 = \frac{1}{v}du_2 \wedge dv$. Let the joint density of $u_2$ and $v$ be denoted by $g(u_2, v)$
and the marginal density of $u_2$ by $g_2(u_2)$. Then

$$g(u_2, v) = \frac{1}{v}f_1\Big(\frac{u_2}{v}\Big)f_2(v) \ \text{ and } \ g_2(u_2) = \int_v \frac{1}{v}f_1\Big(\frac{u_2}{v}\Big)f_2(v)\,dv. \tag{2.4.1}$$
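The product formula (2.4.1) can be evaluated by direct numerical integration. As a hedged illustration that does not appear in the original text, take $f_1$ and $f_2$ to be uniform densities on $(0, 1)$, for which the integral reduces to $\int_{u_2}^{1}\frac{1}{v}dv = -\ln u_2$. The function names and the midpoint rule are assumptions made for the sketch.

```python
import math

def product_density(u, f1, f2, v_lo, v_hi, steps=20_000):
    """Midpoint-rule approximation of g2(u) = int (1/v) f1(u/v) f2(v) dv.

    The integration range [v_lo, v_hi] should cover the support of f2.
    Midpoints are used so that v = 0 is never evaluated.
    """
    h = (v_hi - v_lo) / steps
    total = 0.0
    for i in range(steps):
        v = v_lo + (i + 0.5) * h
        total += f1(u / v) * f2(v) / v
    return total * h

def uniform01(x):
    """Uniform density on (0, 1), used here purely as a test case."""
    return 1.0 if 0.0 < x < 1.0 else 0.0

if __name__ == "__main__":
    for u in (0.1, 0.3, 0.7):
        print(u, product_density(u, uniform01, uniform01, 0.0, 1.0), -math.log(u))
```

Any other pair of densities on the positive half line can be passed in the same way, provided the integration range is chosen wide enough.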


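For example, let $f_1$ and $f_2$ be generalized gamma densities, in which case

$$f_j(x_j) = \frac{\delta_j\,a_j^{\gamma_j/\delta_j}}{\Gamma(\frac{\gamma_j}{\delta_j})}\,x_j^{\gamma_j-1}e^{-a_jx_j^{\delta_j}},\ a_j > 0,\ \delta_j > 0,\ \gamma_j > 0,\ x_j \ge 0,\ j = 1, 2,$$

and $f_j(x_j) = 0$ elsewhere. Then,

$$g_2(u_2) = c\int_0^{\infty}\Big(\frac{1}{v}\Big)\Big(\frac{u_2}{v}\Big)^{\gamma_1-1}v^{\gamma_2-1}e^{-a_2v^{\delta_2} - a_1(\frac{u_2}{v})^{\delta_1}}dv,\ c = \prod_{j=1}^{2}\frac{\delta_j\,a_j^{\gamma_j/\delta_j}}{\Gamma(\frac{\gamma_j}{\delta_j})}$$
$$= c\,u_2^{\gamma_1-1}\int_0^{\infty}v^{\gamma_2-\gamma_1-1}e^{-a_2v^{\delta_2} - a_1(u_2^{\delta_1}/v^{\delta_1})}dv. \tag{2.4.2}$$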
0 


The integral in (2.4.2) is connected to several topics. For $\delta_1 = 1$, $\delta_2 = 1$, this integral is the
basic Krätzel integral and Krätzel transform, see Mathai and Haubold (2020). When $\delta_2 =
1$, $\delta_1 = \frac{1}{2}$, the integral in (2.4.2) is the basic reaction-rate probability integral in nuclear
reaction-rate theory, see Mathai and Haubold (1988). For $\delta_1 = 1$, $\delta_2 = 1$, the integrand
in (2.4.2), once normalized, is the inverse Gaussian density for appropriate values of $\gamma_2 -
\gamma_1 - 1$. Observe that (2.4.2) is also connected to the Bayesian structure of unconditional
densities if the conditional and marginal densities belong to the generalized gamma family of
densities. When $\delta_2 = 1$, the integral is a mgf of the remaining part with $a_2$ as the mgf
parameter (it is therefore the Laplace transform of the remaining part of the function).


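Now, let us consider different $f_1$ and $f_2$. Let $f_1(x_1)$ be a real type-1 beta density with
the parameters $(\gamma + 1, \alpha)$, $\Re(\alpha) > 0$, $\Re(\gamma) > -1$ (in statistical problems, the parameters
are real but in this case the results hold as well for complex parameters; accordingly, the
conditions are stated for complex parameters), that is, the density of $x_1$ is

$$f_1(x_1) = \frac{\Gamma(\gamma+1+\alpha)}{\Gamma(\gamma+1)\Gamma(\alpha)}\,x_1^{\gamma}(1-x_1)^{\alpha-1},\ 0 \le x_1 \le 1,\ \alpha > 0,\ \gamma > -1,$$

and $f_1(x_1) = 0$ elsewhere. Let $f_2(x_2) = f(x_2)$ where $f$ is an arbitrary density. Then, the
density of $u_2$ is given by

$$g_2(u_2) = \frac{c}{\Gamma(\alpha)}\int_v \frac{1}{v}\Big(\frac{u_2}{v}\Big)^{\gamma}\Big(1 - \frac{u_2}{v}\Big)^{\alpha-1}f(v)\,dv,\ c = \frac{\Gamma(\gamma+1+\alpha)}{\Gamma(\gamma+1)}$$
$$= c\,\frac{u_2^{\gamma}}{\Gamma(\alpha)}\int_{v \ge u_2 > 0}v^{-\gamma-\alpha}(v - u_2)^{\alpha-1}f(v)\,dv \tag{2.4.3}$$
$$= c\,K^{-\alpha}_{2,u_2,\gamma}f, \tag{2.4.4}$$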
where $K^{-\alpha}_{2,u_2,\gamma}f$ is the Erdélyi-Kober fractional integral of order $\alpha$ of the second kind,
with parameter $\gamma$ in the real scalar variable case. Hence, if $f$ is an arbitrary density, then
$K^{-\alpha}_{2,u_2,\gamma}f$ is $\frac{\Gamma(\gamma+1)}{\Gamma(\gamma+1+\alpha)}g_2(u_2)$ or a constant multiple of the density of a product of
independently distributed real scalar positive random variables where one of them has a real
type-1 beta density and the other has an arbitrary density. When $f_1$ and $f_2$ are densities,
then $g_2(u_2)$ has the structure

$$g_2(u_2) = \int_v \frac{1}{v}f_1\Big(\frac{u_2}{v}\Big)f_2(v)\,dv. \tag{i}$$


Whether or not $f_1$ and $f_2$ are densities, the structure in (i) is called the Mellin convolution
of a product in the sense that if we take the Mellin transform of $g_2$, with Mellin parameter
$s$, then

$$M_{g_2}(s) = M_{f_1}(s)M_{f_2}(s) \tag{2.4.5}$$

where $M_{g_2}(s) = \int_0^{\infty}u_2^{s-1}g_2(u_2)\,du_2$,

$$M_{f_1}(s) = \int_0^{\infty}x_1^{s-1}f_1(x_1)\,dx_1 \ \text{ and } \ M_{f_2}(s) = \int_0^{\infty}x_2^{s-1}f_2(x_2)\,dx_2,$$

whenever the Mellin transforms exist. Here (2.4.5) is the Mellin convolution of a product
property. In statistical terms, when $f_1$ and $f_2$ are densities and when $x_1$ and $x_2$ are
independently distributed, we have

$$E[u_2^{s-1}] = E[x_1^{s-1}]\,E[x_2^{s-1}] \tag{2.4.6}$$


whenever the expected values exist. Taking different forms of $f_1$ and $f_2$, where $f_1$ has a
factor $\frac{(1-x_1)^{\alpha-1}}{\Gamma(\alpha)}$ for $0 \le x_1 \le 1$, $\Re(\alpha) > 0$, it can be shown that the structure appearing
in (i) produces all the various fractional integrals of the second kind of order $\alpha$ available
in the literature for the real scalar variable case, such as the Riemann-Liouville fractional
integral, Weyl fractional integral, etc. Connections of distributions of products and ratios
to fractional integrals were established in a series of papers which appeared in Linear
Algebra and its Applications, see Mathai (2013, 2014, 2015).

Now, let us consider the density of a ratio. Again, let $x_1 > 0$, $x_2 > 0$ be independently
distributed real scalar positive random variables with density functions $f_1(x_1)$ and $f_2(x_2)$, respectively.
Let $u_1 = \frac{x_2}{x_1}$ and let $v = x_2$. Then $dx_1 \wedge dx_2 = -\frac{v}{u_1^2}du_1 \wedge dv$. If we take $x_1 = v$,
the Jacobian will be only $v$ and not $-\frac{v}{u_1^2}$ and the final structure will be different. However,
the first transformation is required in order to establish connections to fractional integral
of the first kind. If $f_1$ and $f_2$ are generalized gamma densities as described earlier and if
$x_1 = v$, then the marginal density of $u_1$, denoted by $g_1(u_1)$, will be as follows:

$$g_1(u_1) = c\int_v v\,v^{\gamma_1-1}(u_1v)^{\gamma_2-1}e^{-a_1v^{\delta_1} - a_2(u_1v)^{\delta_2}}dv,\ c = \prod_{j=1}^{2}\frac{\delta_j\,a_j^{\gamma_j/\delta_j}}{\Gamma(\frac{\gamma_j}{\delta_j})}$$
$$= c\,u_1^{\gamma_2-1}\int_{v=0}^{\infty}v^{\gamma_1+\gamma_2-1}e^{-a_1v^{\delta_1} - a_2(u_1v)^{\delta_2}}dv \tag{2.4.7}$$
$$= \frac{c}{\delta}\,\Gamma\Big(\frac{\gamma_1+\gamma_2}{\delta}\Big)u_1^{\gamma_2-1}(a_1 + a_2u_1^{\delta})^{-\frac{\gamma_1+\gamma_2}{\delta}},\ \text{for } \delta_1 = \delta_2 = \delta. \tag{2.4.8}$$
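On the other hand, if $x_2 = v$, then the Jacobian is $-\frac{v}{u_1^2}$ and the marginal density, again
denoted by $g_1(u_1)$, will be as follows when $f_1$ and $f_2$ are generalized gamma densities:

$$g_1(u_1) = c\int_v\Big(\frac{v}{u_1^2}\Big)\Big(\frac{v}{u_1}\Big)^{\gamma_1-1}v^{\gamma_2-1}e^{-a_1(\frac{v}{u_1})^{\delta_1} - a_2v^{\delta_2}}dv$$
$$= c\,u_1^{-\gamma_1-1}\int_{v=0}^{\infty}v^{\gamma_1+\gamma_2-1}e^{-a_2v^{\delta_2} - a_1(\frac{v}{u_1})^{\delta_1}}dv.$$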
v=0 


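This is one of the representations of the density of a product discussed earlier, which is
also connected to Krätzel integral, reaction-rate probability integral, and so on. Now, let
us consider a type-1 beta density for $x_1$ with the parameters $(\gamma, \alpha)$ having the following
density:

$$f_1(x_1) = \frac{\Gamma(\gamma+\alpha)}{\Gamma(\gamma)\Gamma(\alpha)}\,x_1^{\gamma-1}(1-x_1)^{\alpha-1},\ 0 \le x_1 \le 1,$$

for $\gamma > 0$, $\alpha > 0$ and $f_1(x_1) = 0$ elsewhere. Let $f_2(x_2) = f(x_2)$ where $f$ is an arbitrary
density. Letting $u_1 = \frac{x_2}{x_1}$ and $x_2 = v$, the density of $u_1$, again denoted by $g_1(u_1)$, is

$$g_1(u_1) = \frac{\Gamma(\gamma+\alpha)}{\Gamma(\gamma)\Gamma(\alpha)}\int_v \frac{v}{u_1^2}\Big(\frac{v}{u_1}\Big)^{\gamma-1}\Big(1 - \frac{v}{u_1}\Big)^{\alpha-1}f(v)\,dv$$
$$= \frac{\Gamma(\gamma+\alpha)}{\Gamma(\gamma)}\,\frac{u_1^{-\gamma-\alpha}}{\Gamma(\alpha)}\int_{v \le u_1}v^{\gamma}(u_1 - v)^{\alpha-1}f(v)\,dv,\ \Re(\alpha) > 0, \tag{2.4.9}$$
$$= \frac{\Gamma(\gamma+\alpha)}{\Gamma(\gamma)}\,K^{-\alpha}_{1,u_1,\gamma}f,$$

where $K^{-\alpha}_{1,u_1,\gamma}f$ is the Erdélyi-Kober fractional integral of the first kind of order $\alpha$ and parameter
$\gamma$. If $f_1$ and $f_2$ are densities, this Erdélyi-Kober fractional integral of the first kind is
a constant multiple of the density of a ratio $g_1(u_1)$. In statistical terms,

$$u_1 = \frac{x_2}{x_1} \Rightarrow E[u_1^{s-1}] = E[x_2^{s-1}]\,E[x_1^{-s+1}] \ \text{ with } \ E[x_1^{-s+1}] = E[x_1^{(2-s)-1}] \Rightarrow$$
$$M_{g_1}(s) = M_{f_1}(2-s)\,M_{f_2}(s), \tag{2.4.11}$$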

which is the Mellin convolution of a ratio. Whether or not $f_1$ and $f_2$ are densities, (2.4.11)
is taken as the Mellin convolution of a ratio and it cannot be given statistical interpretations
when $f_1$ and $f_2$ are not densities. For example, let $f_1(x_1) = x_1^{-\alpha-1}\frac{(1-x_1)^{\alpha-1}}{\Gamma(\alpha)}$ and $f_2(x_2) =
x_2^{\alpha}f(x_2)$ where $f(x_2)$ is an arbitrary function. Then the Mellin convolution of a ratio, as
in (2.4.11), again denoted by $g_1(u_1)$, is given by

$$g_1(u_1) = \int_{v \le u_1}\frac{(u_1 - v)^{\alpha-1}}{\Gamma(\alpha)}f(v)\,dv,\ \Re(\alpha) > 0. \tag{2.4.12}$$

This is the Riemann-Liouville fractional integral of the first kind of order $\alpha$ if $v$ is bounded
below; when $v$ is not bounded below, then it is the Weyl fractional integral of the first kind of
order $\alpha$. An introduction to fractional calculus is presented in Mathai and Haubold (2018).
The densities of $u_1$ and $u_2$ are connected to various problems in different areas for different
functions $f_1$ and $f_2$.
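To make (2.4.12) concrete, the Riemann-Liouville case with lower limit 0 can be approximated by a simple quadrature. The sketch below is illustrative only: the function name, the choice of lower limit 0 and the midpoint rule are assumptions. It is compared with the elementary closed forms $u^{\alpha}/\Gamma(\alpha+1)$ for $f(v) = 1$ and $u^3/6$ for $f(v) = v$, $\alpha = 2$.

```python
import math

def riemann_liouville(f, u, alpha, steps=20_000):
    """Midpoint approximation of (1/Gamma(alpha)) * int_0^u (u - v)^(alpha-1) f(v) dv.

    Assumes alpha >= 1 so that the integrand stays bounded on [0, u].
    """
    h = u / steps
    total = 0.0
    for i in range(steps):
        v = (i + 0.5) * h
        total += (u - v) ** (alpha - 1) * f(v)
    return total * h / math.gamma(alpha)

if __name__ == "__main__":
    u = 2.0
    print(riemann_liouville(lambda v: 1.0, u, 1.5), u ** 1.5 / math.gamma(2.5))
    print(riemann_liouville(lambda v: v, u, 2.0), u ** 3 / 6)
```

For $0 < \alpha < 1$ the kernel is singular at $v = u_1$ and a quadrature adapted to that endpoint would be needed.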

In the $p \times p$ matrix case in the complex domain, we will assume that the matrix is
Hermitian positive definite. Note that when $p = 1$, Hermitian positive definite means a
real positive variable. Hence in the scalar case, we will not discuss ratios and products in
the complex domain since densities must be real-valued functions.


Exercises 2.4 


2.4.1. Derive the density of (1): a real non-central F, where the numerator chisquare is
non-central and the denominator chisquare is central, (2): a real doubly non-central F
where both the chisquares are non-central with non-centrality parameters $\lambda_1$ and $\lambda_2$ respectively.

2.4.2. Let $x_1$ and $x_2$ be real gamma random variables with parameters $(\alpha_1, \beta)$ and $(\alpha_2, \beta)$
with the same $\beta$ respectively and be independently distributed. Let $u_1 = \frac{x_1}{x_1+x_2}$, $u_2 =
\frac{x_1}{x_2}$, $u_3 = x_1 + x_2$. Compute the densities of $u_1, u_2, u_3$. Hint: Use the transformation $x_1 =
r\cos^2\theta$, $x_2 = r\sin^2\theta$.

2.4.3. Let $x_1$ and $x_2$ be as defined in Exercise 2.4.2. Let $u = x_1x_2$. Derive the density
of $u$.

2.4.4. Let $x_j$ have a real type-1 beta density with the parameters $(\alpha_j, \beta_j)$, $j = 1, 2$ and
be independently distributed. Let $u_1 = x_1x_2$, $u_2 = \frac{x_1}{x_2}$. Derive the densities of $u_1$ and $u_2$.
State the conditions under which these densities reduce to simpler known densities.

2.4.5. Evaluate (1): Weyl fractional integral of the second kind of order $\alpha$ if the arbitrary
function is $f(v) = e^{-v}$; (2): Riemann-Liouville fractional integral of the first kind of order
$\alpha$ if the lower limit is 0 and the arbitrary function is $f(v) = v^{\delta}$.

2.4.6. In Exercise 2.4.2 show that (1): $u_1$ and $u_3$ are independently distributed; (2): $u_2$
and $u_3$ are independently distributed.


2.4.7. In Exercise 2.4.2 show that for arbitrary $h$, $E\Big[\Big(\frac{x_1}{x_1+x_2}\Big)^h\Big] = \frac{E(x_1^h)}{E(x_1+x_2)^h}$ and state the
conditions for the existence of the moments. [Observe that, in general, $E\Big(\frac{y_1}{y_2}\Big)^h \ne \frac{E(y_1^h)}{E(y_2^h)}$
even if $y_1$ and $y_2$ are independently distributed.]


2.4.8. Derive the corresponding densities in Exercise 2.4.1 for the complex domain by 
taking the chisquares in the complex domain. 


2.4.9. Extend the results in Exercise 2.4.2 to the complex domain by taking chisquare 
variables in the complex domain instead of gamma variables. 


2.5. General Structures 
2.5.1. Product of real scalar gamma variables 


Let $x_1, \ldots, x_k$ be independently distributed real scalar gamma random variables with
$x_j$ having the density $f_j(x_j) = c_j\,x_j^{\alpha_j-1}e^{-\frac{x_j}{\beta_j}}$, $0 \le x_j < \infty$, $\alpha_j > 0$, $\beta_j > 0$ and
$f_j(x_j) = 0$ elsewhere. Consider the product $u = x_1x_2\cdots x_k$. Such structures appear
in many situations such as geometrical probability problems when we consider gamma
distributed random points, see Mathai (1999). How can we determine the density of such
a general structure? The transformation of variables technique is not a feasible procedure
in this case. Since the $x_j$'s are positive, we may determine the Mellin transforms of the
$x_j$'s with parameter $s$. Then, when $f_j(x_j)$ is a density, the Mellin transform $M_{f_j}(s)$, once
expressed in terms of an expected value, is $M_{f_j}(s) = E[x_j^{s-1}]$ whenever the expected
value exists:

$$M_{f_j}(s) = E[x_j^{s-1}] = \frac{1}{\beta_j^{\alpha_j}\Gamma(\alpha_j)}\int_0^{\infty}x_j^{s-1}x_j^{\alpha_j-1}e^{-\frac{x_j}{\beta_j}}dx_j$$
$$= \frac{\Gamma(\alpha_j+s-1)}{\Gamma(\alpha_j)}\,\beta_j^{s-1},\ \Re(\alpha_j+s-1) > 0.$$

Hence,

$$E[u^{s-1}] = \prod_{j=1}^{k}E[x_j^{s-1}] = \prod_{j=1}^{k}\frac{\Gamma(\alpha_j+s-1)}{\Gamma(\alpha_j)}\,\beta_j^{s-1},\ \Re(\alpha_j+s-1) > 0,\ j = 1, \ldots, k,$$

and the density of $u$ is available from the inverse Mellin transform. If $g(u)$ is the density
of $u$, then

$$g(u) = \frac{1}{2\pi i}\int_{c-i\infty}^{c+i\infty}\Big\{\prod_{j=1}^{k}\frac{\Gamma(\alpha_j+s-1)}{\Gamma(\alpha_j)}\,\beta_j^{s-1}\Big\}u^{-s}ds,\ i = \sqrt{(-1)}. \tag{2.5.1}$$

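This is a contour integral where $c$ is any real number such that $c > -\Re(\alpha_j - 1)$, $j =
1, \ldots, k$. The integral in (2.5.1) is available in terms of a known special function, namely
Meijer's G-function. The G-function can be defined as follows:

$$G(z) = G^{m,n}_{p,q}(z) = G^{m,n}_{p,q}\Big[z\,\Big|\,{}^{a_1,\ldots,a_p}_{b_1,\ldots,b_q}\Big]$$
$$= \frac{1}{2\pi i}\int_L\frac{\{\prod_{j=1}^{m}\Gamma(b_j+s)\}\{\prod_{j=1}^{n}\Gamma(1-a_j-s)\}}{\{\prod_{j=m+1}^{q}\Gamma(1-b_j-s)\}\{\prod_{j=n+1}^{p}\Gamma(a_j+s)\}}\,z^{-s}ds,\ i = \sqrt{(-1)}. \tag{2.5.2}$$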

The existence conditions, different possible contours $L$, as well as properties and applications
are discussed in Mathai (1993), Mathai and Saxena (1973, 1978), and Mathai et
al. (2010). With the help of (2.5.2), we may now express (2.5.1) as follows in terms of a
G-function:

$$g(u) = \Big\{\prod_{j=1}^{k}\frac{1}{\beta_j\Gamma(\alpha_j)}\Big\}\,G^{k,0}_{0,k}\Big[\frac{u}{\beta_1\cdots\beta_k}\,\Big|_{\alpha_j-1,\ j=1,\ldots,k}\Big] \tag{2.5.3}$$

for $0 \le u < \infty$. Series and computable forms of a general G-function are provided
in Mathai (1993). G-functions are built-in functions in the symbolic computational packages
Mathematica and MAPLE.
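The moment expression $E[u^{s-1}] = \prod_j \Gamma(\alpha_j+s-1)\beta_j^{s-1}/\Gamma(\alpha_j)$ that feeds the inverse Mellin transform can be checked against simulated products of gamma variables. The following sketch is an illustration under stated assumptions: the function names, parameter values, sample size and seed are hypothetical, and Python's `random.gammavariate(alpha, beta)` is used with `beta` as the scale parameter, matching the density $x^{\alpha-1}e^{-x/\beta}$ above.

```python
import math
import random

def mellin_moment_exact(s, alphas, betas):
    """E[u^(s-1)] = prod_j Gamma(alpha_j + s - 1) * beta_j^(s-1) / Gamma(alpha_j)."""
    value = 1.0
    for a, b in zip(alphas, betas):
        value *= math.gamma(a + s - 1) * b ** (s - 1) / math.gamma(a)
    return value

def mellin_moment_simulated(s, alphas, betas, num_samples=100_000, seed=7):
    """Sample mean of u^(s-1) for u = x_1 * ... * x_k with independent gamma x_j."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(num_samples):
        u = 1.0
        for a, b in zip(alphas, betas):
            u *= rng.gammavariate(a, b)  # shape a, scale b
        total += u ** (s - 1)
    return total / num_samples

if __name__ == "__main__":
    alphas, betas = [2.0, 3.0], [1.5, 0.5]
    for s in (1.5, 2.0):
        print(s, mellin_moment_exact(s, alphas, betas),
              mellin_moment_simulated(s, alphas, betas))
```

For $s = 2$ the exact value is simply the product of the means $\alpha_j\beta_j$, which gives a convenient hand check.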
2.5.2. Product of real scalar type-1 beta variables

Let $y_1, \ldots, y_k$ be independently distributed real scalar type-1 beta random variables
with the parameters $(\alpha_j, \beta_j)$, $\alpha_j > 0$, $\beta_j > 0$, $j = 1, \ldots, k$. Consider the product $u_1 =
y_1\cdots y_k$. Such a structure occurs in several contexts. It appears for instance in geometrical
probability problems in connection with type-1 beta distributed random points. As well,
when testing certain hypotheses on the parameters of one or more multivariate normal
populations, the resulting likelihood ratio criteria, also known as $\lambda$-criteria, or one-to-one
functions thereof, have the structure of a product of independently distributed real type-1
beta variables under the null hypothesis. The density of $u_1$ can be obtained by proceeding
as in the previous section. The moment of $u_1$ of order $s - 1$ is

$$E[u_1^{s-1}] = \prod_{j=1}^{k}E[y_j^{s-1}] = \prod_{j=1}^{k}\frac{\Gamma(\alpha_j+s-1)}{\Gamma(\alpha_j)}\,\frac{\Gamma(\alpha_j+\beta_j)}{\Gamma(\alpha_j+\beta_j+s-1)} \tag{2.5.4}$$

for $\Re(\alpha_j+s-1) > 0$, $j = 1, \ldots, k$. Then, the density of $u_1$, denoted by $g_1(u_1)$, is given
by

$$g_1(u_1) = \frac{1}{2\pi i}\int_{c-i\infty}^{c+i\infty}[E(u_1^{s-1})]\,u_1^{-s}ds$$
$$= \Big\{\prod_{j=1}^{k}\frac{\Gamma(\alpha_j+\beta_j)}{\Gamma(\alpha_j)}\Big\}\frac{1}{2\pi i}\int_{c-i\infty}^{c+i\infty}\Big\{\prod_{j=1}^{k}\frac{\Gamma(\alpha_j+s-1)}{\Gamma(\alpha_j+\beta_j+s-1)}\Big\}u_1^{-s}ds$$
$$= \Big\{\prod_{j=1}^{k}\frac{\Gamma(\alpha_j+\beta_j)}{\Gamma(\alpha_j)}\Big\}\,G^{k,0}_{k,k}\Big[u_1\,\Big|\,{}^{\alpha_j+\beta_j-1,\ j=1,\ldots,k}_{\alpha_j-1,\ j=1,\ldots,k}\Big] \tag{2.5.5}$$

for $0 \le u_1 \le 1$, $\Re(\alpha_j+s-1) > 0$, $j = 1, \ldots, k$.
2.5.3. Product of real scalar type-2 beta variables 


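Let $u_2 = z_1z_2\cdots z_k$ where the $z_j$'s are independently distributed real scalar type-2
beta random variables with the parameters $(\alpha_j, \beta_j)$, $\alpha_j > 0$, $\beta_j > 0$, $j = 1, \ldots, k$. Such
products are encountered in several situations, including certain problems in geometrical
probability that are discussed in Mathai (1999). Then,

$$E[z_j^{s-1}] = \frac{\Gamma(\alpha_j+s-1)\,\Gamma(\beta_j-s+1)}{\Gamma(\alpha_j)\,\Gamma(\beta_j)},\ \Re(\alpha_j+s-1) > 0,\ \Re(\beta_j-s+1) > 0,$$

and

$$E[u_2^{s-1}] = \prod_{j=1}^{k}E[z_j^{s-1}] = \Big\{\prod_{j=1}^{k}[\Gamma(\alpha_j)\Gamma(\beta_j)]^{-1}\Big\}\Big\{\prod_{j=1}^{k}\Gamma(\alpha_j+s-1)\,\Gamma(\beta_j-s+1)\Big\}. \tag{2.5.6}$$

Hence, the density of $u_2$, denoted by $g_2(u_2)$, is given by

$$g_2(u_2) = \Big\{\prod_{j=1}^{k}[\Gamma(\alpha_j)\Gamma(\beta_j)]^{-1}\Big\}\frac{1}{2\pi i}\int_{c-i\infty}^{c+i\infty}\Big\{\prod_{j=1}^{k}\Gamma(\alpha_j+s-1)\,\Gamma(\beta_j-s+1)\Big\}u_2^{-s}ds$$
$$= \Big\{\prod_{j=1}^{k}[\Gamma(\alpha_j)\Gamma(\beta_j)]^{-1}\Big\}\,G^{k,k}_{k,k}\Big[u_2\,\Big|\,{}^{-\beta_j,\ j=1,\ldots,k}_{\alpha_j-1,\ j=1,\ldots,k}\Big],\ u_2 \ge 0. \tag{2.5.7}$$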

2.5.4. General products and ratios 
Let us consider a structure of the following form:
$$u_3 = \frac{t_1\cdots t_r}{t_{r+1}\cdots t_k},$$
where the $t_j$'s are independently distributed real positive variables, such as real type-1
beta, real type-2 beta, and real gamma variables, where the expected values $E[t_j^{s-1}]$ for
$j = 1, \ldots, k$, will produce various types of gamma products, some containing $+s$ and
others, $-s$, both in the numerator and in the denominator. Accordingly, we obtain a general
structure such as that appearing in (2.5.2), and the density of $u_3$, denoted by $g_3(u_3)$, will
then be proportional to a general G-function.


2.5.5. The H-function 


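Let $u = v_1v_2\cdots v_k$ where the $v_j$'s are independently distributed generalized real
gamma variables with densities

$$h_j(v_j) = \frac{\delta_j\,a_j^{\gamma_j/\delta_j}}{\Gamma(\frac{\gamma_j}{\delta_j})}\,v_j^{\gamma_j-1}e^{-a_jv_j^{\delta_j}},\ v_j \ge 0,$$

for $a_j > 0$, $\delta_j > 0$, $\gamma_j > 0$, and $h_j(v_j) = 0$ elsewhere for $j = 1, \ldots, k$. Then,

$$E[u^{s-1}] = \Big\{\prod_{j=1}^{k}\frac{a_j^{1/\delta_j}}{\Gamma(\frac{\gamma_j}{\delta_j})}\Big\}\Big\{\prod_{j=1}^{k}\Gamma\Big(\frac{\gamma_j+s-1}{\delta_j}\Big)a_j^{-\frac{s}{\delta_j}}\Big\} \tag{2.5.8}$$

and the density of $u$, denoted by $g(u)$, is

$$g(u) = \Big\{\prod_{j=1}^{k}\frac{a_j^{1/\delta_j}}{\Gamma(\frac{\gamma_j}{\delta_j})}\Big\}\frac{1}{2\pi i}\int_L\Big\{\prod_{j=1}^{k}\Gamma\Big(\frac{\gamma_j-1}{\delta_j}+\frac{s}{\delta_j}\Big)\Big\}\Big[\Big\{\prod_{j=1}^{k}a_j^{1/\delta_j}\Big\}u\Big]^{-s}ds$$
$$= \Big\{\prod_{j=1}^{k}\frac{a_j^{1/\delta_j}}{\Gamma(\frac{\gamma_j}{\delta_j})}\Big\}\,H^{k,0}_{0,k}\Big[\Big\{\prod_{j=1}^{k}a_j^{1/\delta_j}\Big\}u\,\Big|_{(\frac{\gamma_j-1}{\delta_j},\,\frac{1}{\delta_j}),\ j=1,\ldots,k}\Big] \tag{2.5.9}$$

for $u \ge 0$, $\Re(\gamma_j+s-1) > 0$, $j = 1, \ldots, k$, where $L$ is a suitable contour and the general
H-function is defined as follows:

$$H(z) = H^{m,n}_{p,q}(z) = H^{m,n}_{p,q}\Big[z\,\Big|\,{}^{(a_1,\alpha_1),\ldots,(a_p,\alpha_p)}_{(b_1,\beta_1),\ldots,(b_q,\beta_q)}\Big]$$
$$= \frac{1}{2\pi i}\int_L\frac{\{\prod_{j=1}^{m}\Gamma(b_j+\beta_js)\}\{\prod_{j=1}^{n}\Gamma(1-a_j-\alpha_js)\}}{\{\prod_{j=m+1}^{q}\Gamma(1-b_j-\beta_js)\}\{\prod_{j=n+1}^{p}\Gamma(a_j+\alpha_js)\}}\,z^{-s}ds \tag{2.5.10}$$

where $\alpha_j > 0$, $j = 1, \ldots, p$; $\beta_j > 0$, $j = 1, \ldots, q$ are real and positive, $b_j$'s and $a_j$'s
are complex numbers, the contour $L$ separates the poles of $\Gamma(b_j+\beta_js)$, $j = 1, \ldots, m$,
lying on one side of it and the poles of $\Gamma(1-a_j-\alpha_js)$, $j = 1, \ldots, n$, which must
lie on the other side. The existence conditions and the various types of possible contours
are discussed in Mathai and Saxena (1978) and Mathai et al. (2010). Observe that we
can consider arbitrary powers of the variables present in $u$, $u_1$, $u_2$ and $u_3$ as introduced in
Sects. 2.5.1-2.5.5; however, in this case, the densities of these various structures will be
expressible in terms of H-functions rather than G-functions. In the G-function format as
defined in (2.5.2), the complex variable $s$ has $\pm 1$ as its coefficients, whereas the coefficients
of $s$ in the H-function, that is, $\pm\alpha_j$, $\alpha_j > 0$ and $\pm\beta_j$, $\beta_j > 0$, are not restricted to
unities.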
We will give a simple illustrative example that requires the evaluation of an inverse
Mellin transform. Let $f(x) = e^{-x}$, $x > 0$. Then, the Mellin transform is

$$M_f(s) = \int_0^{\infty}x^{s-1}e^{-x}dx = \Gamma(s),\ \Re(s) > 0,$$

and it follows from the inversion formula that

$$f(x) = \frac{1}{2\pi i}\int_{c-i\infty}^{c+i\infty}\Gamma(s)\,x^{-s}ds,\ \Re(s) > 0,\ i = \sqrt{(-1)}. \tag{2.5.11}$$

If $f(x)$ is unknown and we are told that the Mellin transform of a certain function is $\Gamma(s)$,
then are we going to retrieve $f(x)$ as $e^{-x}$ from the inversion formula? Let us explore this
problem. The poles of $\Gamma(s)$ occur at $s = 0, -1, -2, \ldots$. Thus, if we take $c$ in the contour
of integration as $c > 0$, this contour will enclose all the poles of $\Gamma(s)$. We may now apply
Cauchy's residue theorem. By definition, the residue at $s = -\nu$, denoted by $R_\nu$, is

$$R_\nu = \lim_{s \to -\nu}(s+\nu)\,\Gamma(s)\,x^{-s}.$$

We cannot substitute $s = -\nu$ to obtain the limit in this case. However, noting that

$$(s+\nu)\,\Gamma(s)\,x^{-s} = \frac{(s+\nu)(s+\nu-1)\cdots s\,\Gamma(s)\,x^{-s}}{(s+\nu-1)\cdots s} = \frac{\Gamma(s+\nu+1)\,x^{-s}}{(s+\nu-1)\cdots s},$$

which follows from the recursive relationship, $\alpha\Gamma(\alpha) = \Gamma(\alpha+1)$, the limit can be taken:

$$\lim_{s \to -\nu}(s+\nu)\,\Gamma(s)\,x^{-s} = \lim_{s \to -\nu}\frac{\Gamma(s+\nu+1)\,x^{-s}}{(s+\nu-1)\cdots s} = \frac{\Gamma(1)\,x^{\nu}}{(-1)(-2)\cdots(-\nu)} = \frac{(-1)^{\nu}x^{\nu}}{\nu!}. \tag{2.5.12}$$

Hence, the sum of the residues is

$$\sum_{\nu=0}^{\infty}R_\nu = \sum_{\nu=0}^{\infty}\frac{(-1)^{\nu}x^{\nu}}{\nu!} = e^{-x},$$

and the function is recovered.
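The residue calculation can be mirrored numerically. The sketch below is illustrative and not from the original text: the function names and the step `eps` are assumptions. It approximates the limit defining $R_\nu$ by evaluating $(s+\nu)\Gamma(s)x^{-s}$ at $s = -\nu + \epsilon$, compares it with the closed form (2.5.12), and sums the residues to recover $e^{-x}$.

```python
import math

def residue_closed_form(nu, x):
    """Residue of Gamma(s) * x**(-s) at s = -nu, as given in (2.5.12)."""
    return (-1) ** nu * x ** nu / math.factorial(nu)

def residue_numeric(nu, x, eps=1e-6):
    """Approximate lim (s + nu) * Gamma(s) * x**(-s) by evaluating at s = -nu + eps."""
    s = -nu + eps
    return eps * math.gamma(s) * x ** (-s)

def inverse_mellin_of_gamma(x, terms=60):
    """Sum of the first `terms` residues; converges to exp(-x)."""
    return sum(residue_closed_form(nu, x) for nu in range(terms))

if __name__ == "__main__":
    x = 2.5
    for nu in range(4):
        print(nu, residue_closed_form(nu, x), residue_numeric(nu, x))
    print(inverse_mellin_of_gamma(x), math.exp(-x))
```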


Note 2.5.1. Distributions of products and ratios of random variables in the complex do- 
main could as well be worked out. However, since they may not necessarily have practical 
applications, they will not be discussed herein. Certain product and ratio distributions for 
variables in the complex domain which reduce to real variables, such as a chisquare in the 
complex domain, have already been previously discussed. 
Exercises 2.5


2.5.1. Evaluate the density of $u = x_1x_2$ where the $x_j$'s are independently distributed real
type-1 beta random variables with the parameters $(\alpha_j, \beta_j)$, $\alpha_j > 0$, $\beta_j > 0$, $j = 1, 2$
by using Mellin and inverse Mellin transform technique. Evaluate the density for the case
$\alpha_1 - \alpha_2 \ne \pm\nu$, $\nu = 0, 1, \ldots$ so that the poles are simple.


2.5.2. Repeat Exercise 2.5.1 if the $x_j$'s are (1): real type-2 beta random variables with parameters
$(\alpha_j, \beta_j)$, $\alpha_j > 0$, $\beta_j > 0$ and (2): real gamma random variables with the parameters
$(\alpha_j, \beta_j)$, $\alpha_j > 0$, $\beta_j > 0$, $j = 1, 2$.


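2.5.3. Let $u = \frac{u_1}{u_2}$ where $u_1$ and $u_2$ are real positive random variables. Then the $h$-th
moment, for arbitrary $h$, is $E\big[\big(\frac{u_1}{u_2}\big)^h\big] \ne \frac{E[u_1^h]}{E[u_2^h]}$ in general. Give two examples where
$E\big[\big(\frac{u_1}{u_2}\big)^h\big] = \frac{E[u_1^h]}{E[u_2^h]}$.


2.5.4. $E\big[\big(\frac{1}{u}\big)^h\big] = E[u^{-h}] \ne \frac{1}{E[u^h]}$ in general. Give two examples where $E\big[\frac{1}{u}\big] = \frac{1}{E[u]}$.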


2.5.5. Let $u = \frac{x_1x_2}{x_3x_4}$ where the $x_j$'s are independently distributed. Let $x_1, x_3$ be type-1
beta random variables, $x_2$ be a type-2 beta random variable, and $x_4$ be a gamma random
variable with parameters $(\alpha_j, \beta_j)$, $\alpha_j > 0$, $\beta_j > 0$, $j = 1, 2, 3, 4$. Determine the density
of $u$.
2.6. A Collection of Random Variables 


Let $x_1, \ldots, x_n$ be iid (independently and identically distributed) real scalar random
variables with a common density denoted by $f(x)$, that is, assume that the sample comes
from the population that is specified by $f(x)$. Let the common mean value be $\mu$ and the
common variance be $\sigma^2 < \infty$, that is, $E(x_j) = \mu$ and $\mathrm{Var}(x_j) = \sigma^2$, $j = 1, \ldots, n$, where
$E$ denotes the expected value. Denoting the sample average by $\bar{x} = \frac{1}{n}(x_1 + \cdots + x_n)$, what
can be said about $\bar{x}$ when $n \to \infty$? This is the type of question that will be investigated
in this section.


2.6.1. Chebyshev's inequality 


For some $k > 0$, let us examine the probability content of $|x - \mu|$ where $\mu = E(x)$
and the variance of $x$ is $\sigma^2 < \infty$. Consider the probability that the random variable $x$ lies
outside the interval $\mu - k\sigma < x < \mu + k\sigma$, that is, at least $k$ times the standard deviation $\sigma$ away
from the mean value $\mu$. From the definition of the variance $\sigma^2$ for a real scalar random
variable $x$,

$$\sigma^2 = \int_{-\infty}^{\infty}(x-\mu)^2f(x)\,dx$$
$$= \int_{-\infty}^{\mu-k\sigma}(x-\mu)^2f(x)\,dx + \int_{\mu-k\sigma}^{\mu+k\sigma}(x-\mu)^2f(x)\,dx + \int_{\mu+k\sigma}^{\infty}(x-\mu)^2f(x)\,dx$$
$$\ge \int_{-\infty}^{\mu-k\sigma}(x-\mu)^2f(x)\,dx + \int_{\mu+k\sigma}^{\infty}(x-\mu)^2f(x)\,dx$$

since the contribution over the interval $\mu - k\sigma < x < \mu + k\sigma$ is omitted. Over this
interval, the integrand is either positive or zero, and hence the inequality. However, the
intervals $(-\infty, \mu-k\sigma]$ and $[\mu+k\sigma, \infty)$, that is, $-\infty < x \le \mu-k\sigma$ and $\mu+k\sigma \le x < \infty$
or $-\infty < x - \mu \le -k\sigma$ and $k\sigma \le x - \mu < \infty$, can thus be described as the intervals
for which $|x - \mu| \ge k\sigma$. In these intervals, the smallest value that $|x - \mu|$ can take on is
$k\sigma$, $k > 0$ or equivalently, the smallest value that $|x - \mu|^2$ can assume is $(k\sigma)^2 = k^2\sigma^2$.
Accordingly, the above inequality can be continued as follows:

$$\sigma^2 \ge \int_{|x-\mu| \ge k\sigma}(x-\mu)^2f(x)\,dx \ge \int_{|x-\mu| \ge k\sigma}(k\sigma)^2f(x)\,dx \Rightarrow$$
$$\frac{\sigma^2}{k^2\sigma^2} \ge \int_{|x-\mu| \ge k\sigma}f(x)\,dx \Rightarrow$$
$$\frac{1}{k^2} \ge \int_{|x-\mu| \ge k\sigma}f(x)\,dx = Pr\{|x-\mu| \ge k\sigma\},$$

which can be written as

$$Pr\{|x-\mu| \ge k\sigma\} \le \frac{1}{k^2} \ \text{ or } \ Pr\{|x-\mu| < k\sigma\} \ge 1 - \frac{1}{k^2}. \tag{2.6.1}$$

If $k\sigma = k_1$, then $k = \frac{k_1}{\sigma}$ and, writing $k$ in place of $k_1$, the above inequalities can be written as follows:

$$Pr\{|x-\mu| \ge k\} \le \frac{\sigma^2}{k^2} \ \text{ or } \ Pr\{|x-\mu| < k\} \ge 1 - \frac{\sigma^2}{k^2}. \tag{2.6.2}$$


The inequalities (2.6.1) and (2.6.2) are known as Chebyshev's inequalities (also referred
to as Chebycheff's inequalities). For example, when $k = 2$, Chebyshev's inequality states
that $Pr\{|x-\mu| < 2\sigma\} \ge 1 - \frac{1}{4} = 0.75$, which is not a very sharp probability limit. If
$x \sim N_1(\mu, \sigma^2)$, then we know that

$$Pr\{|x-\mu| < 1.96\sigma\} \approx 0.95 \ \text{ and } \ Pr\{|x-\mu| < 3\sigma\} \approx 0.997.$$

Note that the bound 0.75 resulting from Chebyshev's inequality seriously underestimates
the actual probability for a Gaussian variable $x$. However, what is astonishing about this
inequality is that the given probability bound holds for any distribution with finite variance, whether it be
continuous, discrete or mixed. Sharper bounds can of course be obtained for the probability
content of the interval $[\mu - k\sigma, \mu + k\sigma]$ when the exact distribution of $x$ is known.
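The gap between the distribution-free bound (2.6.1) and the exact Gaussian probabilities is easy to tabulate. The sketch below is illustrative; the function names are assumptions, and the exact Gaussian coverage uses the standard identity $Pr\{|x-\mu| < k\sigma\} = \mathrm{erf}(k/\sqrt{2})$.

```python
import math

def chebyshev_lower_bound(k):
    """Lower bound 1 - 1/k^2 on Pr{|x - mu| < k*sigma} from (2.6.1)."""
    return max(0.0, 1.0 - 1.0 / k ** 2)

def normal_coverage(k):
    """Exact Pr{|x - mu| < k*sigma} when x is a real Gaussian variable."""
    return math.erf(k / math.sqrt(2.0))

if __name__ == "__main__":
    for k in (1.0, 1.96, 2.0, 3.0):
        print(k, chebyshev_lower_bound(k), normal_coverage(k))
```

For $k \le 1$ the Chebyshev bound is vacuous, which is why the sketch clips it at zero.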
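These inequalities can be expressed in terms of generalized moments. Let $\mu_r^{\frac{1}{r}} = \{E|x -
\mu|^r\}^{\frac{1}{r}}$, $r = 1, 2, \ldots$, which happens to be a measure of scatter in $x$ from the mean value
$\mu$. Given that

$$\mu_r = \int_{-\infty}^{\infty}|x-\mu|^rf(x)\,dx,$$

consider the probability content of the intervals specified by $|x - \mu| \ge k\mu_r^{\frac{1}{r}}$ for $k > 0$.
Paralleling the derivations of (2.6.1) and (2.6.2), we have

$$\mu_r \ge \int_{|x-\mu| \ge k\mu_r^{1/r}}|x-\mu|^rf(x)\,dx \ge \int_{|x-\mu| \ge k\mu_r^{1/r}}|k\mu_r^{\frac{1}{r}}|^rf(x)\,dx \Rightarrow$$
$$Pr\{|x-\mu| \ge k\mu_r^{\frac{1}{r}}\} \le \frac{1}{k^r} \ \text{ or } \ Pr\{|x-\mu| < k\mu_r^{\frac{1}{r}}\} \ge 1 - \frac{1}{k^r}, \tag{2.6.3}$$

which can also be written as

$$Pr\{|x-\mu| \ge k\} \le \frac{\mu_r}{k^r} \ \text{ or } \ Pr\{|x-\mu| < k\} \ge 1 - \frac{\mu_r}{k^r},\ r = 1, 2, \ldots. \tag{2.6.4}$$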


Note that when $r = 2$, $\mu_r = \sigma^2$, and Chebyshev's inequalities as specified in (2.6.1)
and (2.6.2) are obtained from (2.6.3) and (2.6.4), respectively. If $x$ is a real scalar positive
random variable with $f(x) = 0$ for $x \le 0$, we can then obtain similar inequalities in
terms of the first moment $\mu$. For $k > 0$,

$$\mu = E(x) = \int_0^{\infty}xf(x)\,dx \ \text{ since } f(x) = 0 \text{ for } x \le 0$$
$$= \int_0^{k}xf(x)\,dx + \int_k^{\infty}xf(x)\,dx \ge \int_k^{\infty}xf(x)\,dx \ge \int_k^{\infty}kf(x)\,dx \Rightarrow$$
$$\frac{\mu}{k} \ge \int_k^{\infty}f(x)\,dx = Pr\{x \ge k\}.$$

Accordingly, we have the following inequality for any real positive random variable $x$:

$$Pr\{x \ge k\} \le \frac{\mu}{k} \ \text{ for } x > 0,\ k > 0. \tag{2.6.5}$$


Suppose that our variable is $\bar{x} = \frac{1}{n}(x_1 + \cdots + x_n)$, where $x_1, \ldots, x_n$ are iid variables
with common mean value $\mu$ and the common variance $\sigma^2 < \infty$. Then, since $\mathrm{Var}(\bar{x}) = \frac{\sigma^2}{n}$
and $E(\bar{x}) = \mu$, Chebyshev's inequality states that

$$Pr\{|\bar{x}-\mu| < k\} \ge 1 - \frac{\sigma^2}{nk^2} \to 1 \ \text{ as } n \to \infty \tag{2.6.6}$$

or $Pr\{|\bar{x}-\mu| \ge k\} \to 0$ as $n \to \infty$. However, since a probability cannot be greater than
1, $Pr\{|\bar{x}-\mu| < k\} \to 1$ as $n \to \infty$. In other words, for every fixed $k > 0$, the probability
that $\bar{x}$ lies within $k$ of $\mu$ tends to 1 as $n \to \infty$. This is referred to as the Weak Law of Large Numbers.


The Weak Law of Large Numbers


Let $x_1, \ldots, x_n$ be iid with common mean value $\mu$ and common variance $\sigma^2 < \infty$.
Then, as $n \to \infty$,

$$Pr\{\bar{x} \to \mu\} = 1. \tag{2.6.7}$$

Another limiting property is known as the Central Limit Theorem. Let x1, ..., Xn be iid 


real scalar random variables with common mean value и. and common variance o? < оо. 


Letting x = HET + -+ - + xn) denote the sample mean, the standardized sample mean is 


1 . 
= "a "tu Ew E Ез ба (i) 


Consider the characteristic function of x — u, that is, 


it it)? 
фы) = El" 79] = 14+ о) + E Ea- wt 
4 E | 1 ic 2 
== EG гш Slt OO OT. OO Or (ii) 


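where $\phi^{(r)}(0)$ is the r-th derivative of $\phi(t)$ with respect to $t$, evaluated at $t = 0$. Let us
consider the characteristic function of our standardized sample mean $u$.
Making use of the last representation of $u$ in (i), we have
$\phi_{\sum_{j=1}^n \frac{x_j-\mu}{\sigma\sqrt{n}}}(t) = \big[\phi_{x_j-\mu}\big(\frac{t}{\sigma\sqrt{n}}\big)\big]^n$
so that $\phi_u(t) = \big[\phi_{x_j-\mu}\big(\frac{t}{\sigma\sqrt{n}}\big)\big]^n$ or $\ln\phi_u(t) = n\ln\phi_{x_j-\mu}\big(\frac{t}{\sigma\sqrt{n}}\big)$. It then
follows from (ii) that

$$\phi_{x_j-\mu}\Big(\frac{t}{\sigma\sqrt{n}}\Big) = 1 + 0 - \frac{t^2}{2!}\frac{\sigma^2}{n\sigma^2} - i\,\frac{t^3}{3!}\frac{E(x_j-\mu)^3}{(\sigma\sqrt{n})^3} + \cdots = 1 - \frac{t^2}{2n} + O\Big(\frac{1}{n^{3/2}}\Big).$$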


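Now noting that $\ln(1-y) = -[y + \frac{y^2}{2} + \frac{y^3}{3} + \cdots]$ whenever $|y| < 1$, we have

$$\ln\phi_{x_j-\mu}\Big(\frac{t}{\sigma\sqrt{n}}\Big) = -\frac{t^2}{2n} + O\Big(\frac{1}{n^{3/2}}\Big) \Rightarrow n\ln\phi_{x_j-\mu}\Big(\frac{t}{\sigma\sqrt{n}}\Big) = -\frac{t^2}{2} + O\Big(\frac{1}{n^{1/2}}\Big) \to -\frac{t^2}{2} \ \text{ as } n \to \infty.$$

Consequently, as $n \to \infty$,

$$\phi_u(t) \to e^{-\frac{t^2}{2}} \Rightarrow u \to N_1(0, 1) \ \text{ as } n \to \infty.$$

This is known as the central limit theorem.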
The Central Limit Theorem. Let $x_1, \ldots, x_n$ be iid real scalar random variables having
common mean value $\mu$ and common variance $\sigma^2 < \infty$. Let the sample mean be
$\bar{x} = \frac{1}{n}(x_1 + \cdots + x_n)$ and $u$ denote the standardized sample mean. Then

$$u = \frac{\bar{x} - E(\bar{x})}{\sqrt{{\rm Var}(\bar{x})}} = \frac{\sqrt{n}}{\sigma}(\bar{x}-\mu) \to N_1(0, 1) \ \text{ as } n \to \infty. \tag{2.6.8}$$

Generalizations, extensions and more rigorous statements of this theorem are available
in the literature. We have focussed on the substance of the result, assuming that a simple
random sample is available and that the variance of the population is finite.
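
The limiting statement (2.6.8) can be explored with a small simulation. The following sketch assumes an exponential population with $\mu = \sigma = 1$ (an arbitrary skewed choice) and a seeded generator; it reports how often the standardized sample mean falls in $[-1.96, 1.96]$, a frequency that should approach roughly 0.95 as n increases.

```python
import math
import random

def standardized_mean(sample, mu, sigma):
    """u = sqrt(n) (xbar - mu) / sigma, as in (2.6.8)."""
    n = len(sample)
    xbar = sum(sample) / n
    return math.sqrt(n) * (xbar - mu) / sigma

def fraction_within(n, z=1.96, reps=1000, seed=1):
    """Share of simulated u values in [-z, z] for exponential data with mean 1."""
    rng = random.Random(seed)
    inside = 0
    for _ in range(reps):
        sample = [rng.expovariate(1.0) for _ in range(n)]
        if abs(standardized_mean(sample, mu=1.0, sigma=1.0)) <= z:
            inside += 1
    return inside / reps

for n in (2, 20, 200):
    print(n, fraction_within(n))
```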


Exercises 2.6 


2.6.1. For a binomial random variable with the probability function $f(x) = \binom{n}{x} p^x (1-p)^{n-x}$,
$0 < p < 1$, $x = 0, 1, \ldots, n$, $n = 1, 2, \ldots$ and zero elsewhere, show that the
standardized binomial variable itself, namely $\frac{x - np}{\sqrt{np(1-p)}}$, goes to the standard normal when
$n \to \infty$.


2.6.2. State the central limit theorem for the following real scalar populations by evaluat- 
ing the mean value and variance there, assuming that a simple random sample is available: 
(1) Poisson random variable with parameter $\lambda$; (2) Geometric random variable with parameter
$p$; (3) Negative binomial random variable with parameters $(p, k)$; (4) Discrete
hypergeometric probability law with parameters $(a, b, n)$; (5) Uniform density over $[a, b]$;
(6) Exponential density with parameter $\theta$; (7) Gamma density with the parameters $(\alpha, \beta)$;
(8) Type-1 beta random variable with the parameters $(\alpha, \beta)$; (9) Type-2 beta random variable
with the parameters $(\alpha, \beta)$.


2.6.3. State the central limit theorem for the following probability/density functions: (1):

$$f(x) = \begin{cases} 0.5, & x = 2 \\ 0.5, & x = 5 \end{cases}$$

and $f(x) = 0$ elsewhere; (2): $f(x) = 2e^{-2x}$, $0 \le x < \infty$ and zero elsewhere; (3):
$f(x) = 1$, $0 \le x \le 1$ and zero elsewhere. Assume that a simple random sample is
available from each population.


2.6.4. Consider a real scalar gamma random variable $x$ with the parameters $(\alpha, \beta)$ and
show that $E(x) = \alpha\beta$ and variance of $x$ is $\alpha\beta^2$. Assume a simple random sample
$x_1, \ldots, x_n$ from this population. Derive the densities of (1): $x_1 + \cdots + x_n$; (2): $\bar{x}$; (3):
$\bar{x} - \alpha\beta$; (4): the standardized sample mean. Show that the densities in all these cases are
still gamma densities, possibly relocated, for all finite values of $n$ however large $n$ may be.


2.6.5. Consider the density f(x) — е 1 € x « oo and zero elsewhere, where c is 
the normalizing constant. Evaluate c stating the relevant conditions. State the central limit 
theorem for this population, stating the relevant conditions. 


2.7. Parameter Estimation: Point Estimation 


There exist several methods for estimating the parameters of a given den- 
sity/probability function, based on a simple random sample of size n (iid variables from 
the population designated by the density/probability function). The most popular methods 
of point estimation are the method of maximum likelihood and the method of moments. 


2.7.1. The method of moments and the method of maximum likelihood 


The likelihood function $L(\theta)$ is the joint density/probability function of the sample
values, at an observed sample point, $x_1, \ldots, x_n$. As a function of $\theta$, $L(\theta)$, or a one-to-one
function thereof, is maximized in order to determine the most likely value of $\theta$ in terms
of a function of the given sample. This estimation process is referred to as the method of
maximum likelihood.

Let $m_r = \frac{1}{n}\sum_{j=1}^n x_j^r$ denote the r-th integer moment of the sample, where $x_1, \ldots, x_n$ is
the observed sample point, the corresponding population r-th moment being $E[x^r]$, where
$E$ denotes the expected value. According to the method of moments, the estimates of the
parameters are obtained by solving $m_r = E[x^r]$, $r = 1, 2, \ldots$.
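For example, consider a $N_1(\mu, \sigma^2)$ population with density

$$f(x) = \frac{1}{\sqrt{2\pi}\,\sigma}\, e^{-\frac{1}{2\sigma^2}(x-\mu)^2},\ -\infty < x < \infty,\ -\infty < \mu < \infty,\ \sigma > 0, \tag{2.7.1}$$

where $\mu$ and $\sigma^2$ are the parameters here. Let $x_1, \ldots, x_n$ be a simple random sample
from this population. Then, the joint density of $x_1, \ldots, x_n$, denoted by
$L = L(x_1, \ldots, x_n; \mu, \sigma^2)$, is

$$L = \frac{1}{[\sqrt{2\pi}\,\sigma]^n}\, e^{-\frac{1}{2\sigma^2}\sum_{j=1}^n (x_j-\mu)^2} = \frac{1}{[\sqrt{2\pi}\,\sigma]^n}\, e^{-\frac{1}{2\sigma^2}[\sum_{j=1}^n (x_j-\bar{x})^2 + n(\bar{x}-\mu)^2]} \Rightarrow$$

$$\ln L = -n\ln(\sqrt{2\pi}\,\sigma) - \frac{1}{2\sigma^2}\Big[\sum_{j=1}^n (x_j-\bar{x})^2 + n(\bar{x}-\mu)^2\Big],\ \ \bar{x} = \frac{1}{n}(x_1 + \cdots + x_n). \tag{2.7.2}$$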


Maximizing $L$ or $\ln L$, since $L$ and $\ln L$ are one-to-one functions, with respect to $\mu$ and
$\theta = \sigma^2$, and solving for $\mu$ and $\sigma^2$ produces the maximum likelihood estimators (MLE's).
An observed value of the estimator is the corresponding estimate. It follows from a basic
result in Calculus that the extrema of $L$ can be determined by solving the equations

$$\frac{\partial}{\partial\mu}\ln L = 0 \tag{i}$$

and

$$\frac{\partial}{\partial\theta}\ln L = 0,\ \ \theta = \sigma^2. \tag{ii}$$


Equation (i) produces the solution $\mu = \bar{x}$ so that $\bar{x}$ is the MLE of $\mu$. Note that $\bar{x}$ is a random
variable and that $\bar{x}$ evaluated at a sample point or at a set of observations on $x_1, \ldots, x_n$
produces the corresponding estimate. We will denote both the estimator and estimate of
$\mu$ by $\hat{\mu}$. As well, we will utilize the same abbreviation, namely, MLE for the maximum
likelihood estimator and the corresponding estimate. Solving (ii) and substituting $\hat{\mu}$ for $\mu$,
we have $\hat{\theta} = \hat{\sigma}^2 = \frac{1}{n}\sum_{j=1}^n (x_j - \bar{x})^2 = s^2$ = the sample variance as an estimate of
$\theta = \sigma^2$. Does the point $(\bar{x}, s^2)$ correspond to a local maximum or a local minimum or
a saddle point? Since the matrix of second order partial derivatives at the point $(\bar{x}, s^2)$ is
negative definite, the critical point $(\bar{x}, s^2)$ corresponds to a maximum. Thus, in this case,
$\hat{\mu} = \bar{x}$ and $\hat{\sigma}^2 = s^2$ are the maximum likelihood estimators/estimates of the parameters
$\mu$ and $\sigma^2$, respectively. If we were to differentiate with respect to $\sigma$ instead of $\theta = \sigma^2$
in (ii), we would obtain the same estimators, since for any differentiable function $g(t)$,
$\frac{d}{dt}g(t) = 0 \Rightarrow \frac{d}{d\phi(t)}g(t) = 0$ if $\frac{d}{dt}\phi(t) \ne 0$. In this instance, $\phi(\sigma) = \sigma^2$ and $\frac{d}{d\sigma}\sigma^2 = 2\sigma \ne 0$.

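For obtaining the moment estimates, we equate the sample integer moments to the
corresponding population moments, that is, we let $m_r = E[x^r]$, $r = 1, 2$, two equations
being required to estimate $\mu$ and $\sigma^2$. Note that $m_1 = \bar{x}$ and $m_2 = \frac{1}{n}\sum_{j=1}^n x_j^2$. Then,
consider the equations

$$\bar{x} = E[x] = \mu \ \text{ and } \ \frac{1}{n}\sum_{j=1}^n x_j^2 = E[x^2] = \sigma^2 + \mu^2,$$

whose solution is $\mu = \bar{x}$ and $\sigma^2 = \frac{1}{n}\sum_{j=1}^n x_j^2 - \bar{x}^2 = s^2$.
Thus, the moment estimators/estimates of $\mu$ and $\sigma^2$, which are $\hat{\mu} = \bar{x}$ and $\hat{\sigma}^2 = s^2$,
happen to be identical to the MLE's in this case.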
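
The estimates $\hat{\mu} = \bar{x}$ and $\hat{\sigma}^2 = s^2$ can be checked numerically against the log-likelihood (2.7.2). The sketch below uses made-up observations and hypothetical function names; it evaluates $\ln L$ at the estimates so that the value can be compared with values at neighbouring parameter points.

```python
import math

def normal_mle(sample):
    """Return (mu_hat, sigma2_hat) = (xbar, s^2), with s^2 using the divisor n."""
    n = len(sample)
    xbar = sum(sample) / n
    s2 = sum((x - xbar) ** 2 for x in sample) / n
    return xbar, s2

def log_likelihood(sample, mu, sigma2):
    """ln L of (2.7.2) for a N_1(mu, sigma2) simple random sample."""
    n = len(sample)
    ss = sum((x - mu) ** 2 for x in sample)
    return -0.5 * n * math.log(2.0 * math.pi * sigma2) - ss / (2.0 * sigma2)

data = [4.1, 5.3, 3.8, 6.0, 5.2, 4.6]   # made-up observations
mu_hat, s2_hat = normal_mle(data)
print(mu_hat, s2_hat, log_likelihood(data, mu_hat, s2_hat))
```

Evaluating `log_likelihood` at slightly perturbed values of either parameter should give a smaller number, in line with the negative definiteness argument.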
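Let us consider the type-1 beta population with parameters $(\alpha, \beta)$ whose density is

$$f_1(x) = \frac{\Gamma(\alpha+\beta)}{\Gamma(\alpha)\Gamma(\beta)}\, x^{\alpha-1}(1-x)^{\beta-1},\ 0 \le x \le 1,\ \alpha > 0,\ \beta > 0, \tag{2.7.3}$$

and zero otherwise. In this case, the likelihood function contains gamma functions and the
derivatives of gamma functions involve psi and zeta functions. Accordingly, the maximum
likelihood approach is not very convenient here. However, we can determine moment estimates
without much difficulty from (2.7.3). The first two population integer moments are
obtained directly from a representation of the h-th moment:

$$E[x^h] = \frac{\Gamma(\alpha+h)}{\Gamma(\alpha)}\,\frac{\Gamma(\alpha+\beta)}{\Gamma(\alpha+\beta+h)},$$

$$E[x] = \frac{\Gamma(\alpha+1)}{\Gamma(\alpha)}\,\frac{\Gamma(\alpha+\beta)}{\Gamma(\alpha+\beta+1)} = \frac{\alpha}{\alpha+\beta}, \tag{2.7.4}$$

$$E[x^2] = \frac{\alpha(\alpha+1)}{(\alpha+\beta)(\alpha+\beta+1)} = E[x]\,\frac{\alpha+1}{\alpha+\beta+1}. \tag{2.7.5}$$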

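Equating the sample moments to the corresponding population moments, that is, letting
$m_1 = E[x]$ and $m_2 = E[x^2]$, it follows from (2.7.4) that

$$\bar{x} = \frac{\alpha}{\alpha+\beta} \Rightarrow \frac{\beta}{\alpha} = \frac{1-\bar{x}}{\bar{x}}. \tag{iii}$$

Then, from (2.7.5), we have

$$\frac{1}{n}\sum_{j=1}^n x_j^2 = \bar{x}\,\frac{\alpha+1}{\alpha+\beta+1} \Rightarrow \frac{\sum_{j=1}^n x_j^2}{\sum_{j=1}^n x_j} = \frac{\alpha+1}{\alpha+\beta+1}. \tag{iv}$$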
The parameter $\beta$ can be eliminated from (iii) and (iv), which yields an estimate of $\alpha$; $\beta$
is then obtained from (iii). Thus, the moment estimates are available from the equations,
$m_r = E[x^r]$, $r = 1, 2$, even though these equations are nonlinear in the parameters $\alpha$
and $\beta$. The method of maximum likelihood or the method of moments can similarly yield
parameter estimates for populations that are otherwise distributed.
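
The elimination described above can be carried out explicitly. Writing $m_1 = \bar{x}$ and $m_2 = \frac{1}{n}\sum_j x_j^2$, equations (2.7.4) and (2.7.5) give $\alpha = m_1(m_1 - m_2)/(m_2 - m_1^2)$ and $\beta = (1-m_1)(m_1-m_2)/(m_2-m_1^2)$; this closed form is worked out here and is not quoted from the text. The sketch below, with hypothetical function names and made-up data, implements it.

```python
def beta_moment_estimates(m1, m2):
    """Solve m1 = E[x], m2 = E[x^2] for (alpha, beta) of a type-1 beta density."""
    spread = m2 - m1 * m1          # positive for a non-degenerate sample
    if not (0.0 < m1 < 1.0) or spread <= 0.0 or m2 >= m1:
        raise ValueError("moments are not compatible with a type-1 beta density")
    alpha = m1 * (m1 - m2) / spread
    beta = (1.0 - m1) * (m1 - m2) / spread
    return alpha, beta

def sample_moments(sample):
    """First two integer moments m1 and m2 of an observed sample."""
    n = len(sample)
    return sum(sample) / n, sum(x * x for x in sample) / n

data = [0.21, 0.35, 0.48, 0.52, 0.30, 0.66, 0.41]   # made-up values in (0, 1)
print(beta_moment_estimates(*sample_moments(data)))
```

Feeding in the exact population moments of a known beta density recovers its parameters, which is a convenient sanity check.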


2.7.2. Bayes' estimates 


This procedure is more relevant when the parameters in a given statistical density/probability
function have their own distributions. For example, let the real scalar variable
$x$ be discrete having a binomial probability law for the fixed (given) parameter $p$, that
is, let $f(x|p) = \binom{n}{x} p^x (1-p)^{n-x}$, $0 < p < 1$, $x = 0, 1, \ldots, n$, $n = 1, 2, \ldots$, and
$f(x|p) = 0$ elsewhere be the conditional probability function. Let $p$ have a prior type-1
beta density with known parameters $\alpha$ and $\beta$, that is, let the prior density of $p$ be

$$g(p) = \frac{\Gamma(\alpha+\beta)}{\Gamma(\alpha)\Gamma(\beta)}\, p^{\alpha-1}(1-p)^{\beta-1},\ 0 < p < 1,\ \alpha > 0,\ \beta > 0$$


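and $g(p) = 0$ elsewhere. Then, the joint probability function $f(x, p) = f(x|p)g(p)$ and
the unconditional probability function of $x$, denoted by $f_1(x)$, is as follows:

$$f_1(x) = \frac{\Gamma(\alpha+\beta)}{\Gamma(\alpha)\Gamma(\beta)}\binom{n}{x}\int_0^1 p^{\alpha+x-1}(1-p)^{\beta+n-x-1}\,dp$$

$$= \frac{\Gamma(\alpha+\beta)}{\Gamma(\alpha)\Gamma(\beta)}\binom{n}{x}\frac{\Gamma(\alpha+x)\Gamma(\beta+n-x)}{\Gamma(\alpha+\beta+n)}.$$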


Thus, the posterior density of $p$, given $x$, denoted by $g_1(p|x)$, is

$$g_1(p|x) = \frac{f(x, p)}{f_1(x)} = \frac{\Gamma(\alpha+\beta+n)}{\Gamma(\alpha+x)\Gamma(\beta+n-x)}\, p^{\alpha+x-1}(1-p)^{\beta+n-x-1},\ 0 < p < 1.$$


Accordingly, the expected value of $p$ in this conditional distribution of $p$ given $x$, which
is called the posterior density of $p$, is known as the Bayes estimate of $p$:

$$E[p|x] = \frac{\Gamma(\alpha+\beta+n)}{\Gamma(\alpha+x)\Gamma(\beta+n-x)}\int_0^1 p\; p^{\alpha+x-1}(1-p)^{\beta+n-x-1}\,dp$$

$$= \frac{\Gamma(\alpha+\beta+n)}{\Gamma(\alpha+x)\Gamma(\beta+n-x)}\,\frac{\Gamma(\alpha+x+1)\Gamma(\beta+n-x)}{\Gamma(\alpha+\beta+n+1)}$$

$$= \frac{\Gamma(\alpha+x+1)}{\Gamma(\alpha+x)}\,\frac{\Gamma(\alpha+\beta+n)}{\Gamma(\alpha+\beta+n+1)} = \frac{\alpha+x}{\alpha+\beta+n}.$$


The prior estimate/estimator of $p$ as obtained from the binomial distribution is $\frac{x}{n}$ and the
posterior estimate or the Bayes estimate of $p$ is

$$E[p|x] = \frac{\alpha+x}{\alpha+\beta+n}$$

so that $\frac{x}{n}$ is revised to $\frac{\alpha+x}{\alpha+\beta+n}$. In general, if the conditional density/probability function
of $x$ given $\theta$ is $f(x|\theta)$ and the prior density/probability function of $\theta$ is $g(\theta)$, then the
posterior density of $\theta$ is $g_1(\theta|x)$ and $E[\theta|x]$ or the expected value of $\theta$ in the conditional
distribution of $\theta$ given $x$ is the Bayes estimate of $\theta$.

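
The closed form $(\alpha+x)/(\alpha+\beta+n)$ for the binomial model with a type-1 beta prior can be checked against a direct numerical integration of the posterior kernel. The sketch below uses hypothetical function names and illustrative values of $x$, $n$, $\alpha$ and $\beta$; the midpoint rule is only one convenient choice of quadrature.

```python
def bayes_estimate(x, n, alpha, beta):
    """Posterior mean (alpha + x)/(alpha + beta + n) of p under a type-1 beta prior."""
    return (alpha + x) / (alpha + beta + n)

def posterior_mean_numeric(x, n, alpha, beta, steps=20000):
    """Midpoint-rule check: ratio of the integrals of p*kernel and kernel over (0, 1)."""
    num = den = 0.0
    h = 1.0 / steps
    for i in range(steps):
        p = (i + 0.5) * h
        kernel = p ** (alpha + x - 1.0) * (1.0 - p) ** (beta + n - x - 1.0)
        num += p * kernel
        den += kernel
    return num / den

x, n, alpha, beta = 7, 10, 2.0, 2.0   # illustrative numbers
print(x / n, bayes_estimate(x, n, alpha, beta), posterior_mean_numeric(x, n, alpha, beta))
```

With these values the estimate $x/n$ is pulled toward the prior mean $\alpha/(\alpha+\beta)$, and the pull weakens as n grows.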
2.7.3. Interval estimation 


Before concluding this section, the concept of confidence intervals or interval estimation
of a parameter will be briefly touched upon. For example, let $x_1, \ldots, x_n$ be iid
$N_1(\mu, \sigma^2)$ and let $\bar{x} = \frac{1}{n}(x_1 + \cdots + x_n)$. Then, $\bar{x} \sim N_1(\mu, \frac{\sigma^2}{n})$, $(\bar{x}-\mu) \sim N_1(0, \frac{\sigma^2}{n})$
and $z = \frac{\sqrt{n}}{\sigma}(\bar{x}-\mu) \sim N_1(0, 1)$. Since the standard normal density $N_1(0, 1)$ is free of
any parameter, one can select two percentiles, say, $a$ and $b$, from a standard normal table
and make a probability statement such as $\Pr\{a < z < b\} = 1-\alpha$ for every given $\alpha$; for
instance, $\Pr\{-1.96 \le z \le 1.96\} \approx 0.95$ for $\alpha = 0.05$. Let $\Pr\{-z_{\alpha/2} \le z \le z_{\alpha/2}\} = 1-\alpha$
where $z_{\alpha/2}$ is such that $\Pr(z > z_{\alpha/2}) = \frac{\alpha}{2}$. The following inequalities are mathematically
equivalent and hence the probabilities associated with the corresponding intervals
are equal:


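$$-z_{\alpha/2} \le z \le z_{\alpha/2} \Leftrightarrow -z_{\alpha/2} \le \frac{\sqrt{n}}{\sigma}(\bar{x}-\mu) \le z_{\alpha/2}$$

$$\Leftrightarrow \mu - z_{\alpha/2}\frac{\sigma}{\sqrt{n}} \le \bar{x} \le \mu + z_{\alpha/2}\frac{\sigma}{\sqrt{n}}$$

$$\Leftrightarrow \bar{x} - z_{\alpha/2}\frac{\sigma}{\sqrt{n}} \le \mu \le \bar{x} + z_{\alpha/2}\frac{\sigma}{\sqrt{n}}. \tag{i}$$

Accordingly,

$$\Pr\Big\{\mu - z_{\alpha/2}\frac{\sigma}{\sqrt{n}} \le \bar{x} \le \mu + z_{\alpha/2}\frac{\sigma}{\sqrt{n}}\Big\} = 1-\alpha \tag{ii}$$

$$\Leftrightarrow \Pr\Big\{\bar{x} - z_{\alpha/2}\frac{\sigma}{\sqrt{n}} \le \mu \le \bar{x} + z_{\alpha/2}\frac{\sigma}{\sqrt{n}}\Big\} = 1-\alpha. \tag{iii}$$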


Note that (iii) is not a usual probability statement as opposed to (ii), which is a probability
statement on a random variable. In (iii), the interval $[\bar{x} - z_{\alpha/2}\frac{\sigma}{\sqrt{n}},\ \bar{x} + z_{\alpha/2}\frac{\sigma}{\sqrt{n}}]$ is random and
$\mu$ is a constant. This can be given the interpretation that the random interval covers the
parameter $\mu$ with probability $1-\alpha$, which means that we are $100(1-\alpha)\%$ confident that
the random interval will cover the unknown parameter $\mu$ or that the random interval is an
interval estimator and an observed value of the interval is the interval estimate of $\mu$. We
could construct such an interval only because the distribution of $z$ is parameter-free, which
enabled us to make a probability statement on $\mu$; the interval can be computed from the
observations only when $\sigma$ is known.
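
The interval in (iii) is straightforward to compute once $\sigma$ is taken as known. The following sketch relies on `statistics.NormalDist` from the Python standard library for the percentile $z_{\alpha/2}$; the function name and the numerical inputs are illustrative assumptions.

```python
import math
from statistics import NormalDist

def normal_mean_interval(xbar, sigma, n, alpha=0.05):
    """Interval xbar -/+ z_{alpha/2} sigma/sqrt(n) for mu, with sigma assumed known."""
    z = NormalDist().inv_cdf(1.0 - alpha / 2.0)   # upper alpha/2 percentile of N(0, 1)
    half_width = z * sigma / math.sqrt(n)
    return xbar - half_width, xbar + half_width

low, high = normal_mean_interval(xbar=10.0, sigma=2.0, n=16)
print(round(low, 3), round(high, 3))
```

Halving the width of the interval at a fixed confidence coefficient requires four times as many observations, since the half-width is proportional to $1/\sqrt{n}$.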

In general, if $u = u(x_1, \ldots, x_n, \theta)$ is a function of the sample values and the parameter
$\theta$ (which may be a vector of parameters) and if the distribution of $u$ is free of all parameters,
then such a quantity is referred to as a pivotal quantity. Since the distribution of the pivotal
quantity is parameter-free, we can find two numbers $a$ and $b$ such that $\Pr\{a \le u \le b\} =
1-\alpha$ for every given $\alpha$. If it is possible to convert the statement $a \le u \le b$ into a
mathematically equivalent statement of the type $u_1 \le \theta \le u_2$, so that $\Pr\{u_1 \le \theta \le
u_2\} = 1-\alpha$ for every given $\alpha$, then $[u_1, u_2]$ is called a $100(1-\alpha)\%$ confidence interval
or interval estimate for $\theta$, $u_1$ and $u_2$ being referred to as the lower confidence limit and
the upper confidence limit, and $1-\alpha$ being called the confidence coefficient. Additional
results on interval estimation and the construction of confidence intervals are, for instance,
presented in Mathai and Haubold (2017b).


Exercises 2.7 


2.7.1. Obtain the method of moments estimators for the parameters $(\alpha, \beta)$ in a real type-2
beta population. Assume that a simple random sample of size n is available. 


2.7.2. Obtain the estimate/estimator of the parameters by the method of moments and the
method of maximum likelihood in the real (1): exponential population with parameter $\theta$,
(2): Poisson population with parameter $\lambda$. Assume that a simple random sample of size $n$
is available.


2.7.3. Let $x_1, \ldots, x_n$ be a simple random sample of size $n$ from a point Bernoulli population
$f_2(x) = p^x(1-p)^{1-x}$, $x = 0, 1$, $0 < p < 1$ and zero elsewhere. Obtain the MLE as
well as moment estimator for $p$. [Note: These will be the same estimators for $p$ in all the
populations based on Bernoulli trials, such as binomial population, geometric population,
negative binomial population].


2.7.4. If possible, obtain moment estimators for the parameters of a real generalized
gamma population, $f_3(x) = c\, x^{\alpha-1} e^{-b x^{\delta}}$, $\alpha > 0$, $b > 0$, $\delta > 0$, $x \ge 0$ and zero elsewhere,
where $c$ is the normalizing constant.

2.7.5. If possible, obtain the MLE of the parameters $a$, $b$ in the following real uniform
population $f_4(x) = \frac{1}{b-a}$, $b > a$, $a \le x \le b$ and zero elsewhere. What are the MLE if
$a < x < b$? What are the moment estimators in these two situations?


2.7.6. Construct the Bayes' estimate/estimator of the parameter $\lambda$ in a Poisson probability
law if the prior density for $\lambda$ is a gamma density with known parameters $(\alpha, \beta)$.


2.7.7. By selecting the appropriate pivotal quantities, construct a 95% confidence interval
for (1): Poisson parameter $\lambda$; (2): Exponential parameter $\theta$; (3): Normal parameter $\sigma^2$; (4):
$\theta$ in a uniform density $f(x) = \frac{1}{\theta}$, $0 \le x \le \theta$ and zero elsewhere.
Chapter 3
The Multivariate Gaussian and Related Distributions


3.1. Introduction 


Real scalar mathematical as well as random variables will be denoted by lower-case 
letters such as x, y, z, and vector/matrix variables, whether mathematical or random, will 
be denoted by capital letters such as X, Y, Z, in the real case. Complex variables will 
be denoted with a tilde: $\tilde{x}, \tilde{y}, \tilde{X}, \tilde{Y}$, for instance. Constant matrices will be denoted by
$A, B, C$, and so on. A tilde will be placed above constant matrices only if one wishes
to stress the point that the matrix is in the complex domain. Equations will be numbered
chapter and section-wise. Local numbering will be done subsection-wise. The determinant
of a square matrix $A$ will be denoted by $|A|$ or $\det(A)$ and, in the complex case, the
absolute value of the determinant of $A$ will be denoted as $|\det(A)|$. Observe that in the
complex domain, $\det(A) = a + ib$ where $a$ and $b$ are real scalar quantities, and then,
$|\det(A)|^2 = a^2 + b^2$.


Multivariate usually refers to a collection of scalar variables. Vector/matrix variable 
situations are also of the multivariate type but, in addition, the positions of the variables 
must also be taken into account. In a function involving a matrix, one cannot permute its 
elements since each permutation will produce a different matrix. For example, 


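$$X = \begin{bmatrix} x_{11} & x_{12} \\ x_{21} & x_{22} \end{bmatrix},\quad Y = \begin{bmatrix} y_{11} & y_{12} \\ y_{21} & y_{22} \end{bmatrix},\quad \tilde{X} = \begin{bmatrix} \tilde{x}_{11} & \tilde{x}_{12} \\ \tilde{x}_{21} & \tilde{x}_{22} \end{bmatrix}$$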
are all multivariate cases but the elements or the individual variables must remain at the 
set positions in the matrices. 

The definiteness of matrices will be needed in our discussion. Definiteness is defined
and discussed only for symmetric matrices in the real domain and Hermitian matrices in
the complex domain. Let $A = A'$ be a real $p \times p$ matrix and $Y$ be a $p \times 1$ real vector,
$Y'$ denoting its transpose. Consider the quadratic form $Y'AY$, $A = A'$, for all possible $Y$
excluding the null vector, that is, $Y \ne O$. We say that the real quadratic form $Y'AY$ as well
as the real matrix $A = A'$ are positive definite, which is denoted $A > O$, if $Y'AY > 0$, for
all possible non-null $Y$. Letting $A = A'$ be a real $p \times p$ matrix, if for all real $p \times 1$ vectors
$Y \ne O$,

$$\begin{aligned}
Y'AY &> 0, & A &> O \ \text{(positive definite)} \\
Y'AY &\ge 0, & A &\ge O \ \text{(positive semi-definite)} \\
Y'AY &< 0, & A &< O \ \text{(negative definite)} \\
Y'AY &\le 0, & A &\le O \ \text{(negative semi-definite)}.
\end{aligned} \tag{3.1.1}$$


All the matrices that do not belong to any one of the above categories are said to be
indefinite matrices, in which case $A$ will have both positive and negative eigenvalues. For
example, for some $Y$, $Y'AY$ may be positive and for some other values of $Y$, $Y'AY$ may
be negative. The definiteness of Hermitian matrices can be defined in a similar manner. A
square matrix $A$ in the complex domain is called Hermitian if $A = A^*$ where $A^*$ means
the conjugate transpose of $A$. Either the conjugates of all the elements of $A$ are taken and
the matrix is then transposed or the matrix $A$ is first transposed and the conjugate of each
of its elements is then taken. If $\tilde{z} = a + ib$, $i = \sqrt{(-1)}$ and $a$, $b$ real scalars, then the
conjugate of $\tilde{z}$, conjugate being denoted by a bar, is $\bar{\tilde{z}} = a - ib$, that is, $i$ is replaced by
$-i$. For instance, since


i ee ee ee ee eee ШИ" 
8-2, ев 2, ; |»e-5-|[?, Б; 


$B = B^*$, and thus the matrix $B$ is Hermitian. In general, if $\tilde{X}$ is a $p \times p$ matrix, then
$\tilde{X}$ can be written as $\tilde{X} = X_1 + iX_2$ where $X_1$ and $X_2$ are real matrices and $i = \sqrt{(-1)}$.
And if $\tilde{X} = \tilde{X}^*$ then $\tilde{X} = X_1 + iX_2 = \tilde{X}^* = X_1' - iX_2'$ or $X_1$ is symmetric and $X_2$
is skew symmetric so that all the diagonal elements of a Hermitian matrix are real. The
definiteness of a Hermitian matrix can be defined parallel to that in the real case. Let
$A = A^*$ be a Hermitian matrix. In the complex domain, definiteness is defined only for
Hermitian matrices. Let $Y \ne O$ be a $p \times 1$ non-null vector and let $Y^*$ be its conjugate
transpose. Then, consider the Hermitian form $Y^*AY$, $A = A^*$. If $Y^*AY > 0$ for all
possible non-null $Y$, the Hermitian form $Y^*AY$, $A = A^*$ as well as the Hermitian
matrix $A$ are said to be positive definite, which is denoted $A > O$. Letting $A = A^*$, if for
all non-null $Y$,

$$\begin{aligned}
Y^*AY &> 0, & A &> O \ \text{(Hermitian positive definite)} \\
Y^*AY &\ge 0, & A &\ge O \ \text{(Hermitian positive semi-definite)} \\
Y^*AY &< 0, & A &< O \ \text{(Hermitian negative definite)} \\
Y^*AY &\le 0, & A &\le O \ \text{(Hermitian negative semi-definite)},
\end{aligned} \tag{3.1.2}$$

and when none of the above cases applies, we have indefinite matrices or indefinite Hermitian
forms.


We will also make use of properties of the square root of matrices. If we were to define
the square root of $A$ as $B$ such that $B^2 = A$, there would then be several candidates for $B$.
Since a multiplication of $A$ with $A$ is involved, $A$ has to be a square matrix. Consider the
following matrices

$$A_1 = \begin{bmatrix} 1 & 0 \\ 0 & 1 \end{bmatrix},\ A_2 = \begin{bmatrix} -1 & 0 \\ 0 & 1 \end{bmatrix},\ A_3 = \begin{bmatrix} 1 & 0 \\ 0 & -1 \end{bmatrix},\ A_4 = \begin{bmatrix} -1 & 0 \\ 0 & -1 \end{bmatrix},\ A_5 = \begin{bmatrix} 0 & 1 \\ 1 & 0 \end{bmatrix},$$

whose squares are all equal to $I_2$. Thus, there are clearly several candidates for the square
root of this identity matrix. However, if we restrict ourselves to the class of positive definite
matrices in the real domain and Hermitian positive definite matrices in the complex
domain, then we can define a unique square root, denoted by $A^{\frac{1}{2}} > O$.
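
For a real symmetric positive definite $2 \times 2$ matrix, the positive definite square root can be written in closed form as $A^{\frac{1}{2}} = (A + sI)/t$ with $s = \sqrt{\det(A)}$ and $t = \sqrt{{\rm tr}(A) + 2s}$, which follows from the Cayley-Hamilton theorem. This formula is not taken from the text; the sketch below, with hypothetical function names and an illustrative matrix, merely demonstrates the uniqueness claim in the smallest nontrivial case.

```python
import math

def sqrt_pd_2x2(a):
    """Positive definite square root of a real symmetric positive definite 2x2 matrix."""
    (a11, a12), (a21, a22) = a
    det = a11 * a22 - a12 * a21
    if abs(a12 - a21) > 1e-12 or a11 <= 0.0 or det <= 0.0:
        raise ValueError("matrix must be symmetric positive definite")
    s = math.sqrt(det)                    # sqrt of the determinant
    t = math.sqrt(a11 + a22 + 2.0 * s)    # sqrt of trace + 2 s
    return [[(a11 + s) / t, a12 / t], [a21 / t, (a22 + s) / t]]

def matmul_2x2(x, y):
    """Product of two 2x2 matrices stored as nested lists."""
    return [[sum(x[i][k] * y[k][j] for k in range(2)) for j in range(2)] for i in range(2)]

A = [[2.0, 1.0], [1.0, 2.0]]   # illustrative positive definite matrix
R = sqrt_pd_2x2(A)
print(R, matmul_2x2(R, R))
```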


For the various Jacobians used in this chapter, the reader may refer to Chap. 1, further 
details being available from Mathai (1997). 


3.1a. The Multivariate Gaussian Density in the Complex Domain


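Consider the complex scalar random variables $\tilde{x}_1, \ldots, \tilde{x}_p$. Let $\tilde{x}_j = x_{j1} + ix_{j2}$ where
$x_{j1}, x_{j2}$ are real and $i = \sqrt{(-1)}$. Let $E[x_{j1}] = \mu_{j1}$, $E[x_{j2}] = \mu_{j2}$ and $E[\tilde{x}_j] =
\mu_{j1} + i\mu_{j2} \equiv \tilde{\mu}_j$. Let the variances be as follows: ${\rm Var}(x_{j1}) = \sigma_{j1}^2$, ${\rm Var}(x_{j2}) = \sigma_{j2}^2$. For a
complex variable, the variance is defined as follows:

$$\begin{aligned}
{\rm Var}(\tilde{x}_j) &= E[\tilde{x}_j - E(\tilde{x}_j)][\tilde{x}_j - E(\tilde{x}_j)]^* \\
&= E[(x_{j1} - \mu_{j1}) + i(x_{j2} - \mu_{j2})][(x_{j1} - \mu_{j1}) - i(x_{j2} - \mu_{j2})] \\
&= E[(x_{j1} - \mu_{j1})^2 + (x_{j2} - \mu_{j2})^2] = {\rm Var}(x_{j1}) + {\rm Var}(x_{j2}) = \sigma_{j1}^2 + \sigma_{j2}^2 \\
&\equiv \sigma_j^2.
\end{aligned}$$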


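A covariance matrix associated with the $p \times 1$ vector $\tilde{X} = (\tilde{x}_1, \ldots, \tilde{x}_p)'$ in the complex
domain is defined as ${\rm Cov}(\tilde{X}) = E[\tilde{X} - E(\tilde{X})][\tilde{X} - E(\tilde{X})]^* \equiv \Sigma$ with $E(\tilde{X}) \equiv \tilde{\mu} =
(\tilde{\mu}_1, \ldots, \tilde{\mu}_p)'$. Then we have

$$\Sigma = {\rm Cov}(\tilde{X}) = \begin{bmatrix} \sigma_1^2 & \sigma_{12} & \cdots & \sigma_{1p} \\ \sigma_{21} & \sigma_2^2 & \cdots & \sigma_{2p} \\ \vdots & \vdots & \ddots & \vdots \\ \sigma_{p1} & \sigma_{p2} & \cdots & \sigma_p^2 \end{bmatrix}$$

where the covariance between $\tilde{x}_r$ and $\tilde{x}_s$, two distinct elements in $\tilde{X}$, requires explanation.
Let $\tilde{x}_r = x_{r1} + ix_{r2}$ and $\tilde{x}_s = x_{s1} + ix_{s2}$ where $x_{r1}, x_{r2}, x_{s1}, x_{s2}$ are all real. Then, the
covariance between $\tilde{x}_r$ and $\tilde{x}_s$ is

$$\begin{aligned}
{\rm Cov}(\tilde{x}_r, \tilde{x}_s) &= E[\tilde{x}_r - E(\tilde{x}_r)][\tilde{x}_s - E(\tilde{x}_s)]^* = {\rm Cov}[(x_{r1} + ix_{r2}), (x_{s1} - ix_{s2})] \\
&= {\rm Cov}(x_{r1}, x_{s1}) + {\rm Cov}(x_{r2}, x_{s2}) + i[{\rm Cov}(x_{r2}, x_{s1}) - {\rm Cov}(x_{r1}, x_{s2})] \equiv \sigma_{rs}.
\end{aligned}$$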


Note that none of the individual covariances on the right-hand side need be equal to each
other. Hence, $\sigma_{rs}$ need not be equal to $\sigma_{sr}$. In terms of vectors, we have the following: Let
$\tilde{X} = X_1 + iX_2$ where $X_1$ and $X_2$ are real vectors. The covariance matrix associated with
$\tilde{X}$, which is denoted by ${\rm Cov}(\tilde{X})$, is

$$\begin{aligned}
{\rm Cov}(\tilde{X}) &= E([\tilde{X} - E(\tilde{X})][\tilde{X} - E(\tilde{X})]^*) \\
&= E([(X_1 - E(X_1)) + i(X_2 - E(X_2))][(X_1 - E(X_1))' - i(X_2 - E(X_2))']) \\
&= {\rm Cov}(X_1, X_1) + {\rm Cov}(X_2, X_2) + i[{\rm Cov}(X_2, X_1) - {\rm Cov}(X_1, X_2)] \\
&\equiv \Sigma_{11} + \Sigma_{22} + i[\Sigma_{21} - \Sigma_{12}]
\end{aligned}$$

where $\Sigma_{12}$ need not be equal to $\Sigma_{21}$. Hence, in general, ${\rm Cov}(X_1, X_2)$ need not be equal to
${\rm Cov}(X_2, X_1)$. We will denote the whole configuration as ${\rm Cov}(\tilde{X}) = \Sigma$ and assume it to be
Hermitian positive definite. We will define the p-variate Gaussian density in the complex
domain as the following real-valued function:

$$f(\tilde{X}) = \frac{1}{\pi^p |\det(\Sigma)|}\, e^{-(\tilde{X}-\tilde{\mu})^*\Sigma^{-1}(\tilde{X}-\tilde{\mu})} \tag{3.1a.1}$$


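where $|\det(\Sigma)|$ denotes the absolute value of the determinant of $\Sigma$. Let us verify that the
normalizing constant is indeed $\frac{1}{\pi^p|\det(\Sigma)|}$. Consider the transformation $\tilde{Y} = \Sigma^{-\frac{1}{2}}(\tilde{X} - \tilde{\mu})$
which gives $d\tilde{X} = [\det(\Sigma\Sigma^*)]^{\frac{1}{2}}d\tilde{Y} = |\det(\Sigma)|d\tilde{Y}$ in light of (1.6a.1). Then $|\det(\Sigma)|$ is
canceled and the exponent becomes $-\tilde{Y}^*\tilde{Y} = -[|\tilde{y}_1|^2 + \cdots + |\tilde{y}_p|^2]$. But

$$\int_{\tilde{y}_j} e^{-|\tilde{y}_j|^2}\,d\tilde{y}_j = \int_{-\infty}^{\infty}\int_{-\infty}^{\infty} e^{-(y_{j1}^2 + y_{j2}^2)}\,dy_{j1} \wedge dy_{j2} = \pi,\ \ \tilde{y}_j = y_{j1} + iy_{j2}, \tag{i}$$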


which establishes the normalizing constant. Let us examine the mean value and the covariance
matrix of $\tilde{X}$ in the complex case. Let us utilize the same transformation, $\tilde{Y} = \Sigma^{-\frac{1}{2}}(\tilde{X} - \tilde{\mu})$.
Accordingly,

$$E[\tilde{X}] = \tilde{\mu} + E[(\tilde{X} - \tilde{\mu})] = \tilde{\mu} + \Sigma^{\frac{1}{2}}E[\tilde{Y}].$$

However,

$$E[\tilde{Y}] = \frac{1}{\pi^p}\int_{\tilde{Y}} \tilde{Y}\, e^{-\tilde{Y}^*\tilde{Y}}\,d\tilde{Y},$$

and the integrand has each element in $\tilde{Y}$ producing an odd function whose integral converges,
so that the integral over $\tilde{Y}$ is null. Thus, $E[\tilde{X}] = \tilde{\mu}$, the first parameter appearing
in the exponent of the density (3.1a.1). Now the covariance matrix in $\tilde{X}$ is the following:

$${\rm Cov}(\tilde{X}) = E([\tilde{X} - E(\tilde{X})][\tilde{X} - E(\tilde{X})]^*) = \Sigma^{\frac{1}{2}}E[\tilde{Y}\tilde{Y}^*]\Sigma^{\frac{1}{2}}.$$
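We consider the integrand in $E[\tilde{Y}\tilde{Y}^*]$ and follow steps parallel to those used in the real
case. It is a $p \times p$ matrix where the non-diagonal elements are odd functions whose integrals
converge and hence each of these elements will integrate out to zero. The first
diagonal element in $\tilde{Y}\tilde{Y}^*$ is $|\tilde{y}_1|^2$. Its associated integral is

$$\int \cdots \int |\tilde{y}_1|^2 e^{-(|\tilde{y}_1|^2 + \cdots + |\tilde{y}_p|^2)}\,d\tilde{y}_1 \wedge \ldots \wedge d\tilde{y}_p = \Big\{\prod_{j=2}^p \int_{\tilde{y}_j} e^{-|\tilde{y}_j|^2}\,d\tilde{y}_j\Big\}\int_{\tilde{y}_1} |\tilde{y}_1|^2 e^{-|\tilde{y}_1|^2}\,d\tilde{y}_1.$$

From (i),

$$\int_{\tilde{y}_j} e^{-|\tilde{y}_j|^2}\,d\tilde{y}_j = \pi;\quad \prod_{j=2}^p \int_{\tilde{y}_j} e^{-|\tilde{y}_j|^2}\,d\tilde{y}_j = \pi^{p-1},$$

where $|\tilde{y}_1|^2 = y_{11}^2 + y_{12}^2$, $\tilde{y}_1 = y_{11} + iy_{12}$, $i = \sqrt{(-1)}$, and $y_{11}, y_{12}$ real. Let $y_{11} =
r\cos\theta$, $y_{12} = r\sin\theta \Rightarrow dy_{11} \wedge dy_{12} = r\,dr \wedge d\theta$ and

$$\begin{aligned}
\int_{\tilde{y}_1} |\tilde{y}_1|^2 e^{-|\tilde{y}_1|^2}\,d\tilde{y}_1 &= \Big(\int_{r=0}^{\infty} r(r^2)e^{-r^2}\,dr\Big)\Big(\int_{\theta=0}^{2\pi} d\theta\Big),\ \ (\text{letting } u = r^2) \\
&= (2\pi)\Big(\frac{1}{2}\int_0^{\infty} u e^{-u}\,du\Big) = (2\pi)\Big(\frac{1}{2}\Big) = \pi.
\end{aligned}$$

Thus the first diagonal element in $\tilde{Y}\tilde{Y}^*$ integrates out to $\pi^p$ and, similarly, each diagonal
element will integrate out to $\pi^p$, which is canceled by the term $\pi^p$ present in the normalizing
constant. Hence the integral over $\tilde{Y}\tilde{Y}^*$ gives an identity matrix and the covariance
matrix of $\tilde{X}$ is $\Sigma$, the other parameter appearing in the density (3.1a.1). Hence the two
parameters therein are the mean value vector and the covariance matrix of $\tilde{X}$.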
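Example 3.1a.1. Consider the matrix $\Sigma$ and the vector $\tilde{X}$ with expected value $E[\tilde{X}] = \tilde{\mu}$
as follows:

$$\Sigma = \begin{bmatrix} 2 & 1+i \\ 1-i & 3 \end{bmatrix},\quad \tilde{X} = \begin{bmatrix} \tilde{x}_1 \\ \tilde{x}_2 \end{bmatrix},\quad E[\tilde{X}] = \tilde{\mu} = \begin{bmatrix} 1+2i \\ 2-i \end{bmatrix}.$$

Show that $\Sigma$ is Hermitian positive definite so that it can be a covariance matrix of $\tilde{X}$,
that is, ${\rm Cov}(\tilde{X}) = \Sigma$. If $\tilde{X}$ has a bivariate Gaussian distribution in the complex domain,
$\tilde{X} \sim \tilde{N}_2(\tilde{\mu}, \Sigma)$, $\Sigma > O$, then write down (1) the exponent in the density explicitly; (2)
the density explicitly.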


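Solution 3.1a.1. The transpose and conjugate transpose of $\Sigma$ are

$$\Sigma' = \begin{bmatrix} 2 & 1-i \\ 1+i & 3 \end{bmatrix},\quad \Sigma^* = \bar{\Sigma}' = \begin{bmatrix} 2 & 1+i \\ 1-i & 3 \end{bmatrix} = \Sigma$$

and hence $\Sigma$ is Hermitian. The eigenvalues of $\Sigma$ are available from the equation

$$(2-\lambda)(3-\lambda) - (1-i)(1+i) = 0 \Rightarrow \lambda^2 - 5\lambda + 4 = 0 \Rightarrow (\lambda-4)(\lambda-1) = 0 \ \text{ or } \ \lambda_1 = 4,\ \lambda_2 = 1.$$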


Thus, the eigenvalues are positive [the eigenvalues of a Hermitian matrix will always
be real]. This property of eigenvalues being positive, combined with the property that $\Sigma$
is Hermitian proves that $\Sigma$ is Hermitian positive definite. This can also be established
from the leading minors of $\Sigma$. The leading minors are $\det((2)) = 2 > 0$ and $\det(\Sigma) =
(2)(3) - (1-i)(1+i) = 4 > 0$. Since $\Sigma$ is Hermitian and its leading minors are all
positive, $\Sigma$ is positive definite. Let us evaluate the inverse by making use of the formula
$\Sigma^{-1} = \frac{1}{\det(\Sigma)}({\rm Cof}(\Sigma))'$ where ${\rm Cof}(\Sigma)$ represents the matrix of cofactors of the elements
in $\Sigma$. [These formulae hold whether the elements in the matrix are real or complex]. That
is,

$$\Sigma^{-1} = \frac{1}{4}\begin{bmatrix} 3 & -(1+i) \\ -(1-i) & 2 \end{bmatrix},\quad \Sigma^{-1}\Sigma = I. \tag{ii}$$
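The exponent in a bivariate complex Gaussian density being $-(\tilde{X}-\tilde{\mu})^*\Sigma^{-1}(\tilde{X}-\tilde{\mu})$, we
have

$$\begin{aligned}
-(\tilde{X}-\tilde{\mu})^*\Sigma^{-1}(\tilde{X}-\tilde{\mu}) = -\frac{1}{4}\{&3\,[\tilde{x}_1 - (1+2i)]^*[\tilde{x}_1 - (1+2i)] \\
&- (1+i)\,[\tilde{x}_1 - (1+2i)]^*[\tilde{x}_2 - (2-i)] \\
&- (1-i)\,[\tilde{x}_2 - (2-i)]^*[\tilde{x}_1 - (1+2i)] \\
&+ 2\,[\tilde{x}_2 - (2-i)]^*[\tilde{x}_2 - (2-i)]\}.
\end{aligned} \tag{iii}$$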
Thus, the density of the $\tilde{N}_2(\tilde{\mu}, \Sigma)$ vector whose components can assume any complex
value is

$$f(\tilde{X}) = \frac{e^{-(\tilde{X}-\tilde{\mu})^*\Sigma^{-1}(\tilde{X}-\tilde{\mu})}}{4\pi^2} \tag{3.1a.2}$$

where $\Sigma^{-1}$ is given in (ii) and the exponent, in (iii).
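
The quantities in Example 3.1a.1 can be checked numerically with Python's built-in complex numbers. The sketch below hard-codes the inverse $\frac{1}{4}\begin{bmatrix} 3 & -(1+i) \\ -(1-i) & 2 \end{bmatrix}$, the mean vector $(1+2i,\ 2-i)'$ and $\det(\Sigma) = 4$ as read from the solution; the function names and the evaluation point are illustrative assumptions.

```python
import math

SIGMA_INV = [[3 / 4, -(1 + 1j) / 4], [-(1 - 1j) / 4, 2 / 4]]   # inverse from (ii)
MU = [1 + 2j, 2 - 1j]                                          # mean value vector
DET_SIGMA = 4.0

def hermitian_form(v, m):
    """Return v* m v for a length-2 complex vector v and a 2x2 matrix m."""
    total = 0j
    for r in range(2):
        for s in range(2):
            total += v[r].conjugate() * m[r][s] * v[s]
    return total

def complex_gaussian_density(x):
    """Density (3.1a.2) at x = [x1, x2], the x_j being Python complex numbers."""
    v = [x[0] - MU[0], x[1] - MU[1]]
    q = hermitian_form(v, SIGMA_INV)      # real up to rounding since SIGMA_INV is Hermitian
    return math.exp(-q.real) / (math.pi ** 2 * DET_SIGMA)

print(complex_gaussian_density(MU))
print(complex_gaussian_density([2 + 2j, 2 + 0j]))
```

The density attains its largest value, $1/(4\pi^2)$, at $\tilde{X} = \tilde{\mu}$, where the Hermitian form in the exponent vanishes.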


Exercises 3.1 


3.1.1. Construct a $2 \times 2$ Hermitian positive definite matrix $A$ and write down a Hermitian
form with this $A$ as its matrix.


3.1.2. Construct a $2 \times 2$ Hermitian matrix $B$ where the determinant is 4, the trace is 5,
and first row is $2,\ 1+i$. Then write down explicitly the Hermitian form $\tilde{X}^*B\tilde{X}$.


3.1.3. Is $B$ in Exercise 3.1.2 positive definite? Is the Hermitian form $\tilde{X}^*B\tilde{X}$ positive
definite? Establish the results.


3.1.4. Construct two $2 \times 2$ Hermitian matrices $A$ and $B$ such that $AB = O$ (null), if that
is possible.


3.1.5. Specify the eigenvalues of the matrix $B$ in Exercise 3.1.2, obtain a unitary matrix
$Q$, $QQ^* = I$, $Q^*Q = I$ such that $Q^*BQ$ is diagonal and write down the canonical form
for a Hermitian form $\tilde{X}^*B\tilde{X} = \lambda_1|\tilde{y}_1|^2 + \lambda_2|\tilde{y}_2|^2$.


3.2. The Multivariate Normal or Gaussian Distribution, Real Case 


We may define a real p-variate Gaussian density via the following characterization: Let
$x_1, \ldots, x_p$ be real scalar variables and $X$ be a $p \times 1$ vector with $x_1, \ldots, x_p$ as its elements,
that is, $X' = (x_1, \ldots, x_p)$. Let $L' = (a_1, \ldots, a_p)$ where $a_1, \ldots, a_p$ are arbitrary real
scalar constants. Consider the linear function $u = L'X = X'L = a_1x_1 + \cdots + a_px_p$.
If, for all possible $L$, $u = L'X$ has a real univariate Gaussian distribution, then the
vector $X$ is said to have a multivariate Gaussian distribution. For any linear function
$u = L'X$, $E[u] = L'E[X] = L'\mu$, $\mu' = (\mu_1, \ldots, \mu_p)$, $\mu_j = E[x_j]$, $j = 1, \ldots, p$,
and ${\rm Var}(u) = L'\Sigma L$, $\Sigma = {\rm Cov}(X) = E[X - E(X)][X - E(X)]'$ in the real case. If $u$ is
univariate normal then its mgf, with parameter $t$, is the following:

$$M_u(t) = E[e^{tu}] = e^{tE(u) + \frac{t^2}{2}{\rm Var}(u)} = e^{tL'\mu + \frac{t^2}{2}L'\Sigma L}.$$

The exponent involves the $p$ parameters $a_1, \ldots, a_p$ when the $a_j$'s are arbitrary. As well, $tL$ contains only $p$ parameters as, for
example, $ta_j$ is a single parameter when both $t$ and $a_j$ are arbitrary. Then,

$$M_u(t) = M_X(tL) = e^{(tL)'\mu + \frac{1}{2}(tL)'\Sigma(tL)} = e^{T'\mu + \frac{1}{2}T'\Sigma T} = M_X(T),\ \ T = tL. \tag{3.2.1}$$


Thus, when $L$ is arbitrary, the mgf of $u$ qualifies to be the mgf of a p-vector $X$. The density
corresponding to (3.2.1) is the following, when $\Sigma > O$:

$$f(X) = c\, e^{-\frac{1}{2}(X-\mu)'\Sigma^{-1}(X-\mu)},\ -\infty < x_j < \infty,\ -\infty < \mu_j < \infty,\ \Sigma > O$$

for $j = 1, \ldots, p$. We can evaluate the normalizing constant $c$ when $f(X)$ is a density, in
which case the total integral is unity. That is,

$$1 = \int_X f(X)\,dX = \int_X c\, e^{-\frac{1}{2}(X-\mu)'\Sigma^{-1}(X-\mu)}\,dX.$$


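Let $\Sigma^{-\frac{1}{2}}(X - \mu) = Y \Rightarrow dY = |\Sigma|^{-\frac{1}{2}}d(X - \mu) = |\Sigma|^{-\frac{1}{2}}dX$ since $\mu$ is a constant. The
Jacobian of the transformation may be obtained from Theorem 1.6.1. Now,

$$1 = c\,|\Sigma|^{\frac{1}{2}}\int_Y e^{-\frac{1}{2}Y'Y}\,dY.$$

But $Y'Y = y_1^2 + \cdots + y_p^2$ where $y_1, \ldots, y_p$ are the real elements in $Y$ and $\int_{-\infty}^{\infty} e^{-\frac{1}{2}y_j^2}\,dy_j =
\sqrt{2\pi}$. Hence $\int_Y e^{-\frac{1}{2}Y'Y}\,dY = (\sqrt{2\pi})^p$. Then $c = [|\Sigma|^{\frac{1}{2}}(2\pi)^{\frac{p}{2}}]^{-1}$ and the p-variate real
Gaussian or normal density is given by

$$f(X) = \frac{1}{|\Sigma|^{\frac{1}{2}}(2\pi)^{\frac{p}{2}}}\, e^{-\frac{1}{2}(X-\mu)'\Sigma^{-1}(X-\mu)} \tag{3.2.2}$$

for $\Sigma > O$, $-\infty < x_j < \infty$, $-\infty < \mu_j < \infty$, $j = 1, \ldots, p$. The density (3.2.2) is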


called the nonsingular normal density in the real case, nonsingular in the sense that $\Sigma$ is
nonsingular. In fact, $\Sigma$ is also real positive definite in the nonsingular case. When $\Sigma$ is
singular, we have a singular normal distribution which does not have a density function.
However, in the singular case, all the properties can be studied with the help of the associated
mgf which is of the form in (3.2.1), as the mgf exists whether $\Sigma$ is nonsingular or
singular.


We will use the standard notation $X \sim N_p(\mu, \Sigma)$ to denote a p-variate real normal or
Gaussian distribution with mean value vector $\mu$ and covariance matrix $\Sigma$. If it is nonsingular
real Gaussian, we write $\Sigma > O$; if it is singular normal, then we specify $|\Sigma| = 0$. If we
wish to combine the singular and nonsingular cases, we write $X \sim N_p(\mu, \Sigma)$, $\Sigma \ge O$.
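
To make the nonsingular density (3.2.2) concrete, the following sketch evaluates it for $p = 2$ with the explicit $2 \times 2$ inverse and checks numerically that it integrates to approximately one. The mean vector, the covariance matrix, the integration square and the midpoint rule are all illustrative assumptions.

```python
import math

def bivariate_normal_density(x, mu, sigma):
    """Density (3.2.2) for p = 2; sigma is a 2x2 positive definite list of lists."""
    (s11, s12), (s21, s22) = sigma
    det = s11 * s22 - s12 * s21
    if s11 <= 0.0 or det <= 0.0:
        raise ValueError("sigma must be positive definite (nonsingular case)")
    d1, d2 = x[0] - mu[0], x[1] - mu[1]
    # (X - mu)' Sigma^{-1} (X - mu) using the explicit 2x2 inverse
    quad = (s22 * d1 * d1 - (s12 + s21) * d1 * d2 + s11 * d2 * d2) / det
    return math.exp(-0.5 * quad) / (2.0 * math.pi * math.sqrt(det))

def total_mass(mu, sigma, half_width=8.0, steps=200):
    """Midpoint-rule approximation of the integral of the density over a square."""
    h = 2.0 * half_width / steps
    total = 0.0
    for i in range(steps):
        x1 = mu[0] - half_width + (i + 0.5) * h
        for j in range(steps):
            x2 = mu[1] - half_width + (j + 0.5) * h
            total += bivariate_normal_density((x1, x2), mu, sigma)
    return total * h * h

mu = (1.0, -1.0)
sigma = [[2.0, 0.6], [0.6, 1.0]]   # illustrative covariance matrix
print(bivariate_normal_density(mu, mu, sigma), total_mass(mu, sigma))
```

A singular covariance matrix is rejected by the function, in line with the remark that no density exists in the singular case.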
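What are the mean value vector and the covariance matrix of a real p-Gaussian vector
$X$?

$$\begin{aligned}
E[X] &= E[X - \mu] + E[\mu] = \mu + \int_X (X - \mu) f(X)\,dX \\
&= \mu + \frac{1}{|\Sigma|^{\frac{1}{2}}(2\pi)^{\frac{p}{2}}}\int_X (X - \mu)\, e^{-\frac{1}{2}(X-\mu)'\Sigma^{-1}(X-\mu)}\,dX \\
&= \mu + \frac{\Sigma^{\frac{1}{2}}}{(2\pi)^{\frac{p}{2}}}\int_Y Y e^{-\frac{1}{2}Y'Y}\,dY,\ \ Y = \Sigma^{-\frac{1}{2}}(X - \mu).
\end{aligned}$$

The expected value of a matrix is the matrix of the expected value of every element in the
matrix. The expected value of the component $y_j$ of $Y' = (y_1, \ldots, y_p)$ is

$$E[y_j] = \frac{1}{\sqrt{2\pi}}\int_{-\infty}^{\infty} y_j e^{-\frac{1}{2}y_j^2}\,dy_j \Big\{\prod_{i \ne j = 1}^{p} \frac{1}{\sqrt{2\pi}}\int_{-\infty}^{\infty} e^{-\frac{1}{2}y_i^2}\,dy_i\Big\}.$$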


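The product is equal to 1 and the first integrand being an odd function of $y_j$, it is equal to
0 since the integral is convergent. Thus, $E[Y] = O$ (a null vector) and $E[X] = \mu$, the first
parameter appearing in the exponent of the density. Now, consider the covariance matrix
of $X$. For a real vector $X$,

$$\begin{aligned}
{\rm Cov}(X) &= E[X - E(X)][X - E(X)]' = E[(X - \mu)(X - \mu)'] \\
&= \frac{1}{|\Sigma|^{\frac{1}{2}}(2\pi)^{\frac{p}{2}}}\int_X (X - \mu)(X - \mu)'\, e^{-\frac{1}{2}(X-\mu)'\Sigma^{-1}(X-\mu)}\,dX \\
&= \frac{1}{(2\pi)^{\frac{p}{2}}}\,\Sigma^{\frac{1}{2}}\Big[\int_Y YY' e^{-\frac{1}{2}Y'Y}\,dY\Big]\Sigma^{\frac{1}{2}},\ \ Y = \Sigma^{-\frac{1}{2}}(X - \mu).
\end{aligned}$$

But

$$YY' = \begin{bmatrix} y_1 \\ \vdots \\ y_p \end{bmatrix}[y_1, \ldots, y_p] = \begin{bmatrix} y_1^2 & y_1y_2 & \cdots & y_1y_p \\ y_2y_1 & y_2^2 & \cdots & y_2y_p \\ \vdots & \vdots & \ddots & \vdots \\ y_py_1 & y_py_2 & \cdots & y_p^2 \end{bmatrix}.$$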

The non-diagonal elements are linear in each variable $y_i$ and $y_j$, $i \ne j$ and hence the integrals
over the non-diagonal elements will be equal to zero due to a property of convergent
integrals over odd functions. Hence we only need to consider the diagonal elements. When
considering $y_1^2$, the integrals over $y_2, \ldots, y_p$ will give the following:

$$\int_{-\infty}^{\infty} e^{-\frac{1}{2}y_j^2}\,dy_j = \sqrt{2\pi},\ j = 2, \ldots, p \Rightarrow (2\pi)^{\frac{p-1}{2}}$$

and hence we are left with

$$\frac{1}{\sqrt{2\pi}}\int_{-\infty}^{\infty} y_1^2 e^{-\frac{1}{2}y_1^2}\,dy_1 = \frac{2}{\sqrt{2\pi}}\int_0^{\infty} y_1^2 e^{-\frac{1}{2}y_1^2}\,dy_1$$

due to evenness of the integrand, the integral being convergent. Let $u = y_1^2$ so that $y_1 = u^{\frac{1}{2}}$
since $y_1 > 0$. Then $dy_1 = \frac{1}{2}u^{\frac{1}{2}-1}du$. The integral is available as $\Gamma(\frac{3}{2})2^{\frac{3}{2}} = \frac{1}{2}\Gamma(\frac{1}{2})2^{\frac{3}{2}} =
\sqrt{2\pi}$ since $\Gamma(\frac{1}{2}) = \sqrt{\pi}$, and the constant is canceled leaving 1. This shows that each
diagonal element integrates out to 1 and hence the integral over $YY'$ is the identity matrix
after absorbing $(2\pi)^{-\frac{p}{2}}$. Thus ${\rm Cov}(X) = \Sigma^{\frac{1}{2}}\Sigma^{\frac{1}{2}} = \Sigma$, the inverse of which is the other
parameter appearing in the exponent of the density. Hence the two parameters are

$$\mu = E[X] \ \text{ and } \ \Sigma = {\rm Cov}(X). \tag{3.2.3}$$
The bivariate case 


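When $p = 2$, we obtain the bivariate real normal density from (3.2.2), which is denoted by $f(x_1, x_2)$. Note that when $p = 2$,

$$
(X-\mu)'\Sigma^{-1}(X-\mu) = (x_1 - \mu_1,\ x_2 - \mu_2)\, \Sigma^{-1} \begin{pmatrix} x_1 - \mu_1 \\ x_2 - \mu_2 \end{pmatrix},
$$

$$
\Sigma = \begin{pmatrix} \sigma_{11} & \sigma_{12} \\ \sigma_{21} & \sigma_{22} \end{pmatrix} = \begin{pmatrix} \sigma_1^2 & \sigma_1\sigma_2\rho \\ \sigma_1\sigma_2\rho & \sigma_2^2 \end{pmatrix},
$$

where $\sigma_1^2 = \mathrm{Var}(x_1) = \sigma_{11}$, $\sigma_2^2 = \mathrm{Var}(x_2) = \sigma_{22}$, $\sigma_{12} = \mathrm{Cov}(x_1, x_2) = \sigma_1\sigma_2\rho$ where $\rho$ is the correlation between $x_1$ and $x_2$, and $\rho$, in general, is defined as

$$
\rho = \frac{\mathrm{Cov}(x_1, x_2)}{\sqrt{\mathrm{Var}(x_1)\mathrm{Var}(x_2)}} = \frac{\sigma_{12}}{\sigma_1\sigma_2}, \quad \sigma_1 \ne 0,\ \sigma_2 \ne 0,
$$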


which means that $\rho$ is defined only for non-degenerate random variables, or equivalently, that the probability mass of either variable should not lie at a single point. This $\rho$ is a scale-free covariance, the covariance measuring the joint variation in $(x_1, x_2)$ corresponding to the square of scatter, $\mathrm{Var}(x)$, in a real scalar random variable $x$. The covariance, in general, depends upon the units of measurement of $x_1$ and $x_2$, whereas $\rho$ is a scale-free pure coefficient. This $\rho$ does not measure the relationship between $x_1$ and $x_2$ for $-1 < \rho < 1$. But for $\rho = \pm 1$ it can measure linear relationship. Oftentimes, $\rho$ is misinterpreted as measuring any relationship between $x_1$ and $x_2$, which is not the case as can be seen from the counterexamples pointed out in Mathai and Haubold (2017). If $\rho_{x,y}$ is the correlation between two real scalar random variables $x$ and $y$ and if $u = a_1x + b_1$ and $v = a_2y + b_2$ where $a_1 \ne 0$, $a_2 \ne 0$ and $b_1, b_2$ are constants, then $\rho_{u,v} = \pm\rho_{x,y}$. It is positive when $a_1 > 0, a_2 > 0$ or $a_1 < 0, a_2 < 0$ and negative otherwise. Thus, up to sign, $\rho$ is both location and scale invariant.
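The sign rule $\rho_{u,v} = \pm\rho_{x,y}$ can be checked numerically. The following is a minimal Python sketch that uses the sample analogue of $\rho$ on a small made-up data set; the function name `correlation` and all data values are illustrative assumptions, not quantities from the text.

```python
from math import sqrt

def correlation(xs, ys):
    """Sample correlation: covariance over the product of standard deviations."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / n
    vx = sum((x - mx) ** 2 for x in xs) / n
    vy = sum((y - my) ** 2 for y in ys) / n
    return cov / sqrt(vx * vy)

# Hypothetical data, chosen only for illustration.
x = [1.0, 2.0, 4.0, 7.0, 11.0]
y = [2.0, 1.0, 5.0, 6.0, 12.0]
rho_xy = correlation(x, y)

# u = a1*x + b1 with a1 > 0 and v = a2*y + b2 with a2 < 0: the sign flips.
u = [3.0 * xi + 5.0 for xi in x]
v = [-2.0 * yi + 1.0 for yi in y]
rho_uv = correlation(u, v)

# w = a2*y + b2 with a2 > 0: same-sign coefficients leave rho unchanged.
w = [0.5 * yi - 4.0 for yi in y]
rho_uw = correlation(u, w)
```

Shifting by $b_1, b_2$ and rescaling by $|a_1|, |a_2|$ cancel in the ratio, so only the sign of $a_1a_2$ survives.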


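The determinant of $\Sigma$ in the bivariate case is

$$
|\Sigma| = \begin{vmatrix} \sigma_1^2 & \rho\sigma_1\sigma_2 \\ \rho\sigma_1\sigma_2 & \sigma_2^2 \end{vmatrix} = \sigma_1^2\sigma_2^2(1-\rho^2), \quad -1 < \rho < 1,\ \sigma_1 > 0,\ \sigma_2 > 0.
$$

The inverse is as follows, taking the inverse as the transpose of the matrix of cofactors divided by the determinant:

$$
\Sigma^{-1} = \frac{1}{\sigma_1^2\sigma_2^2(1-\rho^2)} \begin{bmatrix} \sigma_2^2 & -\rho\sigma_1\sigma_2 \\ -\rho\sigma_1\sigma_2 & \sigma_1^2 \end{bmatrix} = \frac{1}{1-\rho^2} \begin{bmatrix} \frac{1}{\sigma_1^2} & -\frac{\rho}{\sigma_1\sigma_2} \\ -\frac{\rho}{\sigma_1\sigma_2} & \frac{1}{\sigma_2^2} \end{bmatrix}. \tag{3.2.4}
$$

Then,

$$
(1-\rho^2)\,(X-\mu)'\Sigma^{-1}(X-\mu) = \Big(\frac{x_1-\mu_1}{\sigma_1}\Big)^2 + \Big(\frac{x_2-\mu_2}{\sigma_2}\Big)^2 - 2\rho\Big(\frac{x_1-\mu_1}{\sigma_1}\Big)\Big(\frac{x_2-\mu_2}{\sigma_2}\Big) \equiv Q. \tag{3.2.5}
$$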
Hence, the real bivariate normal density is

$$
f(x_1, x_2) = \frac{1}{2\pi\sigma_1\sigma_2\sqrt{1-\rho^2}}\, e^{-\frac{Q}{2(1-\rho^2)}} \tag{3.2.6}
$$

where $Q$ is given in (3.2.5). Observe that $Q$ is a positive definite quadratic form and hence $Q > 0$ for all $X \ne \mu$. We can also obtain an interesting result on the standardized variables of $x_1$ and $x_2$. Let the standardized $x_j$ be $y_j = \frac{x_j - \mu_j}{\sigma_j}$, $j = 1, 2$, and $u = y_1 - y_2$. Then

$$
\mathrm{Var}(u) = \mathrm{Var}(y_1) + \mathrm{Var}(y_2) - 2\,\mathrm{Cov}(y_1, y_2) = 1 + 1 - 2\rho = 2(1-\rho). \tag{3.2.7}
$$


This shows that the smaller the value of $\rho$ is, the larger the variance of $u$, and vice versa, noting that $-1 < \rho < 1$ in the bivariate real normal case but in general, $-1 \le \rho \le 1$. Observe that if $\rho = 0$ in the bivariate normal density given in (3.2.6), this joint density factorizes into the product of the marginal densities of $x_1$ and $x_2$, which implies that $x_1$ and $x_2$ are independently distributed when $\rho = 0$. In general, for real scalar random variables $x$ and $y$, $\rho = 0$ need not imply independence; however, in the bivariate normal case, $\rho = 0$ if and only if $x_1$ and $x_2$ are independently distributed. As well, the exponent in (3.2.6) has the following feature:

$$
Q = \Big(\frac{x_1-\mu_1}{\sigma_1}\Big)^2 - 2\rho\Big(\frac{x_1-\mu_1}{\sigma_1}\Big)\Big(\frac{x_2-\mu_2}{\sigma_2}\Big) + \Big(\frac{x_2-\mu_2}{\sigma_2}\Big)^2 = c \tag{3.2.8}
$$

where $c$ is positive describes an ellipse in two-dimensional Euclidean space, and for a general $p$,

$$
(X-\mu)'\Sigma^{-1}(X-\mu) = c > 0, \quad \Sigma > O, \tag{3.2.9}
$$

describes the surface of an ellipsoid in the $p$-dimensional Euclidean space, observing that $\Sigma^{-1} > O$ when $\Sigma > O$.
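As a quick consistency check of (3.2.4) to (3.2.6), the sketch below evaluates the bivariate density in its explicit $(\sigma_1, \sigma_2, \rho)$ form and in the general matrix form for $p = 2$, and also confirms the factorization at $\rho = 0$. The function names and the parameter values are assumptions made for illustration only.

```python
from math import exp, pi, sqrt

def bivariate_normal_pdf(x1, x2, mu1, mu2, s1, s2, rho):
    """Explicit form (3.2.6): exp(-Q / (2(1 - rho^2))) / (2 pi s1 s2 sqrt(1 - rho^2))."""
    z1 = (x1 - mu1) / s1
    z2 = (x2 - mu2) / s2
    q = z1 * z1 + z2 * z2 - 2.0 * rho * z1 * z2          # Q of (3.2.5)
    return exp(-q / (2.0 * (1.0 - rho * rho))) / (2.0 * pi * s1 * s2 * sqrt(1.0 - rho * rho))

def general_pdf_2d(x1, x2, mu1, mu2, s11, s12, s22):
    """General form exp(-(X-mu)' Sigma^{-1} (X-mu) / 2) / (2 pi |Sigma|^{1/2}), p = 2."""
    det = s11 * s22 - s12 * s12
    d1, d2 = x1 - mu1, x2 - mu2
    # 2 x 2 inverse: transposed cofactors divided by the determinant.
    quad = (s22 * d1 * d1 - 2.0 * s12 * d1 * d2 + s11 * d2 * d2) / det
    return exp(-0.5 * quad) / (2.0 * pi * sqrt(det))

def normal_pdf(x, mu, s):
    """Univariate normal density with mean mu and standard deviation s."""
    return exp(-0.5 * ((x - mu) / s) ** 2) / (s * sqrt(2.0 * pi))

# Hypothetical parameter values.
mu1, mu2, s1, s2, rho = 1.0, -2.0, 1.5, 0.5, 0.6
explicit = bivariate_normal_pdf(0.3, -1.7, mu1, mu2, s1, s2, rho)
general = general_pdf_2d(0.3, -1.7, mu1, mu2, s1 * s1, rho * s1 * s2, s2 * s2)

# With rho = 0 the joint density is the product of the two marginals.
joint_rho0 = bivariate_normal_pdf(0.3, -1.7, mu1, mu2, s1, s2, 0.0)
product_rho0 = normal_pdf(0.3, mu1, s1) * normal_pdf(-1.7, mu2, s2)
```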


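Example 3.2.1. Let

$$
X = \begin{bmatrix} x_1 \\ x_2 \\ x_3 \end{bmatrix}, \quad \mu = \begin{bmatrix} 1 \\ -1 \\ -2 \end{bmatrix}, \quad \Sigma = \begin{bmatrix} 3 & 0 & -1 \\ 0 & 3 & 1 \\ -1 & 1 & 2 \end{bmatrix}.
$$

Show that $\Sigma > O$ and that $\Sigma$ can be a covariance matrix for $X$. Taking $E[X] = \mu$ and $\mathrm{Cov}(X) = \Sigma$, construct the exponent of a trivariate real Gaussian density explicitly and write down the density.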


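Solution 3.2.1. Let us verify the definiteness of $\Sigma$. Note that $\Sigma = \Sigma'$ (symmetric). The leading minors are

$$
|(3)| = 3 > 0, \quad \begin{vmatrix} 3 & 0 \\ 0 & 3 \end{vmatrix} = 9 > 0, \quad |\Sigma| = 12 > 0,
$$

and hence $\Sigma > O$. The matrix of cofactors of $\Sigma$, that is, $\mathrm{Cof}(\Sigma)$, and the inverse of $\Sigma$ are the following:

$$
\mathrm{Cof}(\Sigma) = \begin{bmatrix} 5 & -1 & 3 \\ -1 & 5 & -3 \\ 3 & -3 & 9 \end{bmatrix}, \quad \Sigma^{-1} = \frac{1}{12} \begin{bmatrix} 5 & -1 & 3 \\ -1 & 5 & -3 \\ 3 & -3 & 9 \end{bmatrix}. \tag{i}
$$

Thus the exponent of the trivariate real Gaussian density is $-\frac{1}{2}Q$ where

$$
\begin{aligned}
Q &= \frac{1}{12}\, [x_1 - 1,\ x_2 + 1,\ x_3 + 2] \begin{bmatrix} 5 & -1 & 3 \\ -1 & 5 & -3 \\ 3 & -3 & 9 \end{bmatrix} \begin{bmatrix} x_1 - 1 \\ x_2 + 1 \\ x_3 + 2 \end{bmatrix} \\
&= \frac{1}{12} \{ 5(x_1-1)^2 + 5(x_2+1)^2 + 9(x_3+2)^2 - 2(x_1-1)(x_2+1) \\
&\qquad\ + 6(x_1-1)(x_3+2) - 6(x_2+1)(x_3+2) \}. 
\end{aligned} \tag{ii}
$$

The normalizing constant of the density being

$$
(2\pi)^{p/2}|\Sigma|^{1/2} = (2\pi)^{3/2}[12]^{1/2} = 2^{5/2}\sqrt{3}\,\pi^{3/2},
$$

the resulting trivariate Gaussian density is

$$
f(X) = [2^{5/2}\sqrt{3}\,\pi^{3/2}]^{-1}\, e^{-\frac{1}{2}Q}
$$

for $-\infty < x_j < \infty$, $j = 1, 2, 3$, where $Q$ is specified in (ii).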
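The arithmetic of Example 3.2.1 can be reproduced exactly with rational arithmetic. The sketch below recomputes the leading minors, the cofactor matrix and the quadratic form $Q$; the helper names (`minor`, `det`, `cofactor_matrix`, `inverse`, `quadratic_form`) are hypothetical and only standard-library Python is assumed.

```python
from fractions import Fraction

def minor(m, i, j):
    """Matrix m with row i and column j removed."""
    return [row[:j] + row[j + 1:] for k, row in enumerate(m) if k != i]

def det(m):
    """Determinant by cofactor expansion along the first row (small matrices only)."""
    if len(m) == 1:
        return m[0][0]
    return sum((-1) ** j * m[0][j] * det(minor(m, 0, j)) for j in range(len(m)))

def cofactor_matrix(m):
    n = len(m)
    return [[(-1) ** (i + j) * det(minor(m, i, j)) for j in range(n)] for i in range(n)]

def inverse(m):
    """Transpose of the cofactor matrix divided by the determinant."""
    d, cof, n = det(m), cofactor_matrix(m), len(m)
    return [[Fraction(cof[j][i], d) for j in range(n)] for i in range(n)]

sigma = [[3, 0, -1], [0, 3, 1], [-1, 1, 2]]
mu = [1, -1, -2]

# Leading principal minors of orders 1, 2, 3.
leading_minors = [det([row[:k] for row in sigma[:k]]) for k in (1, 2, 3)]
cof = cofactor_matrix(sigma)
sigma_inv = inverse(sigma)

def quadratic_form(x):
    """Q = (X - mu)' Sigma^{-1} (X - mu), the quantity written out in (ii)."""
    d = [xi - mi for xi, mi in zip(x, mu)]
    return sum(d[i] * sigma_inv[i][j] * d[j] for i in range(3) for j in range(3))
```

For instance, `quadratic_form([2, 0, 0])` evaluates (ii) at $x_1 - 1 = 1$, $x_2 + 1 = 1$, $x_3 + 2 = 2$.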
3.2.1. The moment generating function in the real case 


We have defined the multivariate Gaussian distribution via the following character- 
ization whose proof relies on its moment generating function: if all the possible linear 
combinations of the components of a random vector are real univariate normal, then this 
vector must follow a real multivariate Gaussian distribution. We are now looking into the 
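derivation of the mgf given the density. For a parameter vector $T$, with $T' = (t_1, \ldots, t_p)$, we have

$$
\begin{aligned}
M_X(T) &= E[e^{T'X}] = \int_X e^{T'X} f(X)\, dX \\
&= \frac{e^{T'\mu}}{|\Sigma|^{1/2}(2\pi)^{p/2}} \int_X e^{T'(X-\mu) - \frac{1}{2}(X-\mu)'\Sigma^{-1}(X-\mu)}\, dX.
\end{aligned}
$$

Observe that the moment generating function (mgf) in the real multivariate case is the expected value of $e$ raised to a linear function of the real scalar variables. Making the transformation $Y = \Sigma^{-1/2}(X-\mu)$, we have $dY = |\Sigma|^{-1/2}dX$. The exponent can be simplified as follows:

$$
\begin{aligned}
T'(X-\mu) - \tfrac{1}{2}(X-\mu)'\Sigma^{-1}(X-\mu) &= -\tfrac{1}{2}\{-2T'\Sigma^{1/2}Y + Y'Y\} \\
&= -\tfrac{1}{2}\{(Y - \Sigma^{1/2}T)'(Y - \Sigma^{1/2}T) - T'\Sigma T\}.
\end{aligned}
$$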


Hence

$$
M_X(T) = e^{T'\mu + \frac{1}{2}T'\Sigma T}\, \frac{1}{(2\pi)^{p/2}} \int_Y e^{-\frac{1}{2}(Y - \Sigma^{1/2}T)'(Y - \Sigma^{1/2}T)}\, dY.
$$

The integral over $Y$, together with the factor $(2\pi)^{-p/2}$, is 1 since this is the total integral of a multivariate normal density whose mean value vector is $\Sigma^{1/2}T$ and covariance matrix is the identity matrix. Thus the mgf of a multivariate real Gaussian vector is

$$
M_X(T) = e^{T'\mu + \frac{1}{2}T'\Sigma T}. \tag{3.2.10}
$$

In the singular normal case, we can still take (3.2.10) as the mgf for $\Sigma \ge O$ (non-negative definite), which encompasses the singular and nonsingular cases. Then, one can study properties of the normal distribution whether singular or nonsingular via (3.2.10).


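We will now apply the differential operator $\frac{\partial}{\partial T}$ defined in Sect. 1.7 on the moment generating function of a $p \times 1$ real normal random vector $X$ and evaluate the result at $T = O$ to obtain the mean value vector of this distribution, that is, $\mu = E[X]$. As well, $E[XX']$ is available by applying the operator $\frac{\partial}{\partial T}\frac{\partial}{\partial T'}$ on the mgf, and so on. From the mgf in (3.2.10), we have

$$
\frac{\partial}{\partial T} M_X(T)\Big|_{T=O} = \frac{\partial}{\partial T}\, e^{T'\mu + \frac{1}{2}T'\Sigma T}\Big|_{T=O} = \big[e^{T'\mu + \frac{1}{2}T'\Sigma T}[\mu + \Sigma T]\big]_{T=O} \ \Rightarrow\ \mu = E[X]. \tag{i}
$$

Then,

$$
\frac{\partial}{\partial T'} M_X(T) = e^{T'\mu + \frac{1}{2}T'\Sigma T}[\mu' + T'\Sigma]. \tag{ii}
$$

Remember to write the scalar quantity, $M_X(T)$, on the left for scalar multiplication of matrices. Now,

$$
\begin{aligned}
\frac{\partial}{\partial T}\frac{\partial}{\partial T'} M_X(T) &= \frac{\partial}{\partial T}\, e^{T'\mu + \frac{1}{2}T'\Sigma T}[\mu' + T'\Sigma] \\
&= M_X(T)[\mu + \Sigma T][\mu' + T'\Sigma] + M_X(T)[\Sigma].
\end{aligned}
$$

Hence,

$$
E[XX'] = \Big[\frac{\partial}{\partial T}\frac{\partial}{\partial T'} M_X(T)\Big|_{T=O}\Big] = \Sigma + \mu\mu'. \tag{iii}
$$

But

$$
\mathrm{Cov}(X) = E[XX'] - E[X]E[X'] = (\Sigma + \mu\mu') - \mu\mu' = \Sigma. \tag{iv}
$$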


In the multivariate real Gaussian case, we have only two parameters $\mu$ and $\Sigma$ and both of these are available from the above equations. In the general case, we can evaluate higher moments as follows:

$$
E[\cdots X'XX'] = \cdots \frac{\partial}{\partial T'}\frac{\partial}{\partial T}\frac{\partial}{\partial T'} M_X(T)\Big|_{T=O}. \tag{v}
$$

If the characteristic function $\phi_X(T)$, which is available from the mgf by replacing $T$ by $iT$, $i = \sqrt{-1}$, is utilized, then multiply the left-hand side of (v) by $i = \sqrt{-1}$ with each operator operating on $\phi_X(T)$ because $\phi_X(T) = M_X(iT)$. The corresponding differential operators can also be developed for the complex case.
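The relations (i), (iii) and (iv) can be illustrated numerically by differentiating the mgf (3.2.10) with central finite differences at $T = O$. This is only a sketch: the values of `mu` and `sigma` are hypothetical, and finite differences stand in for the exact operator $\frac{\partial}{\partial T}$.

```python
from math import exp

# Hypothetical parameters of a bivariate real Gaussian vector.
mu = [1.0, -1.0]
sigma = [[2.0, 0.5], [0.5, 1.0]]

def mgf(t):
    """M_X(T) = exp(T'mu + T' Sigma T / 2), Eq. (3.2.10)."""
    p = len(t)
    lin = sum(t[i] * mu[i] for i in range(p))
    quad = sum(t[i] * sigma[i][j] * t[j] for i in range(p) for j in range(p))
    return exp(lin + 0.5 * quad)

def gradient_at_zero(f, p, h=1e-5):
    """Central-difference approximation of dM/dT at T = O."""
    grad = []
    for i in range(p):
        plus = [h if k == i else 0.0 for k in range(p)]
        minus = [-v for v in plus]
        grad.append((f(plus) - f(minus)) / (2 * h))
    return grad

def hessian_at_zero(f, p, h=1e-4):
    """Central-difference approximation of (d/dT)(d/dT') M at T = O."""
    hess = [[0.0] * p for _ in range(p)]
    for i in range(p):
        for j in range(p):
            def at(si, sj):
                t = [0.0] * p
                t[i] += si * h
                t[j] += sj * h
                return f(t)
            hess[i][j] = (at(1, 1) - at(1, -1) - at(-1, 1) + at(-1, -1)) / (4 * h * h)
    return hess

mean = gradient_at_zero(mgf, 2)            # approximates mu, as in (i)
second = hessian_at_zero(mgf, 2)           # approximates E[XX'] = Sigma + mu mu', as in (iii)
cov = [[second[i][j] - mean[i] * mean[j] for j in range(2)] for i in range(2)]  # (iv)
```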
Given a real $p$-vector $X \sim N_p(\mu, \Sigma)$, $\Sigma > O$, what will be the distribution of a linear function of $X$? Let $u = L'X$, $X \sim N_p(\mu, \Sigma)$, $\Sigma > O$, $L' = (a_1, \ldots, a_p)$ where $a_1, \ldots, a_p$ are real scalar constants. Let us examine its mgf whose argument is a real scalar parameter $t$. The mgf of $u$ is available by integrating out over the density of $X$. We have

$$
M_u(t) = E[e^{tu}] = E[e^{tL'X}] = E[e^{(tL')X}].
$$

This is of the same form as in (3.2.10) and hence, $M_u(t)$ is available from (3.2.10) by replacing $T'$ by $(tL')$, that is,

$$
M_u(t) = e^{t(L'\mu) + \frac{t^2}{2}L'\Sigma L} \ \Rightarrow\ u \sim N_1(L'\mu,\ L'\Sigma L). \tag{3.2.11}
$$

This means that $u$ is univariate normal with mean value $L'\mu = E[u]$ and variance $L'\Sigma L = \mathrm{Var}(u)$. Now, let us consider a set of linearly independent linear functions of $X$. Let $A$ be a real $q \times p$, $q \le p$, matrix of full rank $q$ and let the linear functions be $U = AX$ where $U$ is $q \times 1$. Then $E[U] = AE[X] = A\mu$ and the covariance matrix in $U$ is

$$
\begin{aligned}
\mathrm{Cov}(U) &= E\{[U - E(U)][U - E(U)]'\} = E[A(X-\mu)(X-\mu)'A'] \\
&= A\,E[(X-\mu)(X-\mu)']\,A' = A\Sigma A'.
\end{aligned}
$$

Observe that since $\Sigma > O$, we can write $\Sigma = \Sigma^{1/2}\Sigma^{1/2}$ so that $A\Sigma A' = (A\Sigma^{1/2})(A\Sigma^{1/2})'$ and $A\Sigma^{1/2}$ is of full rank, which means that $A\Sigma A' > O$. Therefore, letting $T$ be a $q \times 1$ parameter vector, we have

$$
M_U(T) = E[e^{T'U}] = E[e^{T'AX}] = E[e^{(T'A)X}],
$$

which is available from (3.2.10). That is,

$$
M_U(T) = e^{T'A\mu + \frac{1}{2}(T'A\Sigma A'T)} \ \Rightarrow\ U \sim N_q(A\mu,\ A\Sigma A').
$$


Thus $U$ is a $q$-variate multivariate normal with parameters $A\mu$ and $A\Sigma A'$ and we have the following result:


Theorem 3.2.1. Let the vector random variable $X$ have a real $p$-variate nonsingular $N_p(\mu, \Sigma)$ distribution and the $q \times p$ matrix $A$ with $q \le p$ be a full rank constant matrix. Then

$$
U = AX \sim N_q(A\mu,\ A\Sigma A'), \quad A\Sigma A' > O. \tag{3.2.12}
$$

Corollary 3.2.1. Let the vector random variable $X$ have a real $p$-variate nonsingular $N_p(\mu, \Sigma)$ distribution and $B$ be a $1 \times p$ constant vector. Then $U_1 = BX$ has a univariate normal distribution with parameters $B\mu$ and $B\Sigma B'$.


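Example 3.2.2. Let $X$, $\mu = E[X]$, $\Sigma = \mathrm{Cov}(X)$, $Y$, and $A$ be as follows:

$$
X = \begin{bmatrix} x_1 \\ x_2 \\ x_3 \end{bmatrix}, \quad \mu = \begin{bmatrix} 2 \\ 0 \\ -1 \end{bmatrix}, \quad \Sigma = \begin{bmatrix} 4 & -2 & 0 \\ -2 & 3 & 1 \\ 0 & 1 & 2 \end{bmatrix}, \quad Y = \begin{bmatrix} y_1 \\ y_2 \end{bmatrix}.
$$

Let $y_1 = x_1 + x_2 + x_3$ and $y_2 = x_1 - x_2 + x_3$ and write $Y = AX$. If $\Sigma > O$ and if $X \sim N_3(\mu, \Sigma)$, derive the density of (1) $Y$; (2) $y_1$ directly as well as from (1).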


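Solution 3.2.2. The leading minors of $\Sigma$ are $|(4)| = 4 > 0$, $\begin{vmatrix} 4 & -2 \\ -2 & 3 \end{vmatrix} = 8 > 0$, $|\Sigma| = 12 > 0$, and $\Sigma = \Sigma'$. Being symmetric and positive definite, $\Sigma$ is a bona fide covariance matrix. Now, $Y = AX$ where

$$
A = \begin{bmatrix} 1 & 1 & 1 \\ 1 & -1 & 1 \end{bmatrix};
$$

$$
E[Y] = A\mu = \begin{bmatrix} 1 & 1 & 1 \\ 1 & -1 & 1 \end{bmatrix} \begin{bmatrix} 2 \\ 0 \\ -1 \end{bmatrix} = \begin{bmatrix} 1 \\ 1 \end{bmatrix}; \tag{i}
$$

$$
\mathrm{Cov}(Y) = A\,\mathrm{Cov}(X)\,A' = \begin{bmatrix} 1 & 1 & 1 \\ 1 & -1 & 1 \end{bmatrix} \begin{bmatrix} 4 & -2 & 0 \\ -2 & 3 & 1 \\ 0 & 1 & 2 \end{bmatrix} \begin{bmatrix} 1 & 1 \\ 1 & -1 \\ 1 & 1 \end{bmatrix} = \begin{bmatrix} 7 & 3 \\ 3 & 11 \end{bmatrix}. \tag{ii}
$$

Since $A$ is of full rank (rank 2) and $y_1$ and $y_2$ are linear functions of the real Gaussian vector $X$, $Y$ has a bivariate nonsingular real Gaussian distribution with parameters $E(Y)$ and $\mathrm{Cov}(Y)$. Since

$$
\begin{bmatrix} 7 & 3 \\ 3 & 11 \end{bmatrix}^{-1} = \frac{1}{68} \begin{bmatrix} 11 & -3 \\ -3 & 7 \end{bmatrix},
$$

the density of $Y$ has the exponent $-\frac{1}{2}Q$ where

$$
\begin{aligned}
Q &= \frac{1}{68}\, [y_1 - 1,\ y_2 - 1] \begin{bmatrix} 11 & -3 \\ -3 & 7 \end{bmatrix} \begin{bmatrix} y_1 - 1 \\ y_2 - 1 \end{bmatrix} \\
&= \frac{1}{68} \{ 11(y_1-1)^2 + 7(y_2-1)^2 - 6(y_1-1)(y_2-1) \}.
\end{aligned} \tag{iii}
$$

The normalizing constant being $(2\pi)^{p/2}|\Sigma|^{1/2} = 2\pi\sqrt{68} = 4\sqrt{17}\,\pi$, the density of $Y$, denoted by $f(Y)$, is given by

$$
f(Y) = \frac{1}{4\sqrt{17}\,\pi}\, e^{-\frac{1}{2}Q} \tag{iv}
$$

where $Q$ is specified in (iii). This establishes (1). For establishing (2), we first start with the formula. Let $y_1 = A_1X \Rightarrow A_1 = [1, 1, 1]$,

$$
E[y_1] = A_1E[X] = [1, 1, 1] \begin{bmatrix} 2 \\ 0 \\ -1 \end{bmatrix} = 1
$$

and

$$
\mathrm{Var}(y_1) = A_1\mathrm{Cov}(X)A_1' = [1, 1, 1] \begin{bmatrix} 4 & -2 & 0 \\ -2 & 3 & 1 \\ 0 & 1 & 2 \end{bmatrix} \begin{bmatrix} 1 \\ 1 \\ 1 \end{bmatrix} = 7.
$$

Hence $y_1 \sim N_1(1, 7)$. For establishing this result directly, observe that $y_1$ is a linear function of real normal variables and hence, it is univariate real normal with the parameters $E[y_1]$ and $\mathrm{Var}(y_1)$. We may also obtain the marginal distribution of $y_1$ directly from the parameters of the joint density of $y_1$ and $y_2$, which are given in (i) and (ii). Thus, (2) is also established.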
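Theorem 3.2.1 reduces Example 3.2.2 to two matrix products, $A\mu$ and $A\Sigma A'$. The following sketch recomputes them with plain Python lists; the helper names `matmul` and `transpose` are hypothetical, and exact fractions are used for the $2 \times 2$ inverse.

```python
from fractions import Fraction

def matmul(a, b):
    """Product of two matrices stored as lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]

def transpose(a):
    return [list(col) for col in zip(*a)]

mu = [[2], [0], [-1]]
sigma = [[4, -2, 0], [-2, 3, 1], [0, 1, 2]]
a = [[1, 1, 1], [1, -1, 1]]

mean_y = matmul(a, mu)                            # E[Y] = A mu, Eq. (i)
cov_y = matmul(matmul(a, sigma), transpose(a))    # Cov(Y) = A Sigma A', Eq. (ii)

# 2 x 2 inverse through the determinant and the transposed cofactors.
det_y = cov_y[0][0] * cov_y[1][1] - cov_y[0][1] * cov_y[1][0]
cov_y_inv = [[Fraction(cov_y[1][1], det_y), Fraction(-cov_y[0][1], det_y)],
             [Fraction(-cov_y[1][0], det_y), Fraction(cov_y[0][0], det_y)]]

# Part (2): y1 = A1 X with A1 = [1, 1, 1].
a1 = [[1, 1, 1]]
mean_y1 = matmul(a1, mu)[0][0]
var_y1 = matmul(matmul(a1, sigma), transpose(a1))[0][0]
```

The variance of $y_1$ obtained directly agrees with the leading diagonal entry of $\mathrm{Cov}(Y)$, which is how the marginal of $y_1$ is read off from the joint parameters.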


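The marginal distributions can also be determined from the mgf. Let us partition $T$, $\mu$ and $\Sigma$ as follows:

$$
T = \begin{bmatrix} T_1 \\ T_2 \end{bmatrix}, \quad \mu = \begin{bmatrix} \mu_{(1)} \\ \mu_{(2)} \end{bmatrix}, \quad \Sigma = \begin{bmatrix} \Sigma_{11} & \Sigma_{12} \\ \Sigma_{21} & \Sigma_{22} \end{bmatrix}, \quad X = \begin{bmatrix} X_1 \\ X_2 \end{bmatrix}, \tag{i}
$$

where $T_1$, $\mu_{(1)}$, $X_1$ are $r \times 1$ and $\Sigma_{11}$ is $r \times r$. Letting $T_2 = O$ (the null vector), we have

$$
\begin{aligned}
T'\mu + \frac{1}{2}T'\Sigma T &= [T_1', O'] \begin{bmatrix} \mu_{(1)} \\ \mu_{(2)} \end{bmatrix} + \frac{1}{2}[T_1', O'] \begin{bmatrix} \Sigma_{11} & \Sigma_{12} \\ \Sigma_{21} & \Sigma_{22} \end{bmatrix} \begin{bmatrix} T_1 \\ O \end{bmatrix} \\
&= T_1'\mu_{(1)} + \frac{1}{2}T_1'\Sigma_{11}T_1,
\end{aligned}
$$

which is the structure of the mgf of a real Gaussian distribution with mean value vector $E[X_1] = \mu_{(1)}$ and covariance matrix $\mathrm{Cov}(X_1) = \Sigma_{11}$. Therefore $X_1$ is an $r$-variate real Gaussian vector and similarly, $X_2$ is a $(p-r)$-variate real Gaussian vector. The standard notation used for a $p$-variate normal distribution is $X \sim N_p(\mu, \Sigma)$, $\Sigma \ge O$, which includes the nonsingular and singular cases. In the nonsingular case, $\Sigma > O$, whereas $|\Sigma| = 0$ in the singular case.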
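From the mgf in (3.2.10) and (i) above, if we have $\Sigma_{12} = O$ with $\Sigma_{21} = \Sigma_{12}'$ then the mgf of $X = \begin{bmatrix} X_1 \\ X_2 \end{bmatrix}$ becomes $e^{T_1'\mu_{(1)} + T_2'\mu_{(2)} + \frac{1}{2}T_1'\Sigma_{11}T_1 + \frac{1}{2}T_2'\Sigma_{22}T_2}$. That is,

$$
M_X(T) = M_{X_1}(T_1)\, M_{X_2}(T_2),
$$

which implies that $X_1$ and $X_2$ are independently distributed. Hence the following result:


Theorem 3.2.2. Let the real $p \times 1$ vector $X \sim N_p(\mu, \Sigma)$, $\Sigma > O$, and let $X$ be partitioned into subvectors $X_1$ and $X_2$, with the corresponding partitioning of $\mu$ and $\Sigma$, that is,

$$
X = \begin{pmatrix} X_1 \\ X_2 \end{pmatrix}, \quad \mu = \begin{pmatrix} \mu_{(1)} \\ \mu_{(2)} \end{pmatrix}, \quad \Sigma = \begin{pmatrix} \Sigma_{11} & \Sigma_{12} \\ \Sigma_{21} & \Sigma_{22} \end{pmatrix}.
$$

Then, $X_1$ and $X_2$ are independently distributed if and only if $\Sigma_{12} = \Sigma_{21}' = O$.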


Observe that a covariance matrix being null need not imply independence of the sub- 
vectors; however, in the case of subvectors having a joint normal distribution, it suffices to 
have a null covariance matrix to conclude that the subvectors are independently distributed. 


3.2a. The Moment Generating Function in the Complex Case 


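The determination of the mgf in the complex case is somewhat different. Take a $p$-variate complex Gaussian $\tilde{X} \sim \tilde{N}_p(\tilde{\mu}, \Sigma)$, $\Sigma = \Sigma^* > O$. Let $\tilde{T}' = (\tilde{t}_1, \ldots, \tilde{t}_p)$ be a parameter vector. Let $\tilde{T} = T_1 + iT_2$, where $T_1$ and $T_2$ are $p \times 1$ real vectors and $i = \sqrt{-1}$. Let $\tilde{X} = X_1 + iX_2$ with $X_1$ and $X_2$ being real. Then consider $\tilde{T}^*\tilde{X} = (T_1' - iT_2')(X_1 + iX_2) = T_1'X_1 + T_2'X_2 + i(T_1'X_2 - T_2'X_1)$. But $T_1'X_1 + T_2'X_2$ already contains the necessary number of parameters and all the corresponding real variables and hence, to be consistent with the definition of the mgf in the real case, one must take only the real part in $\tilde{T}^*\tilde{X}$. Hence the mgf in the complex case, denoted by $M_{\tilde{X}}(\tilde{T})$, is defined as $E[e^{\Re(\tilde{T}^*\tilde{X})}]$. For convenience, we may take $\tilde{X} = \tilde{X} - \tilde{\mu} + \tilde{\mu}$. Then $E[e^{\Re(\tilde{T}^*\tilde{X})}] = e^{\Re(\tilde{T}^*\tilde{\mu})}E[e^{\Re(\tilde{T}^*(\tilde{X} - \tilde{\mu}))}]$. On making the transformation $\tilde{Y} = \Sigma^{-1/2}(\tilde{X} - \tilde{\mu})$, $|\det(\Sigma)|$ appearing in the denominator of the density of $\tilde{X}$ is canceled due to the Jacobian of the transformation and we have $(\tilde{X} - \tilde{\mu}) = \Sigma^{1/2}\tilde{Y}$. Thus,

$$
E[e^{\Re(\tilde{T}^*(\tilde{X} - \tilde{\mu}))}] = \frac{1}{\pi^p} \int_{\tilde{Y}} e^{\Re(\tilde{T}^*\Sigma^{1/2}\tilde{Y}) - \tilde{Y}^*\tilde{Y}}\, d\tilde{Y}. \tag{i}
$$

For evaluating the integral in (i), we can utilize the following result which will be stated here as a lemma.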
Lemma 3.2a.1. Let $\tilde{U}$ and $\tilde{V}$ be two $p \times 1$ vectors in the complex domain. Then

$$
2\,\Re(\tilde{U}^*\tilde{V}) = \tilde{U}^*\tilde{V} + \tilde{V}^*\tilde{U} = 2\,\Re(\tilde{V}^*\tilde{U}).
$$

Proof: Let $\tilde{U} = U_1 + iU_2$, $\tilde{V} = V_1 + iV_2$ where $U_1, U_2, V_1, V_2$ are real vectors and $i = \sqrt{-1}$. Then $\tilde{U}^*\tilde{V} = [U_1' - iU_2'][V_1 + iV_2] = U_1'V_1 + U_2'V_2 + i[U_1'V_2 - U_2'V_1]$. Similarly $\tilde{V}^*\tilde{U} = V_1'U_1 + V_2'U_2 + i[V_1'U_2 - V_2'U_1]$. Observe that since $U_1, U_2, V_1, V_2$ are real, we have $U_i'V_j = V_j'U_i$ for all $i$ and $j$. Hence, the sum $\tilde{U}^*\tilde{V} + \tilde{V}^*\tilde{U} = 2[U_1'V_1 + U_2'V_2] = 2\,\Re(\tilde{V}^*\tilde{U})$. This completes the proof.
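Lemma 3.2a.1 is easy to check numerically with Python's built-in complex type. In this sketch the vectors `u` and `v` hold arbitrary illustrative values and `conj_transpose_dot` is a hypothetical helper computing $\tilde{U}^*\tilde{V}$.

```python
def conj_transpose_dot(u, v):
    """U*V for column vectors stored as lists of complex numbers."""
    return sum(ui.conjugate() * vi for ui, vi in zip(u, v))

# Hypothetical complex 3-vectors.
u = [1 + 2j, -3 + 0.5j, 2 - 1j]
v = [0.5 - 1j, 4 + 1j, -2 + 3j]

uv = conj_transpose_dot(u, v)    # U*V
vu = conj_transpose_dot(v, u)    # V*U, the complex conjugate of U*V
total = uv + vu                  # should equal 2 Re(U*V) = 2 Re(V*U)
```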


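Now, the exponent in (i) can be written as

$$
\Re(\tilde{T}^*\Sigma^{1/2}\tilde{Y}) = \frac{1}{2}\tilde{T}^*\Sigma^{1/2}\tilde{Y} + \frac{1}{2}\tilde{Y}^*\Sigma^{1/2}\tilde{T}
$$

by using Lemma 3.2a.1, observing that $\Sigma = \Sigma^*$. Let us expand $(\tilde{Y} - C)^*(\tilde{Y} - C)$ as $\tilde{Y}^*\tilde{Y} - \tilde{Y}^*C - C^*\tilde{Y} + C^*C$ for some $C$. Comparing with the exponent in (i), we may take $C^* = \frac{1}{2}\tilde{T}^*\Sigma^{1/2}$ so that $C^*C = \frac{1}{4}\tilde{T}^*\Sigma\tilde{T}$. Therefore in the complex Gaussian case, the mgf is

$$
M_{\tilde{X}}(\tilde{T}) = e^{\Re(\tilde{T}^*\tilde{\mu}) + \frac{1}{4}\tilde{T}^*\Sigma\tilde{T}}. \tag{3.2a.1}
$$
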
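Example 3.2a.1. Let $\tilde{X}$, $E[\tilde{X}] = \tilde{\mu}$, $\mathrm{Cov}(\tilde{X}) = \Sigma$ be the following where $\tilde{X} \sim \tilde{N}_2(\tilde{\mu}, \Sigma)$, $\Sigma > O$,

$$
\tilde{X} = \begin{bmatrix} \tilde{x}_1 \\ \tilde{x}_2 \end{bmatrix}, \quad \tilde{\mu} = \begin{bmatrix} 1 - i \\ 2 - 3i \end{bmatrix}, \quad \Sigma = \begin{bmatrix} 3 & 1 + i \\ 1 - i & 2 \end{bmatrix}.
$$

Compute the mgf of $\tilde{X}$ explicitly.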


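Solution 3.2a.1. Let $\tilde{T} = \begin{bmatrix} \tilde{t}_1 \\ \tilde{t}_2 \end{bmatrix}$ where $\tilde{t}_1 = t_{11} + it_{12}$, $\tilde{t}_2 = t_{21} + it_{22}$ with $t_{11}, t_{12}, t_{21}, t_{22}$ being real scalar parameters. The mgf of $\tilde{X}$ is

$$
M_{\tilde{X}}(\tilde{T}) = e^{\Re(\tilde{T}^*\tilde{\mu}) + \frac{1}{4}\tilde{T}^*\Sigma\tilde{T}}.
$$

Consider the first term in the exponent of the mgf:

$$
\begin{aligned}
\Re(\tilde{T}^*\tilde{\mu}) &= \Re\left\{ [t_{11} - it_{12},\ t_{21} - it_{22}] \begin{bmatrix} 1 - i \\ 2 - 3i \end{bmatrix} \right\} \\
&= \Re\{(t_{11} - it_{12})(1 - i) + (t_{21} - it_{22})(2 - 3i)\} \\
&= t_{11} - t_{12} + 2t_{21} - 3t_{22}.
\end{aligned}
$$

The second term in the exponent is the following:

$$
\begin{aligned}
\frac{1}{4}\tilde{T}^*\Sigma\tilde{T} &= \frac{1}{4}\, [\tilde{t}_1^*,\ \tilde{t}_2^*] \begin{bmatrix} 3 & 1 + i \\ 1 - i & 2 \end{bmatrix} \begin{bmatrix} \tilde{t}_1 \\ \tilde{t}_2 \end{bmatrix} \\
&= \frac{1}{4}\{ 3\tilde{t}_1^*\tilde{t}_1 + 2\tilde{t}_2^*\tilde{t}_2 + (1 + i)\tilde{t}_1^*\tilde{t}_2 + (1 - i)\tilde{t}_2^*\tilde{t}_1 \}.
\end{aligned}
$$

Note that since the parameters are scalar quantities, the conjugate transpose means only the conjugate, or $\tilde{t}_j^* = \bar{\tilde{t}}_j$, $j = 1, 2$. Let us look at the non-diagonal terms. Note that $[(1 + i)\tilde{t}_1^*\tilde{t}_2] + [(1 - i)\tilde{t}_2^*\tilde{t}_1]$ gives $2(t_{11}t_{21} + t_{12}t_{22} + t_{12}t_{21} - t_{11}t_{22})$. Moreover, $\tilde{t}_1^*\tilde{t}_1 = t_{11}^2 + t_{12}^2$ and $\tilde{t}_2^*\tilde{t}_2 = t_{21}^2 + t_{22}^2$. Hence if the exponent of $M_{\tilde{X}}(\tilde{T})$ is denoted by $\phi$,

$$
\begin{aligned}
\phi &= [t_{11} - t_{12} + 2t_{21} - 3t_{22}] + \frac{1}{4}\{ 3(t_{11}^2 + t_{12}^2) + 2(t_{21}^2 + t_{22}^2) \\
&\qquad + 2(t_{11}t_{21} + t_{12}t_{22} + t_{12}t_{21} - t_{11}t_{22}) \}.
\end{aligned} \tag{i}
$$

Thus the mgf is

$$
M_{\tilde{X}}(\tilde{T}) = e^{\phi}
$$

where $\phi$ is given in (i).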
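The expanded exponent $\phi$ in (i) can be compared with the matrix form $\Re(\tilde{T}^*\tilde{\mu}) + \frac{1}{4}\tilde{T}^*\Sigma\tilde{T}$ at arbitrary parameter values. The sketch below does this with Python complex numbers; the function names are hypothetical and the test point is an arbitrary choice.

```python
mu = [1 - 1j, 2 - 3j]
sigma = [[3, 1 + 1j], [1 - 1j, 2]]

def exponent_matrix_form(t1, t2):
    """Re(T* mu) + (1/4) T* Sigma T for a complex parameter vector T = (t1, t2)'."""
    t = [t1, t2]
    lin = sum(ti.conjugate() * mi for ti, mi in zip(t, mu)).real
    quad = sum(t[i].conjugate() * sigma[i][j] * t[j] for i in range(2) for j in range(2))
    # The Hermitian form is real, so only its real part is kept.
    return lin + 0.25 * quad.real

def exponent_expanded(t11, t12, t21, t22):
    """Expanded real form (i), with t1 = t11 + i t12 and t2 = t21 + i t22."""
    lin = t11 - t12 + 2 * t21 - 3 * t22
    quad = (3 * (t11 ** 2 + t12 ** 2) + 2 * (t21 ** 2 + t22 ** 2)
            + 2 * (t11 * t21 + t12 * t22 + t12 * t21 - t11 * t22))
    return lin + 0.25 * quad

# Arbitrary illustrative parameter values.
t11, t12, t21, t22 = 0.3, -0.7, 1.1, 0.4
phi_matrix = exponent_matrix_form(complex(t11, t12), complex(t21, t22))
phi_expanded = exponent_expanded(t11, t12, t21, t22)
```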


3.2a.1. Moments from the moment generating function 


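We can also derive the moments from the mgf of (3.2a.1) by operating with the differential operator of Sect. 1.7 of Chap. 1. For the complex case, the operator $\frac{\partial}{\partial X_1}$ in the real case has to be modified. Let $\tilde{X} = X_1 + iX_2$ be a $p \times 1$ vector in the complex domain where $X_1$ and $X_2$ are real and $p \times 1$ and $i = \sqrt{-1}$. Then in the complex domain the differential operator is

$$
\frac{\partial}{\partial \tilde{X}} = \frac{\partial}{\partial X_1} + i\frac{\partial}{\partial X_2}. \tag{ii}
$$

Let $\tilde{T} = T_1 + iT_2$, $\tilde{\mu} = \mu_{(1)} + i\mu_{(2)}$, $\Sigma = \Sigma_1 + i\Sigma_2$ where $T_1, T_2, \mu_{(1)}, \mu_{(2)}, \Sigma_1, \Sigma_2$ are all real and $i = \sqrt{-1}$, $\Sigma_1 = \Sigma_1'$, and $\Sigma_2' = -\Sigma_2$ because $\Sigma$ is Hermitian. Note that $\tilde{T}^*\Sigma\tilde{T} = (T_1' - iT_2')\Sigma(T_1 + iT_2) = T_1'\Sigma T_1 + T_2'\Sigma T_2 + i(T_1'\Sigma T_2 - T_2'\Sigma T_1)$, and observe that

$$
T_j'\Sigma T_j = T_j'(\Sigma_1 + i\Sigma_2)T_j = T_j'\Sigma_1T_j + 0 \tag{iii}
$$

for $j = 1, 2$ since $\Sigma_2$ is skew symmetric. The exponent in the mgf in (3.2a.1) can be simplified as follows: Letting $u$ denote the exponent in the mgf and observing that $[\tilde{T}^*\Sigma\tilde{T}]^* = \tilde{T}^*\Sigma\tilde{T}$ is real,

$$
\begin{aligned}
u &= \Re(\tilde{T}^*\tilde{\mu}) + \frac{1}{4}\tilde{T}^*\Sigma\tilde{T} = \Re[(T_1' - iT_2')(\mu_{(1)} + i\mu_{(2)})] + \frac{1}{4}(T_1' - iT_2')\Sigma(T_1 + iT_2) \\
&= T_1'\mu_{(1)} + T_2'\mu_{(2)} + \frac{1}{4}[T_1'\Sigma T_1 + T_2'\Sigma T_2] + \frac{1}{4}u_1, \quad u_1 = i(T_1'\Sigma T_2 - T_2'\Sigma T_1) \\
&= T_1'\mu_{(1)} + T_2'\mu_{(2)} + \frac{1}{4}[T_1'\Sigma_1T_1 + T_2'\Sigma_1T_2] + \frac{1}{4}u_1.
\end{aligned} \tag{iv}
$$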


In this last line, we have made use of the result in (iii). The following lemma will enable us to simplify $u_1$.


Lemma 3.2a.2. Let $T_1$ and $T_2$ be real $p \times 1$ vectors. Let the $p \times p$ matrix $\Sigma$ be Hermitian, $\Sigma = \Sigma^* = \Sigma_1 + i\Sigma_2$ with $\Sigma_1 = \Sigma_1'$ and $\Sigma_2 = -\Sigma_2'$. Then

$$
u_1 = i(T_1'\Sigma T_2 - T_2'\Sigma T_1) = -2T_1'\Sigma_2T_2 = 2T_2'\Sigma_2T_1
\ \Rightarrow\ \frac{\partial}{\partial T_1}u_1 = -2\Sigma_2T_2 \ \text{ and } \ \frac{\partial}{\partial T_2}u_1 = 2\Sigma_2T_1. \tag{v}
$$

Proof: This result will be established by making use of the following general properties: For a $1 \times 1$ matrix, the transpose is itself whereas the conjugate transpose is the conjugate of the same quantity. That is, $(a + ib)' = a + ib$, $(a + ib)^* = a - ib$, and if the conjugate transpose is equal to itself then the quantity is real or equivalently, if $(a + ib) = (a + ib)^* = a - ib$ then $b = 0$ and the quantity is real. Thus,

$$
\begin{aligned}
u_1 &= i(T_1'\Sigma T_2 - T_2'\Sigma T_1) = i[T_1'(\Sigma_1 + i\Sigma_2)T_2 - T_2'(\Sigma_1 + i\Sigma_2)T_1] \\
&= iT_1'\Sigma_1T_2 - T_1'\Sigma_2T_2 - iT_2'\Sigma_1T_1 + T_2'\Sigma_2T_1 = -T_1'\Sigma_2T_2 + T_2'\Sigma_2T_1 \\
&= -2T_1'\Sigma_2T_2 = 2T_2'\Sigma_2T_1.
\end{aligned} \tag{vi}
$$

The following properties were utilized: $T_i'\Sigma_1T_j = T_j'\Sigma_1T_i$ for all $i$ and $j$ since $\Sigma_1$ is a symmetric matrix, the quantity is $1 \times 1$ and real and hence, the transpose is itself; $T_i'\Sigma_2T_j = -T_j'\Sigma_2T_i$ for all $i$ and $j$ because the quantities are $1 \times 1$ and then, the transpose is itself, but the transpose of $\Sigma_2$ is $-\Sigma_2$. This completes the proof.
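A small numerical check of Lemma 3.2a.2 follows. The symmetric part `sigma1` and skew-symmetric part `sigma2` are chosen so that $\Sigma_1 + i\Sigma_2$ reproduces the Hermitian matrix of Example 3.2a.1; the vectors `t1`, `t2` and the helper `bilinear` are illustrative assumptions.

```python
def bilinear(a, m, b):
    """a' m b for vectors a, b and a square matrix m (real or complex entries)."""
    n = len(a)
    return sum(a[i] * m[i][j] * b[j] for i in range(n) for j in range(n))

sigma1 = [[3.0, 1.0], [1.0, 2.0]]      # real symmetric part
sigma2 = [[0.0, 1.0], [-1.0, 0.0]]     # real skew-symmetric part
sigma = [[sigma1[i][j] + 1j * sigma2[i][j] for j in range(2)] for i in range(2)]

# Hypothetical real parameter vectors T1 and T2.
t1 = [0.3, 1.1]
t2 = [-0.7, 0.4]

u1 = 1j * (bilinear(t1, sigma, t2) - bilinear(t2, sigma, t1))
form_a = -2 * bilinear(t1, sigma2, t2)   # -2 T1' Sigma2 T2
form_b = 2 * bilinear(t2, sigma2, t1)    #  2 T2' Sigma2 T1
```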


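Now, let us apply the operator $\big(\frac{\partial}{\partial T_1} + i\frac{\partial}{\partial T_2}\big)$ to the mgf in (3.2a.1) and determine the various quantities. Note that in light of results stated in Chap. 1, we have

$$
\frac{\partial}{\partial T_1}(T_1'\Sigma_1T_1) = 2\Sigma_1T_1, \quad \frac{\partial}{\partial T_1}(-2T_1'\Sigma_2T_2) = -2\Sigma_2T_2, \quad \frac{\partial}{\partial T_1}\Re(\tilde{T}^*\tilde{\mu}) = \mu_{(1)},
$$

$$
\frac{\partial}{\partial T_2}(T_2'\Sigma_1T_2) = 2\Sigma_1T_2, \quad \frac{\partial}{\partial T_2}(2T_2'\Sigma_2T_1) = 2\Sigma_2T_1, \quad \frac{\partial}{\partial T_2}\Re(\tilde{T}^*\tilde{\mu}) = \mu_{(2)}.
$$

Thus, given (ii)-(vi), the operator applied to the exponent of the mgf gives the following result:

$$
\begin{aligned}
\Big(\frac{\partial}{\partial T_1} + i\frac{\partial}{\partial T_2}\Big)u &= \mu_{(1)} + i\mu_{(2)} + \frac{1}{4}[2\Sigma_1T_1 - 2\Sigma_2T_2 + 2i\Sigma_1T_2 + 2i\Sigma_2T_1] \\
&= \tilde{\mu} + \frac{1}{2}[\Sigma(T_1 + iT_2)] = \tilde{\mu} + \frac{1}{2}\Sigma\tilde{T},
\end{aligned}
$$

so that

$$
\begin{aligned}
\frac{\partial}{\partial \tilde{T}} M_{\tilde{X}}(\tilde{T})\Big|_{\tilde{T}=O} &= \Big(\frac{\partial}{\partial T_1} + i\frac{\partial}{\partial T_2}\Big) M_{\tilde{X}}(\tilde{T})\Big|_{\tilde{T}=O} \\
&= \big[M_{\tilde{X}}(\tilde{T})[\tilde{\mu} + \tfrac{1}{2}\Sigma\tilde{T}]\big]_{T_1=O,\,T_2=O} = \tilde{\mu},
\end{aligned} \tag{vii}
$$

noting that $\tilde{T} = O$ implies that $T_1 = O$ and $T_2 = O$. For convenience, let us denote the operator by

$$
\frac{\partial}{\partial \tilde{T}} = \Big(\frac{\partial}{\partial T_1} + i\frac{\partial}{\partial T_2}\Big).
$$

From (vii), we have

$$
\frac{\partial}{\partial \tilde{T}} M_{\tilde{X}}(\tilde{T}) = M_{\tilde{X}}(\tilde{T})[\tilde{\mu} + \tfrac{1}{2}\Sigma\tilde{T}]
\ \Rightarrow\ \frac{\partial}{\partial \tilde{T}^*} M_{\tilde{X}}(\tilde{T}) = [\tilde{\mu}^* + \tfrac{1}{2}\tilde{T}^*\Sigma]\, M_{\tilde{X}}(\tilde{T}).
$$

Now, observe that

$$
\tilde{T}^*\Sigma = (T_1' - iT_2')\Sigma = T_1'\Sigma - iT_2'\Sigma \ \Rightarrow\ \frac{\partial}{\partial T_1}(\tilde{T}^*\Sigma) = \Sigma, \quad \frac{\partial}{\partial T_2}(\tilde{T}^*\Sigma) = -i\Sigma,
$$

$$
\Big(\frac{\partial}{\partial T_1} + i\frac{\partial}{\partial T_2}\Big)(\tilde{T}^*\Sigma) = \Sigma - i(i)\Sigma = 2\Sigma,
$$

and

$$
\frac{\partial}{\partial \tilde{T}}\frac{\partial}{\partial \tilde{T}^*} M_{\tilde{X}}(\tilde{T}) = M_{\tilde{X}}(\tilde{T})[\tilde{\mu} + \tfrac{1}{2}\Sigma\tilde{T}][\tilde{\mu}^* + \tfrac{1}{2}\tilde{T}^*\Sigma] + M_{\tilde{X}}(\tilde{T})\,\Sigma.
$$

Thus,

$$
\frac{\partial}{\partial \tilde{T}} M_{\tilde{X}}(\tilde{T})\Big|_{\tilde{T}=O} = \tilde{\mu} \quad \text{and} \quad \frac{\partial}{\partial \tilde{T}}\frac{\partial}{\partial \tilde{T}^*} M_{\tilde{X}}(\tilde{T})\Big|_{\tilde{T}=O} = \Sigma + \tilde{\mu}\tilde{\mu}^*,
$$

and then $\mathrm{Cov}(\tilde{X}) = \Sigma$. In general, for higher order moments, one would have

$$
E[\cdots \tilde{X}^*\tilde{X}\tilde{X}^*] = \cdots \frac{\partial}{\partial \tilde{T}^*}\frac{\partial}{\partial \tilde{T}}\frac{\partial}{\partial \tilde{T}^*} M_{\tilde{X}}(\tilde{T})\Big|_{\tilde{T}=O}.
$$
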
3.2a.2. Linear functions 
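Let $\tilde{u} = L^*\tilde{X}$ where $L^* = (a_1, \ldots, a_p)$ and $a_1, \ldots, a_p$ are scalar constants, real or complex. Then the mgf of $\tilde{u}$ can be evaluated by integrating out over the $p$-variate complex Gaussian density of $\tilde{X}$. That is,

$$
M_{\tilde{u}}(\tilde{t}) = E[e^{\Re(\tilde{t}^*\tilde{u})}] = E[e^{\Re(\tilde{t}^*L^*\tilde{X})}]. \tag{3.2a.2}
$$

Note that this expected value is available from (3.2a.1) by replacing $\tilde{T}^*$ by $\tilde{t}^*L^*$. Hence

$$
M_{\tilde{u}}(\tilde{t}) = e^{\Re(\tilde{t}^*(L^*\tilde{\mu})) + \frac{1}{4}\tilde{t}^*(L^*\Sigma L)\tilde{t}}. \tag{3.2a.3}
$$

Then from (2.1a.1), $\tilde{u} = L^*\tilde{X}$ is univariate complex Gaussian with the parameters $L^*\tilde{\mu}$ and $L^*\Sigma L$. We now consider several such linear functions: Let $\tilde{Y} = A\tilde{X}$ where $A$ is $q \times p$, $q \le p$ and of full rank $q$. The distribution of $\tilde{Y}$ can be determined as follows. Since $\tilde{Y}$ is a function of $\tilde{X}$, we can evaluate the mgf of $\tilde{Y}$ by integrating out over the density of $\tilde{X}$. Since $\tilde{Y}$ is $q \times 1$, let us take a $q \times 1$ parameter vector $\tilde{U}$. Then,

$$
M_{\tilde{Y}}(\tilde{U}) = E[e^{\Re(\tilde{U}^*\tilde{Y})}] = E[e^{\Re(\tilde{U}^*A\tilde{X})}] = E[e^{\Re[(\tilde{U}^*A)\tilde{X}]}]. \tag{3.2a.4}
$$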


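On comparing this expected value with (3.2a.1), we can write down the mgf of $\tilde{Y}$ as the following:

$$
M_{\tilde{Y}}(\tilde{U}) = e^{\Re(\tilde{U}^*A\tilde{\mu}) + \frac{1}{4}(\tilde{U}^*A)\Sigma(A^*\tilde{U})} = e^{\Re(\tilde{U}^*(A\tilde{\mu})) + \frac{1}{4}\tilde{U}^*(A\Sigma A^*)\tilde{U}}, \tag{3.2a.5}
$$

which means that $\tilde{Y}$ has a $q$-variate complex Gaussian distribution with the parameters $A\tilde{\mu}$ and $A\Sigma A^*$. Thus, we have the following result:


Theorem 3.2a.1. Let $\tilde{X} \sim \tilde{N}_p(\tilde{\mu}, \Sigma)$, $\Sigma > O$, be a $p$-variate nonsingular complex normal vector. Let $A$ be a $q \times p$, $q \le p$, constant real or complex matrix of full rank $q$. Let $\tilde{Y} = A\tilde{X}$. Then,

$$
\tilde{Y} \sim \tilde{N}_q(A\tilde{\mu},\ A\Sigma A^*), \quad A\Sigma A^* > O. \tag{3.2a.6}
$$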
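Let us consider the following partitioning of $\tilde{T}$, $\tilde{X}$, $\Sigma$ where $\tilde{T}$ is $p \times 1$, $\tilde{T}_1$ is $r \times 1$, $r \le p$, $\tilde{X}_1$ is $r \times 1$, $\Sigma_{11}$ is $r \times r$, $\tilde{\mu}_{(1)}$ is $r \times 1$:

$$
\tilde{T} = \begin{bmatrix} \tilde{T}_1 \\ \tilde{T}_2 \end{bmatrix}, \quad \tilde{X} = \begin{bmatrix} \tilde{X}_1 \\ \tilde{X}_2 \end{bmatrix}, \quad \Sigma = \begin{bmatrix} \Sigma_{11} & \Sigma_{12} \\ \Sigma_{21} & \Sigma_{22} \end{bmatrix}, \quad \tilde{\mu} = \begin{bmatrix} \tilde{\mu}_{(1)} \\ \tilde{\mu}_{(2)} \end{bmatrix}.
$$

Let $\tilde{T}_2 = O$. Then the mgf of $\tilde{X}$ becomes that of $\tilde{X}_1$ as

$$
[\tilde{T}_1^*, O] \begin{bmatrix} \Sigma_{11} & \Sigma_{12} \\ \Sigma_{21} & \Sigma_{22} \end{bmatrix} \begin{bmatrix} \tilde{T}_1 \\ O \end{bmatrix} = \tilde{T}_1^*\Sigma_{11}\tilde{T}_1.
$$

Thus the mgf of $\tilde{X}_1$ becomes

$$
M_{\tilde{X}_1}(\tilde{T}_1) = e^{\Re(\tilde{T}_1^*\tilde{\mu}_{(1)}) + \frac{1}{4}\tilde{T}_1^*\Sigma_{11}\tilde{T}_1}. \tag{3.2a.7}
$$

This is the mgf of the $r \times 1$ subvector $\tilde{X}_1$ and hence $\tilde{X}_1$ has an $r$-variate complex Gaussian density with the mean value vector $\tilde{\mu}_{(1)}$ and the covariance matrix $\Sigma_{11}$. In a real or complex Gaussian vector, the individual variables can be permuted among themselves with the corresponding permutations in the mean value vector and the covariance matrix. Hence, all subsets of components of $\tilde{X}$ are Gaussian distributed. Thus, any set of $r$ components of $\tilde{X}$ is again a complex Gaussian for $r = 1, 2, \ldots, p$ when $\tilde{X}$ is a $p$-variate complex Gaussian.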
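Suppose that, in the mgf of (3.2a.1), $\Sigma_{12} = O$ where $\tilde{X} \sim \tilde{N}_p(\tilde{\mu}, \Sigma)$, $\Sigma > O$ and

$$
\tilde{X} = \begin{pmatrix} \tilde{X}_1 \\ \tilde{X}_2 \end{pmatrix}, \quad \tilde{\mu} = \begin{pmatrix} \tilde{\mu}_{(1)} \\ \tilde{\mu}_{(2)} \end{pmatrix}, \quad \Sigma = \begin{pmatrix} \Sigma_{11} & \Sigma_{12} \\ \Sigma_{21} & \Sigma_{22} \end{pmatrix}, \quad \tilde{T} = \begin{pmatrix} \tilde{T}_1 \\ \tilde{T}_2 \end{pmatrix}.
$$

When $\Sigma_{12}$ is null, so is $\Sigma_{21}$ since $\Sigma_{21} = \Sigma_{12}^*$. Then $\Sigma = \begin{pmatrix} \Sigma_{11} & O \\ O & \Sigma_{22} \end{pmatrix}$ is block-diagonal. As well, $\Re(\tilde{T}^*\tilde{\mu}) = \Re(\tilde{T}_1^*\tilde{\mu}_{(1)}) + \Re(\tilde{T}_2^*\tilde{\mu}_{(2)})$ and

$$
\tilde{T}^*\Sigma\tilde{T} = (\tilde{T}_1^*, \tilde{T}_2^*) \begin{pmatrix} \Sigma_{11} & O \\ O & \Sigma_{22} \end{pmatrix} \begin{pmatrix} \tilde{T}_1 \\ \tilde{T}_2 \end{pmatrix} = \tilde{T}_1^*\Sigma_{11}\tilde{T}_1 + \tilde{T}_2^*\Sigma_{22}\tilde{T}_2. \tag{i}
$$

In other words, $M_{\tilde{X}}(\tilde{T})$ becomes the product of the mgf of $\tilde{X}_1$ and the mgf of $\tilde{X}_2$, that is, $\tilde{X}_1$ and $\tilde{X}_2$ are independently distributed whenever $\Sigma_{12} = O$.


Theorem 3.2a.2. Let $\tilde{X} \sim \tilde{N}_p(\tilde{\mu}, \Sigma)$, $\Sigma > O$, be a nonsingular complex Gaussian vector. Consider the partitioning of $\tilde{X}$, $\tilde{\mu}$, $\tilde{T}$, $\Sigma$ as in (i) above. Then, the subvectors $\tilde{X}_1$ and $\tilde{X}_2$ are independently distributed as complex Gaussian vectors if and only if $\Sigma_{12} = O$ or equivalently, $\Sigma_{21} = O$.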
Exercises 3.2


3.2.1. Construct a 2 x 2 real positive definite matrix A. Then write down a bivariate real 
Gaussian density where the covariance matrix is this A. 


3.2.2. Construct a 2 x 2 Hermitian positive definite matrix B and then construct a complex 
bivariate Gaussian density. Write the exponent and normalizing constant explicitly. 


3.2.3. Construct a 3 x 3 real positive definite matrix A. Then create a real trivariate Gaus- 
sian density with this A being the covariance matrix. Write down the exponent and the 
normalizing constant explicitly. 


3.2.4. Repeat Exercise 3.2.3 for the complex Gaussian case. 


3.2.5. Let the $p \times 1$ real vector random variable have a $p$-variate real nonsingular Gaussian density $X \sim N_p(\mu, \Sigma)$, $\Sigma > O$. Let $L$ be a $p \times 1$ constant vector. Let $u = L'X = X'L$ = a linear function of $X$. Show that $E[u] = L'\mu$, $\mathrm{Var}(u) = L'\Sigma L$ and that $u$ is a univariate Gaussian with the parameters $L'\mu$ and $L'\Sigma L$.


3.2.6. Show that the mgf of $u$ in Exercise 3.2.5 is

$$
M_u(t) = e^{t(L'\mu) + \frac{t^2}{2}L'\Sigma L}.
$$


3.2.7. What are the corresponding results in Exercises 3.2.5 and 3.2.6 for the nonsingular complex Gaussian case?


3.2.8. Let $X \sim N_p(O, \Sigma)$, $\Sigma > O$, be a real $p$-variate nonsingular Gaussian vector. Let $u_1 = X'\Sigma^{-1}X$, and $u_2 = X'X$. Derive the densities of $u_1$ and $u_2$.

3.2.9. Establish Theorem 3.2.1 by using transformation of variables [Hint: Augment the matrix $A$ with a matrix $B$ such that $C = \begin{bmatrix} A \\ B \end{bmatrix}$ is $p \times p$ and nonsingular. Derive the density of $Y = CX$, and therefrom, the marginal density of $AX$.]


3.2.10. By constructing counterexamples or otherwise, show the following: Let the real scalar random variables $x_1$ and $x_2$ be such that $x_1 \sim N_1(\mu_1, \sigma_1^2)$, $\sigma_1 > 0$, $x_2 \sim N_1(\mu_2, \sigma_2^2)$, $\sigma_2 > 0$ and $\mathrm{Cov}(x_1, x_2) = 0$. Then, the joint density need not be bivariate normal.


3.2.11. Generalize Exercise 3.2.10 to $p$-vectors $X_1$ and $X_2$.

3.2.12. Extend Exercises 3.2.10 and 3.2.11 to the complex domain.
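3.3. Marginal and Conditional Densities, Real Case


Let the $p \times 1$ vector have a real $p$-variate Gaussian distribution $X \sim N_p(\mu, \Sigma)$, $\Sigma > O$. Let $X$, $\mu$ and $\Sigma^{-1}$ be partitioned as the following:

$$
X = \begin{bmatrix} X_1 \\ X_2 \end{bmatrix}, \quad \mu = \begin{bmatrix} \mu_{(1)} \\ \mu_{(2)} \end{bmatrix}, \quad \Sigma^{-1} = \begin{bmatrix} \Sigma^{11} & \Sigma^{12} \\ \Sigma^{21} & \Sigma^{22} \end{bmatrix}
$$

where $X_1$ and $\mu_{(1)}$ are $r \times 1$, $X_2$ and $\mu_{(2)}$ are $(p - r) \times 1$, $\Sigma^{11}$ is $r \times r$, and so on. Then

$$
\begin{aligned}
(X-\mu)'\Sigma^{-1}(X-\mu) &= [(X_1 - \mu_{(1)})', (X_2 - \mu_{(2)})'] \begin{bmatrix} \Sigma^{11} & \Sigma^{12} \\ \Sigma^{21} & \Sigma^{22} \end{bmatrix} \begin{bmatrix} X_1 - \mu_{(1)} \\ X_2 - \mu_{(2)} \end{bmatrix} \\
&= (X_1 - \mu_{(1)})'\Sigma^{11}(X_1 - \mu_{(1)}) + (X_2 - \mu_{(2)})'\Sigma^{22}(X_2 - \mu_{(2)}) \\
&\quad + (X_1 - \mu_{(1)})'\Sigma^{12}(X_2 - \mu_{(2)}) + (X_2 - \mu_{(2)})'\Sigma^{21}(X_1 - \mu_{(1)}).
\end{aligned} \tag{i}
$$

But

$$
[(X_1 - \mu_{(1)})'\Sigma^{12}(X_2 - \mu_{(2)})]' = (X_2 - \mu_{(2)})'\Sigma^{21}(X_1 - \mu_{(1)})
$$

and both are real $1 \times 1$. Thus they are equal and we may write their sum as twice either one of them. Collecting the terms containing $X_2 - \mu_{(2)}$, we have

$$
(X_2 - \mu_{(2)})'\Sigma^{22}(X_2 - \mu_{(2)}) + 2(X_2 - \mu_{(2)})'\Sigma^{21}(X_1 - \mu_{(1)}). \tag{ii}
$$

If we expand a quadratic form of the type $(X_2 - \mu_{(2)} + C)'\Sigma^{22}(X_2 - \mu_{(2)} + C)$, we have

$$
\begin{aligned}
(X_2 - \mu_{(2)} + C)'\Sigma^{22}(X_2 - \mu_{(2)} + C) &= (X_2 - \mu_{(2)})'\Sigma^{22}(X_2 - \mu_{(2)}) \\
&\quad + (X_2 - \mu_{(2)})'\Sigma^{22}C + C'\Sigma^{22}(X_2 - \mu_{(2)}) + C'\Sigma^{22}C.
\end{aligned} \tag{iii}
$$

Comparing (ii) and (iii), let

$$
\Sigma^{22}C = \Sigma^{21}(X_1 - \mu_{(1)}) \ \Rightarrow\ C = (\Sigma^{22})^{-1}\Sigma^{21}(X_1 - \mu_{(1)}).
$$

Then,

$$
C'\Sigma^{22}C = (X_1 - \mu_{(1)})'\Sigma^{12}(\Sigma^{22})^{-1}\Sigma^{21}(X_1 - \mu_{(1)}).
$$

Hence,

$$
\begin{aligned}
(X-\mu)'\Sigma^{-1}(X-\mu) &= (X_1 - \mu_{(1)})'[\Sigma^{11} - \Sigma^{12}(\Sigma^{22})^{-1}\Sigma^{21}](X_1 - \mu_{(1)}) \\
&\quad + (X_2 - \mu_{(2)} + C)'\Sigma^{22}(X_2 - \mu_{(2)} + C),
\end{aligned}
$$

and after integrating out $X_2$, the balance of the exponent is $(X_1 - \mu_{(1)})'\Sigma_{11}^{-1}(X_1 - \mu_{(1)})$, where $\Sigma_{11}$ is the $r \times r$ leading submatrix in $\Sigma$; the reader may refer to Sect. 1.3 for results on the inversion of partitioned matrices. Observe that $\Sigma_{11}^{-1} = \Sigma^{11} - \Sigma^{12}(\Sigma^{22})^{-1}\Sigma^{21}$. The integral over $X_2$ only gives a constant and hence the marginal density of $X_1$ is

$$
f_1(X_1) = c_1\, e^{-\frac{1}{2}(X_1 - \mu_{(1)})'\Sigma_{11}^{-1}(X_1 - \mu_{(1)})},
$$

where $c_1$ is a constant.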

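On noting that it has the same structure as the real multivariate Gaussian density, its normalizing constant can easily be determined and the resulting density is as follows:

$$
f_1(X_1) = \frac{1}{|\Sigma_{11}|^{1/2}(2\pi)^{r/2}}\, e^{-\frac{1}{2}(X_1 - \mu_{(1)})'\Sigma_{11}^{-1}(X_1 - \mu_{(1)})}, \quad \Sigma_{11} > O, \tag{3.3.1}
$$

for $-\infty < x_j < \infty$, $-\infty < \mu_j < \infty$, $j = 1, \ldots, r$, where $\mu_{(1)} = E[X_1]$ and $\Sigma_{11} = \mathrm{Cov}(X_1)$ is the covariance matrix in $X_1$. From symmetry, we obtain the following marginal density of $X_2$ in the real Gaussian case:

$$
f_2(X_2) = \frac{1}{|\Sigma_{22}|^{1/2}(2\pi)^{(p-r)/2}}\, e^{-\frac{1}{2}(X_2 - \mu_{(2)})'\Sigma_{22}^{-1}(X_2 - \mu_{(2)})}, \quad \Sigma_{22} > O, \tag{3.3.2}
$$

for $-\infty < x_j < \infty$, $-\infty < \mu_j < \infty$, $j = r + 1, \ldots, p$.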
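The identity $\Sigma_{11}^{-1} = \Sigma^{11} - \Sigma^{12}(\Sigma^{22})^{-1}\Sigma^{21}$ used in deriving (3.3.1) can be verified exactly on a small case. The sketch below borrows the covariance matrix of Example 3.2.2 with $r = 2$; the helpers `inverse`, `matmul` and `block` are hypothetical, and a generic Gauss-Jordan inverse is used in place of the partitioned-inverse results of Sect. 1.3.

```python
from fractions import Fraction

def inverse(m):
    """Gauss-Jordan inverse with exact rational arithmetic (nonsingular input assumed)."""
    n = len(m)
    aug = [[Fraction(v) for v in row] + [Fraction(int(i == j)) for j in range(n)]
           for i, row in enumerate(m)]
    for c in range(n):
        pivot = next(r for r in range(c, n) if aug[r][c] != 0)
        aug[c], aug[pivot] = aug[pivot], aug[c]
        pivot_value = aug[c][c]
        aug[c] = [v / pivot_value for v in aug[c]]
        for r in range(n):
            if r != c:
                factor = aug[r][c]
                aug[r] = [vr - factor * vc for vr, vc in zip(aug[r], aug[c])]
    return [row[n:] for row in aug]

def matmul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]

def block(m, rows, cols):
    """Submatrix of m on the given row and column index lists."""
    return [[m[i][j] for j in cols] for i in rows]

sigma = [[4, -2, 0], [-2, 3, 1], [0, 1, 2]]
first, second = [0, 1], [2]          # X1 = (x1, x2)', X2 = (x3)

precision = inverse(sigma)           # Sigma^{-1}, with blocks Sigma^{11}, Sigma^{12}, ...
p11 = block(precision, first, first)
p12 = block(precision, first, second)
p21 = block(precision, second, first)
p22 = block(precision, second, second)

# Sigma^{11} - Sigma^{12} (Sigma^{22})^{-1} Sigma^{21}
correction = matmul(matmul(p12, inverse(p22)), p21)
schur = [[p11[i][j] - correction[i][j] for j in range(2)] for i in range(2)]

sigma11_inv = inverse(block(sigma, first, first))
```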


Observe that we can permute the elements in $X$ as we please with the corresponding permutations in $\mu$ and the covariance matrix $\Sigma$. Hence the real Gaussian density in the $p$-variate case is a multivariate density and not a vector/matrix-variate density. From this property, it follows that every subset of the elements from $X$ has a real multivariate Gaussian distribution and the individual variables have univariate real normal or Gaussian distributions. Hence our derivation of the marginal density of $X_1$ is a general density for a subset of $r$ elements in $X$ because those $r$ elements can be brought to the first $r$ positions through permutations of the elements in $X$ with the corresponding permutations in $\mu$ and $\Sigma$.


The bivariate case 


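Let us look at the explicit form of the real Gaussian density for $p = 2$. In the bivariate case,

$$
\Sigma = \begin{bmatrix} \sigma_{11} & \sigma_{12} \\ \sigma_{12} & \sigma_{22} \end{bmatrix}.
$$

For convenience, let us denote $\sigma_{11}$ by $\sigma_1^2$ and $\sigma_{22}$ by $\sigma_2^2$. Then $\sigma_{12} = \sigma_1\sigma_2\rho$ where $\rho$ is the correlation between $x_1$ and $x_2$, and for $p = 2$,

$$
|\Sigma| = \begin{vmatrix} \sigma_{11} & \sigma_{12} \\ \sigma_{12} & \sigma_{22} \end{vmatrix} = \sigma_{11}\sigma_{22} - (\sigma_{12})^2 = \sigma_1^2\sigma_2^2 - (\sigma_1\sigma_2\rho)^2 = \sigma_1^2\sigma_2^2(1 - \rho^2).
$$

Thus, in that case,

$$
\Sigma^{-1} = \frac{1}{|\Sigma|}[\mathrm{Cof}(\Sigma)]' = \frac{1}{\sigma_1^2\sigma_2^2(1-\rho^2)} \begin{bmatrix} \sigma_2^2 & -\rho\sigma_1\sigma_2 \\ -\rho\sigma_1\sigma_2 & \sigma_1^2 \end{bmatrix} = \frac{1}{1-\rho^2} \begin{bmatrix} \frac{1}{\sigma_1^2} & -\frac{\rho}{\sigma_1\sigma_2} \\ -\frac{\rho}{\sigma_1\sigma_2} & \frac{1}{\sigma_2^2} \end{bmatrix}, \quad -1 < \rho < 1.
$$

Hence, substituting these into the general expression for the real Gaussian density and denoting the real bivariate density as $f(x_1, x_2)$, we have the following:

$$
f(x_1, x_2) = \frac{1}{(2\pi)\sigma_1\sigma_2\sqrt{1-\rho^2}} \exp\Big\{ -\frac{1}{2(1-\rho^2)}\, Q \Big\} \tag{3.3.3}
$$

where $Q$ is the real positive definite quadratic form

$$
Q = \Big(\frac{x_1-\mu_1}{\sigma_1}\Big)^2 - 2\rho\Big(\frac{x_1-\mu_1}{\sigma_1}\Big)\Big(\frac{x_2-\mu_2}{\sigma_2}\Big) + \Big(\frac{x_2-\mu_2}{\sigma_2}\Big)^2
$$

for $\sigma_1 > 0$, $\sigma_2 > 0$, $-1 < \rho < 1$, $-\infty < x_j < \infty$, $-\infty < \mu_j < \infty$, $j = 1, 2$.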


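The conditional density of $X_1$ given $X_2$, denoted by $g_1(X_1|X_2)$, is the following:

$$
\begin{aligned}
g_1(X_1|X_2) = \frac{f(X)}{f_2(X_2)} &= \frac{|\Sigma_{22}|^{1/2}}{(2\pi)^{r/2}|\Sigma|^{1/2}} \\
&\quad \times \exp\Big\{ -\frac{1}{2}\big[(X-\mu)'\Sigma^{-1}(X-\mu) - (X_2 - \mu_{(2)})'\Sigma_{22}^{-1}(X_2 - \mu_{(2)})\big] \Big\}.
\end{aligned}
$$

We can simplify the exponent, excluding $-\frac{1}{2}$, as follows:

$$
\begin{aligned}
(X-\mu)'\Sigma^{-1}(X-\mu) &- (X_2 - \mu_{(2)})'\Sigma_{22}^{-1}(X_2 - \mu_{(2)}) \\
&= (X_1 - \mu_{(1)})'\Sigma^{11}(X_1 - \mu_{(1)}) + 2(X_1 - \mu_{(1)})'\Sigma^{12}(X_2 - \mu_{(2)}) \\
&\quad + (X_2 - \mu_{(2)})'\Sigma^{22}(X_2 - \mu_{(2)}) - (X_2 - \mu_{(2)})'\Sigma_{22}^{-1}(X_2 - \mu_{(2)}).
\end{aligned}
$$

But $\Sigma_{22}^{-1} = \Sigma^{22} - \Sigma^{21}(\Sigma^{11})^{-1}\Sigma^{12}$. Hence the terms containing $\Sigma^{22}$ are canceled. The remaining terms containing $X_2 - \mu_{(2)}$ are

$$
2(X_1 - \mu_{(1)})'\Sigma^{12}(X_2 - \mu_{(2)}) + (X_2 - \mu_{(2)})'\Sigma^{21}(\Sigma^{11})^{-1}\Sigma^{12}(X_2 - \mu_{(2)}).
$$

Combining these two terms with $(X_1 - \mu_{(1)})'\Sigma^{11}(X_1 - \mu_{(1)})$ results in the quadratic form $(X_1 - \mu_{(1)} + C)'\Sigma^{11}(X_1 - \mu_{(1)} + C)$ where $C = (\Sigma^{11})^{-1}\Sigma^{12}(X_2 - \mu_{(2)})$. Now, noting that

$$
\frac{|\Sigma_{22}|^{1/2}}{|\Sigma|^{1/2}} = \frac{|\Sigma_{22}|^{1/2}}{|\Sigma_{22}|^{1/2}\,|\Sigma_{11} - \Sigma_{12}\Sigma_{22}^{-1}\Sigma_{21}|^{1/2}} = \frac{1}{|\Sigma_{11} - \Sigma_{12}\Sigma_{22}^{-1}\Sigma_{21}|^{1/2}},
$$

the conditional density of $X_1$ given $X_2$, which is denoted by $g_1(X_1|X_2)$, can be expressed as follows:

$$
\begin{aligned}
g_1(X_1|X_2) &= \frac{1}{(2\pi)^{r/2}|\Sigma_{11} - \Sigma_{12}\Sigma_{22}^{-1}\Sigma_{21}|^{1/2}} \\
&\quad \times \exp\Big\{ -\frac{1}{2}(X_1 - \mu_{(1)} + C)'(\Sigma_{11} - \Sigma_{12}\Sigma_{22}^{-1}\Sigma_{21})^{-1}(X_1 - \mu_{(1)} + C) \Big\}
\end{aligned} \tag{3.3.4}
$$

where $C = (\Sigma^{11})^{-1}\Sigma^{12}(X_2 - \mu_{(2)})$. Hence, the conditional expectation and covariance of $X_1$ given $X_2$ are

$$
\begin{aligned}
E[X_1|X_2] &= \mu_{(1)} - C = \mu_{(1)} - (\Sigma^{11})^{-1}\Sigma^{12}(X_2 - \mu_{(2)}) \\
&= \mu_{(1)} + \Sigma_{12}\Sigma_{22}^{-1}(X_2 - \mu_{(2)}), \text{ which is linear in } X_2, \\
\mathrm{Cov}(X_1|X_2) &= \Sigma_{11} - \Sigma_{12}\Sigma_{22}^{-1}\Sigma_{21}, \text{ which is free of } X_2.
\end{aligned} \tag{3.3.5}
$$
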

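From the inverses of partitioned matrices obtained in Sect. 1.3, we have $-(\Sigma^{11})^{-1}\Sigma^{12} = \Sigma_{12}\Sigma_{22}^{-1}$, which yields the representation of the conditional expectation appearing in Eq. (3.3.5). The matrix $\Sigma_{12}\Sigma_{22}^{-1}$ is often called the matrix of regression coefficients. From symmetry, it follows that the conditional density of $X_2$, given $X_1$, denoted by $g_2(X_2|X_1)$, is given by

$$
\begin{aligned}
g_2(X_2|X_1) &= \frac{1}{(2\pi)^{(p-r)/2}|\Sigma_{22} - \Sigma_{21}\Sigma_{11}^{-1}\Sigma_{12}|^{1/2}} \\
&\quad \times \exp\Big\{ -\frac{1}{2}(X_2 - \mu_{(2)} + C_1)'(\Sigma_{22} - \Sigma_{21}\Sigma_{11}^{-1}\Sigma_{12})^{-1}(X_2 - \mu_{(2)} + C_1) \Big\}
\end{aligned} \tag{3.3.6}
$$

where $C_1 = (\Sigma^{22})^{-1}\Sigma^{21}(X_1 - \mu_{(1)})$, and the conditional expectation and conditional covariance of $X_2$ given $X_1$ are

$$
\begin{aligned}
E[X_2|X_1] &= \mu_{(2)} - C_1 = \mu_{(2)} - (\Sigma^{22})^{-1}\Sigma^{21}(X_1 - \mu_{(1)}) \\
&= \mu_{(2)} + \Sigma_{21}\Sigma_{11}^{-1}(X_1 - \mu_{(1)}), \text{ which is linear in } X_1, \\
\mathrm{Cov}(X_2|X_1) &= \Sigma_{22} - \Sigma_{21}\Sigma_{11}^{-1}\Sigma_{12}, \text{ which is free of } X_1,
\end{aligned} \tag{3.3.7}
$$

the matrix $\Sigma_{21}\Sigma_{11}^{-1}$ being often called the matrix of regression coefficients.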
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What is then the conditional expectation of xı given x2 in the bivariate normal case? 
From formula (3.3.5) for $p=2$, we have

$$\begin{aligned}
E[X_1|X_2]&=\mu_{(1)}+\Sigma_{12}\Sigma_{22}^{-1}(X_2-\mu_{(2)})=\mu_1+\frac{\sigma_{12}}{\sigma_{22}}(x_2-\mu_2)\\
&=\mu_1+\frac{\sigma_1\sigma_2\rho}{\sigma_2^2}(x_2-\mu_2)=\mu_1+\frac{\sigma_1}{\sigma_2}\rho\,(x_2-\mu_2)=E[x_1|x_2],
\end{aligned}\tag{3.3.8}$$

which is linear in $x_2$. The coefficient $\frac{\sigma_1}{\sigma_2}\rho$ is often referred to as the regression coefficient. Then, from (3.3.7) we have

$$E[x_2|x_1]=\mu_2+\frac{\sigma_2}{\sigma_1}\rho\,(x_1-\mu_1),\ \text{which is linear in } x_1,\tag{3.3.9}$$

and $\frac{\sigma_2}{\sigma_1}\rho$ is the regression coefficient. Thus, (3.3.8) gives the best predictor of $x_1$ based on $x_2$ and (3.3.9), the best predictor of $x_2$ based on $x_1$, both being linear in the case of a multivariate real normal distribution; in this case, we have a bivariate normal distribution.
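As a small illustration of (3.3.8), the following Python sketch evaluates the regression line of $x_1$ on $x_2$ in the bivariate normal case. The function name `bivariate_regression` and the numerical parameter values are hypothetical and serve only as an example; exact rational arithmetic is used so that the identity $\frac{\sigma_1}{\sigma_2}\rho=\sigma_{12}/\sigma_2^2$ can be checked without rounding.

```python
from fractions import Fraction


def bivariate_regression(mu1, mu2, sigma1, sigma2, rho):
    """Return (intercept, slope) such that E[x1 | x2] = intercept + slope * x2."""
    slope = rho * sigma1 / sigma2      # regression coefficient (sigma1 / sigma2) * rho
    intercept = mu1 - slope * mu2      # so that the line passes through (mu2, mu1)
    return intercept, slope


# Hypothetical parameter values, chosen only for illustration.
mu1, mu2 = Fraction(1), Fraction(2)
sigma1, sigma2, rho = Fraction(2), Fraction(4), Fraction(1, 2)

intercept, slope = bivariate_regression(mu1, mu2, sigma1, sigma2, rho)
sigma12 = rho * sigma1 * sigma2        # covariance between x1 and x2


def predict_x1(x2):
    """Best predictor of x1 given x2, as in (3.3.8)."""
    return intercept + slope * x2
```

At $x_2=\mu_2$ the predictor returns $\mu_1$, as it should for a regression line passing through the mean point.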


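Example 3.3.1. Let $X$, $x_1,x_2,x_3$, $E[X]=\mu$, $\mathrm{Cov}(X)=\Sigma$ be specified as follows where $X\sim N_3(\mu,\Sigma)$, $\Sigma>O$:

$$X=\begin{bmatrix}x_1\\x_2\\x_3\end{bmatrix},\quad \mu=\begin{bmatrix}\mu_1\\\mu_2\\\mu_3\end{bmatrix}=\begin{bmatrix}-1\\0\\-2\end{bmatrix},\quad \Sigma=\begin{bmatrix}3&-2&0\\-2&2&1\\0&1&3\end{bmatrix}.$$

Compute (1) the marginal densities of $x_1$ and $X_2=\begin{bmatrix}x_2\\x_3\end{bmatrix}$; (2) the conditional density of $x_1$ given $X_2$ and the conditional density of $X_2$ given $x_1$; (3) conditional expectations or regressions of $x_1$ on $X_2$ and $X_2$ on $x_1$.

Solution 3.3.1. Let us partition $\Sigma$ accordingly, that is,

$$\Sigma=\begin{bmatrix}\Sigma_{11}&\Sigma_{12}\\\Sigma_{21}&\Sigma_{22}\end{bmatrix},\quad \Sigma_{11}=\sigma_{11}=(3),\quad \Sigma_{12}=[-2,\ 0],\quad \Sigma_{21}=\begin{bmatrix}-2\\0\end{bmatrix},\quad \Sigma_{22}=\begin{bmatrix}2&1\\1&3\end{bmatrix}.$$

Let us compute the following quantities: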


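$$\Sigma_{22}^{-1}=\frac{1}{5}\begin{bmatrix}3&-1\\-1&2\end{bmatrix},\quad \Sigma_{11}^{-1}=\frac{1}{3};$$

$$\Sigma_{11}-\Sigma_{12}\Sigma_{22}^{-1}\Sigma_{21}=3-[-2,\ 0]\Big(\frac{1}{5}\Big)\begin{bmatrix}3&-1\\-1&2\end{bmatrix}\begin{bmatrix}-2\\0\end{bmatrix}=\frac{3}{5};$$

$$\Sigma_{22}-\Sigma_{21}\Sigma_{11}^{-1}\Sigma_{12}=\begin{bmatrix}2&1\\1&3\end{bmatrix}-\begin{bmatrix}-2\\0\end{bmatrix}\Big(\frac{1}{3}\Big)[-2,\ 0]=\begin{bmatrix}\frac{2}{3}&1\\1&3\end{bmatrix};$$

$$[\Sigma_{11}-\Sigma_{12}\Sigma_{22}^{-1}\Sigma_{21}]^{-1}=\frac{5}{3};\quad [\Sigma_{22}-\Sigma_{21}\Sigma_{11}^{-1}\Sigma_{12}]^{-1}=\begin{bmatrix}3&-1\\-1&\frac{2}{3}\end{bmatrix}.$$

As well,

$$\Sigma_{12}\Sigma_{22}^{-1}=[-2,\ 0]\Big(\frac{1}{5}\Big)\begin{bmatrix}3&-1\\-1&2\end{bmatrix}=\Big[-\frac{6}{5},\ \frac{2}{5}\Big]\quad\text{and}\quad \Sigma_{21}\Sigma_{11}^{-1}=\Big(\frac{1}{3}\Big)\begin{bmatrix}-2\\0\end{bmatrix}=\begin{bmatrix}-\frac{2}{3}\\0\end{bmatrix}.$$

Then we have the following:

$$E[X_1|X_2]=\mu_{(1)}+\Sigma_{12}\Sigma_{22}^{-1}(X_2-\mu_{(2)})=-1+\Big[-\frac{6}{5},\ \frac{2}{5}\Big]\begin{bmatrix}x_2-0\\x_3+2\end{bmatrix}=-1-\frac{6}{5}x_2+\frac{2}{5}(x_3+2)\tag{i}$$

and

$$\mathrm{Cov}(X_1|X_2)=\Sigma_{11}-\Sigma_{12}\Sigma_{22}^{-1}\Sigma_{21}=\frac{3}{5};\tag{ii}$$

$$E[X_2|X_1]=\mu_{(2)}+\Sigma_{21}\Sigma_{11}^{-1}(X_1-\mu_{(1)})=\begin{bmatrix}0\\-2\end{bmatrix}+\begin{bmatrix}-\frac{2}{3}\\0\end{bmatrix}(x_1+1)=\begin{bmatrix}-\frac{2}{3}(x_1+1)\\-2\end{bmatrix}\tag{iii}$$

and

$$\mathrm{Cov}(X_2|X_1)=\Sigma_{22}-\Sigma_{21}\Sigma_{11}^{-1}\Sigma_{12}=\begin{bmatrix}\frac{2}{3}&1\\1&3\end{bmatrix}.\tag{iv}$$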


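The distributions of $x_1$ and $X_2$ are respectively $x_1\sim N_1(-1,3)$ and $X_2\sim N_2(\mu_{(2)},\Sigma_{22})$, the corresponding densities denoted by $f_1(x_1)$ and $f_2(X_2)$ being

$$f_1(x_1)=\frac{1}{\sqrt{2\pi}\,\sqrt{3}}\,e^{-\frac{1}{6}(x_1+1)^2},\quad -\infty < x_1 < \infty,$$

$$f_2(X_2)=\frac{1}{(2\pi)\sqrt{5}}\,e^{-\frac{1}{2}Q_1},\quad Q_1=\frac{1}{5}\big[3x_2^2-2x_2(x_3+2)+2(x_3+2)^2\big]$$

for $-\infty < x_j < \infty$, $j=2,3$. The conditional distributions are $X_1|X_2\sim N_1(E(X_1|X_2),\mathrm{Var}(X_1|X_2))$ and $X_2|X_1\sim N_2(E(X_2|X_1),\mathrm{Cov}(X_2|X_1))$, the associated densities denoted by $g_1(X_1|X_2)$ and $g_2(X_2|X_1)$ being given by

$$g_1(X_1|X_2)=\frac{1}{\sqrt{2\pi}\,(3/5)^{1/2}}\,e^{-\frac{5}{6}\left[x_1+1+\frac{6}{5}x_2-\frac{2}{5}(x_3+2)\right]^2},$$

$$g_2(X_2|X_1)=\frac{1}{2\pi\times 1}\,e^{-\frac{1}{2}Q_2},$$

$$Q_2=3\Big[x_2+\frac{2}{3}(x_1+1)\Big]^2-2\Big[x_2+\frac{2}{3}(x_1+1)\Big](x_3+2)+\frac{2}{3}(x_3+2)^2$$

for $-\infty < x_j < \infty$, $j=1,2,3$. This completes the computations.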
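The matrix computations of Example 3.3.1 can be checked with a short Python sketch that uses exact rational arithmetic. The helper names `matmul` and `inv2` are introduced here only for illustration and are not part of any library; the blocks of $\Sigma$ are those of the example, with $r=1$.

```python
from fractions import Fraction as F


def matmul(A, B):
    """Product of two matrices stored as lists of rows."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]


def inv2(M):
    """Inverse of a 2 x 2 matrix via the adjugate formula."""
    (a, b), (c, d) = M
    det = a * d - b * c
    return [[d / det, -b / det], [-c / det, a / det]]


# Blocks of Sigma in Example 3.3.1.
S11 = [[F(3)]]
S12 = [[F(-2), F(0)]]
S21 = [[F(-2)], [F(0)]]
S22 = [[F(2), F(1)], [F(1), F(3)]]

# Regression of x1 on X2 and the conditional variance of x1 given X2.
reg_1_on_2 = matmul(S12, inv2(S22))                     # Sigma12 Sigma22^{-1}
cov_1_given_2 = S11[0][0] - matmul(reg_1_on_2, S21)[0][0]

# Regression of X2 on x1 and the conditional covariance of X2 given x1.
reg_2_on_1 = matmul(S21, [[1 / S11[0][0]]])             # Sigma21 Sigma11^{-1}
outer = matmul(reg_2_on_1, S12)
cov_2_given_1 = [[S22[i][j] - outer[i][j] for j in range(2)] for i in range(2)]


def mean_x1_given(x2, x3):
    """E[x1 | x2, x3] = mu1 + Sigma12 Sigma22^{-1} (X2 - mu_(2)), with mu = (-1, 0, -2)'."""
    return F(-1) + reg_1_on_2[0][0] * (x2 - 0) + reg_1_on_2[0][1] * (x3 + 2)
```

The quantities `reg_1_on_2`, `cov_1_given_2`, `reg_2_on_1` and `cov_2_given_1` correspond to the regression coefficients and conditional covariances appearing in (i) to (iv) of the solution.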
3.3a. Conditional and Marginal Densities in the Complex Case 


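Let the $p\times 1$ complex vector $\tilde{X}$ have the $p$-variate complex normal distribution, $\tilde{X}\sim\tilde{N}_p(\tilde{\mu},\Sigma)$, $\Sigma>O$. As can be seen from the corresponding mgf which was derived in Sect. 3.2a, all subsets of the variables $\tilde{x}_1,\ldots,\tilde{x}_p$ are again complex Gaussian distributed. This result can be obtained by integrating out the remaining variables from the $p$-variate complex Gaussian density. Let $\tilde{U}=\tilde{X}-\tilde{\mu}$ for convenience. Partition $\tilde{X},\tilde{\mu},\tilde{U}$ into subvectors and $\Sigma$ into submatrices as follows:

$$\Sigma=\begin{bmatrix}\Sigma_{11}&\Sigma_{12}\\\Sigma_{21}&\Sigma_{22}\end{bmatrix},\quad \Sigma^{-1}=\begin{bmatrix}\Sigma^{11}&\Sigma^{12}\\\Sigma^{21}&\Sigma^{22}\end{bmatrix},\quad \tilde{\mu}=\begin{bmatrix}\tilde{\mu}_{(1)}\\\tilde{\mu}_{(2)}\end{bmatrix},\quad \tilde{X}=\begin{bmatrix}\tilde{X}_1\\\tilde{X}_2\end{bmatrix},\quad \tilde{U}=\begin{bmatrix}\tilde{U}_1\\\tilde{U}_2\end{bmatrix},$$

where $\tilde{X}_1,\tilde{\mu}_{(1)},\tilde{U}_1$ are $r\times 1$ and $\Sigma_{11}$ is $r\times r$. Consider

$$\begin{aligned}
\tilde{U}^{*}\Sigma^{-1}\tilde{U}&=[\tilde{U}_1^{*},\ \tilde{U}_2^{*}]\begin{bmatrix}\Sigma^{11}&\Sigma^{12}\\\Sigma^{21}&\Sigma^{22}\end{bmatrix}\begin{bmatrix}\tilde{U}_1\\\tilde{U}_2\end{bmatrix}\\
&=\tilde{U}_1^{*}\Sigma^{11}\tilde{U}_1+\tilde{U}_2^{*}\Sigma^{22}\tilde{U}_2+\tilde{U}_1^{*}\Sigma^{12}\tilde{U}_2+\tilde{U}_2^{*}\Sigma^{21}\tilde{U}_1
\end{aligned}\tag{i}$$

and suppose that we wish to integrate out $\tilde{U}_2$ to obtain the marginal density of $\tilde{U}_1$. The terms containing $\tilde{U}_2$ are $\tilde{U}_2^{*}\Sigma^{22}\tilde{U}_2+\tilde{U}_1^{*}\Sigma^{12}\tilde{U}_2+\tilde{U}_2^{*}\Sigma^{21}\tilde{U}_1$. On expanding the Hermitian form

$$(\tilde{U}_2+C)^{*}\Sigma^{22}(\tilde{U}_2+C)=\tilde{U}_2^{*}\Sigma^{22}\tilde{U}_2+\tilde{U}_2^{*}\Sigma^{22}C+C^{*}\Sigma^{22}\tilde{U}_2+C^{*}\Sigma^{22}C,\tag{ii}$$

for some $C$ and comparing (i) and (ii), we may let $\Sigma^{21}\tilde{U}_1=\Sigma^{22}C\Rightarrow C=(\Sigma^{22})^{-1}\Sigma^{21}\tilde{U}_1$. Then $C^{*}\Sigma^{22}C=\tilde{U}_1^{*}\Sigma^{12}(\Sigma^{22})^{-1}\Sigma^{21}\tilde{U}_1$ and (i) may thus be written as

$$\tilde{U}_1^{*}\big(\Sigma^{11}-\Sigma^{12}(\Sigma^{22})^{-1}\Sigma^{21}\big)\tilde{U}_1+(\tilde{U}_2+C)^{*}\Sigma^{22}(\tilde{U}_2+C),\quad C=(\Sigma^{22})^{-1}\Sigma^{21}\tilde{U}_1.$$

However, from Sect. 1.3 on partitioned matrices, we have

$$\Sigma^{11}-\Sigma^{12}(\Sigma^{22})^{-1}\Sigma^{21}=\Sigma_{11}^{-1}.$$

As well,

$$\det(\Sigma)=[\det(\Sigma_{11})][\det(\Sigma_{22}-\Sigma_{21}\Sigma_{11}^{-1}\Sigma_{12})]=[\det(\Sigma_{11})][\det((\Sigma^{22})^{-1})].$$

Note that the integral of $\exp\{-(\tilde{U}_2+C)^{*}\Sigma^{22}(\tilde{U}_2+C)\}$ over $\tilde{U}_2$ gives $\pi^{p-r}|\det((\Sigma^{22})^{-1})|=\pi^{p-r}|\det(\Sigma_{22}-\Sigma_{21}\Sigma_{11}^{-1}\Sigma_{12})|$. Hence the marginal density of $\tilde{X}_1$ is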


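$$\tilde{f}_1(\tilde{X}_1)=\frac{1}{\pi^{r}|\det(\Sigma_{11})|}\,e^{-(\tilde{X}_1-\tilde{\mu}_{(1)})^{*}\Sigma_{11}^{-1}(\tilde{X}_1-\tilde{\mu}_{(1)})},\quad \Sigma_{11}>O.\tag{3.3a.1}$$

It is an $r$-variate complex Gaussian density. Similarly $\tilde{X}_2$ has the $(p-r)$-variate complex Gaussian density

$$\tilde{f}_2(\tilde{X}_2)=\frac{1}{\pi^{p-r}|\det(\Sigma_{22})|}\,e^{-(\tilde{X}_2-\tilde{\mu}_{(2)})^{*}\Sigma_{22}^{-1}(\tilde{X}_2-\tilde{\mu}_{(2)})},\quad \Sigma_{22}>O.\tag{3.3a.2}$$

Hence, the conditional density of $\tilde{X}_1$ given $\tilde{X}_2$ is

$$\tilde{g}_1(\tilde{X}_1|\tilde{X}_2)=\frac{\tilde{f}(\tilde{X}_1,\tilde{X}_2)}{\tilde{f}_2(\tilde{X}_2)}=\frac{\pi^{p-r}|\det(\Sigma_{22})|}{\pi^{p}|\det(\Sigma)|}\,e^{-(\tilde{X}-\tilde{\mu})^{*}\Sigma^{-1}(\tilde{X}-\tilde{\mu})+(\tilde{X}_2-\tilde{\mu}_{(2)})^{*}\Sigma_{22}^{-1}(\tilde{X}_2-\tilde{\mu}_{(2)})}.$$

From Sect. 1.3, we have

$$|\det(\Sigma)|=|\det(\Sigma_{22})|\,|\det(\Sigma_{11}-\Sigma_{12}\Sigma_{22}^{-1}\Sigma_{21})|$$

and then the normalizing constant is $[\pi^{r}|\det(\Sigma_{11}-\Sigma_{12}\Sigma_{22}^{-1}\Sigma_{21})|]^{-1}$. The exponential part reduces to the following by taking $\tilde{U}=\tilde{X}-\tilde{\mu}$, $\tilde{U}_1=\tilde{X}_1-\tilde{\mu}_{(1)}$, $\tilde{U}_2=\tilde{X}_2-\tilde{\mu}_{(2)}$: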


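$$\begin{aligned}
(\tilde{X}-\tilde{\mu})^{*}\Sigma^{-1}(\tilde{X}-\tilde{\mu})&-(\tilde{X}_2-\tilde{\mu}_{(2)})^{*}\Sigma_{22}^{-1}(\tilde{X}_2-\tilde{\mu}_{(2)})\\
&=\tilde{U}_1^{*}\Sigma^{11}\tilde{U}_1+\tilde{U}_2^{*}\Sigma^{22}\tilde{U}_2+\tilde{U}_1^{*}\Sigma^{12}\tilde{U}_2+\tilde{U}_2^{*}\Sigma^{21}\tilde{U}_1-\tilde{U}_2^{*}\Sigma_{22}^{-1}\tilde{U}_2\\
&=\tilde{U}_1^{*}\Sigma^{11}\tilde{U}_1+\tilde{U}_2^{*}\Sigma^{21}(\Sigma^{11})^{-1}\Sigma^{12}\tilde{U}_2+\tilde{U}_1^{*}\Sigma^{12}\tilde{U}_2+\tilde{U}_2^{*}\Sigma^{21}\tilde{U}_1\\
&=[\tilde{U}_1+(\Sigma^{11})^{-1}\Sigma^{12}\tilde{U}_2]^{*}\,\Sigma^{11}\,[\tilde{U}_1+(\Sigma^{11})^{-1}\Sigma^{12}\tilde{U}_2],
\end{aligned}\tag{3.3a.3}$$

where the second equality makes use of $\Sigma_{22}^{-1}=\Sigma^{22}-\Sigma^{21}(\Sigma^{11})^{-1}\Sigma^{12}$. This exponent has the same structure as that of a complex Gaussian density with $E[\tilde{U}_1|\tilde{U}_2]=-(\Sigma^{11})^{-1}\Sigma^{12}\tilde{U}_2$ and $\mathrm{Cov}(\tilde{X}_1|\tilde{X}_2)=(\Sigma^{11})^{-1}=\Sigma_{11}-\Sigma_{12}\Sigma_{22}^{-1}\Sigma_{21}$. Therefore the conditional density of $\tilde{X}_1$ given $\tilde{X}_2$ is given by

$$\tilde{g}_1(\tilde{X}_1|\tilde{X}_2)=\frac{1}{\pi^{r}|\det(\Sigma_{11}-\Sigma_{12}\Sigma_{22}^{-1}\Sigma_{21})|}\,e^{-(\tilde{X}_1-\tilde{\mu}_{(1)}+C)^{*}\Sigma^{11}(\tilde{X}_1-\tilde{\mu}_{(1)}+C)},\quad C=(\Sigma^{11})^{-1}\Sigma^{12}(\tilde{X}_2-\tilde{\mu}_{(2)}).\tag{3.3a.4}$$

The conditional expectation of $\tilde{X}_1$ given $\tilde{X}_2$ is then

$$\begin{aligned}
E[\tilde{X}_1|\tilde{X}_2]&=\tilde{\mu}_{(1)}-(\Sigma^{11})^{-1}\Sigma^{12}(\tilde{X}_2-\tilde{\mu}_{(2)})\\
&=\tilde{\mu}_{(1)}+\Sigma_{12}\Sigma_{22}^{-1}(\tilde{X}_2-\tilde{\mu}_{(2)})\quad(\text{linear in } \tilde{X}_2)
\end{aligned}\tag{3.3a.5}$$

which follows from a result on partitioning of matrices obtained in Sect. 1.3. The matrix $\Sigma_{12}\Sigma_{22}^{-1}$ is referred to as the matrix of regression coefficients. The conditional covariance matrix is

$$\mathrm{Cov}(\tilde{X}_1|\tilde{X}_2)=\Sigma_{11}-\Sigma_{12}\Sigma_{22}^{-1}\Sigma_{21}\quad(\text{free of } \tilde{X}_2).$$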
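From symmetry, the conditional density of $\tilde{X}_2$ given $\tilde{X}_1$ is given by

$$\tilde{g}_2(\tilde{X}_2|\tilde{X}_1)=\frac{1}{\pi^{p-r}|\det(\Sigma_{22}-\Sigma_{21}\Sigma_{11}^{-1}\Sigma_{12})|}\,e^{-(\tilde{X}_2-\tilde{\mu}_{(2)}+C_1)^{*}\Sigma^{22}(\tilde{X}_2-\tilde{\mu}_{(2)}+C_1)},\tag{3.3a.6}$$

$$C_1=(\Sigma^{22})^{-1}\Sigma^{21}(\tilde{X}_1-\tilde{\mu}_{(1)}),\quad \Sigma^{22}>O.$$

Then the conditional expectation and the conditional covariance of $\tilde{X}_2$ given $\tilde{X}_1$ are the following:

$$\begin{aligned}
E[\tilde{X}_2|\tilde{X}_1]&=\tilde{\mu}_{(2)}-(\Sigma^{22})^{-1}\Sigma^{21}(\tilde{X}_1-\tilde{\mu}_{(1)})\\
&=\tilde{\mu}_{(2)}+\Sigma_{21}\Sigma_{11}^{-1}(\tilde{X}_1-\tilde{\mu}_{(1)})\quad(\text{linear in } \tilde{X}_1)\\
\mathrm{Cov}(\tilde{X}_2|\tilde{X}_1)&=(\Sigma^{22})^{-1}=\Sigma_{22}-\Sigma_{21}\Sigma_{11}^{-1}\Sigma_{12}\quad(\text{free of } \tilde{X}_1),
\end{aligned}\tag{3.3a.7}$$

where, in this case, the matrix $\Sigma_{21}\Sigma_{11}^{-1}$ is referred to as the matrix of regression coefficients.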


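Example 3.3a.1. Let $\tilde{X}$, $\tilde{\mu}=E[\tilde{X}]$, $\Sigma=\mathrm{Cov}(\tilde{X})$ be as follows:

$$\tilde{X}=\begin{bmatrix}\tilde{x}_1\\\tilde{x}_2\\\tilde{x}_3\end{bmatrix},\quad \tilde{\mu}=\begin{bmatrix}1+i\\2-i\\3i\end{bmatrix},\quad \Sigma=\begin{bmatrix}3&1+i&0\\1-i&2&i\\0&-i&3\end{bmatrix}.$$

Consider the partitioning

$$\Sigma=\begin{bmatrix}\Sigma_{11}&\Sigma_{12}\\\Sigma_{21}&\Sigma_{22}\end{bmatrix},\quad \Sigma_{11}=\begin{bmatrix}3&1+i\\1-i&2\end{bmatrix},\quad \Sigma_{12}=\begin{bmatrix}0\\i\end{bmatrix},\quad \Sigma_{21}=[0,\ -i],\quad \Sigma_{22}=(3),\quad \tilde{X}_1=\begin{bmatrix}\tilde{x}_1\\\tilde{x}_2\end{bmatrix},\quad \tilde{X}_2=(\tilde{x}_3),$$

where $\tilde{x}_j$, $j=1,2,3$ are scalar complex variables and $\tilde{X}\sim\tilde{N}_3(\tilde{\mu},\Sigma)$. Determine (1) the marginal densities of $\tilde{X}_1$ and $\tilde{X}_2$; (2) the conditional expectation of $\tilde{X}_1|\tilde{X}_2$ or $E[\tilde{X}_1|\tilde{X}_2]$ and the conditional expectation of $\tilde{X}_2|\tilde{X}_1$ or $E[\tilde{X}_2|\tilde{X}_1]$; (3) the conditional densities of $\tilde{X}_1|\tilde{X}_2$ and $\tilde{X}_2|\tilde{X}_1$.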

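Solution 3.3a.1. Note that $\Sigma=\Sigma^{*}$ and hence $\Sigma$ is Hermitian. Let us compute the leading minors of $\Sigma$: $\det((3))=3>0$, $\det\begin{bmatrix}3&1+i\\1-i&2\end{bmatrix}=6-2=4>0$ and

$$\begin{aligned}
\det(\Sigma)&=3\det\begin{bmatrix}2&i\\-i&3\end{bmatrix}-(1+i)\det\begin{bmatrix}1-i&i\\0&3\end{bmatrix}+0\\
&=(3)(5)-3(1+1)=9>0.
\end{aligned}$$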

Hence X is Hermitian positive definite. Note that the cofactor expansion for determinants 
holds whether the elements present in the determinant are real or complex. Let us compute 
the inverses of the submatrices by taking the transpose of the matrix of cofactors divided 
by the determinant. This formula applies whether the elements comprising the matrix are 
real or complex. Then 


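$$\Sigma_{22}^{-1}=\frac{1}{3},\quad \Sigma_{11}^{-1}=\begin{bmatrix}3&1+i\\1-i&2\end{bmatrix}^{-1}=\frac{1}{4}\begin{bmatrix}2&-(1+i)\\-(1-i)&3\end{bmatrix};\tag{i}$$

$$\Sigma_{11}-\Sigma_{12}\Sigma_{22}^{-1}\Sigma_{21}=\begin{bmatrix}3&1+i\\1-i&2\end{bmatrix}-\begin{bmatrix}0\\i\end{bmatrix}\Big(\frac{1}{3}\Big)[0,\ -i]=\begin{bmatrix}3&1+i\\1-i&2\end{bmatrix}-\frac{1}{3}\begin{bmatrix}0&0\\0&1\end{bmatrix}=\begin{bmatrix}3&1+i\\1-i&\frac{5}{3}\end{bmatrix};\tag{ii}$$

$$[\Sigma_{11}-\Sigma_{12}\Sigma_{22}^{-1}\Sigma_{21}]^{-1}=\frac{1}{3}\begin{bmatrix}\frac{5}{3}&-(1+i)\\-(1-i)&3\end{bmatrix};\tag{iii}$$

$$\Sigma_{22}-\Sigma_{21}\Sigma_{11}^{-1}\Sigma_{12}=3-\frac{1}{4}[0,\ -i]\begin{bmatrix}2&-(1+i)\\-(1-i)&3\end{bmatrix}\begin{bmatrix}0\\i\end{bmatrix}=3-\frac{3}{4}=\frac{9}{4};\tag{iv}$$

$$[\Sigma_{22}-\Sigma_{21}\Sigma_{11}^{-1}\Sigma_{12}]^{-1}=\frac{4}{9}.\tag{v}$$

As well,

$$\Sigma_{12}\Sigma_{22}^{-1}=\frac{1}{3}\begin{bmatrix}0\\i\end{bmatrix};\tag{vi}$$

$$\Sigma_{21}\Sigma_{11}^{-1}=\frac{1}{4}[0,\ -i]\begin{bmatrix}2&-(1+i)\\-(1-i)&3\end{bmatrix}=\frac{1}{4}[1+i,\ -3i].\tag{vii}$$

With these computations, all the questions can be answered. We have

$$\tilde{X}_1\sim\tilde{N}_2(\tilde{\mu}_{(1)},\Sigma_{11}),\quad \tilde{\mu}_{(1)}=\begin{bmatrix}1+i\\2-i\end{bmatrix},\quad \Sigma_{11}=\begin{bmatrix}3&1+i\\1-i&2\end{bmatrix}$$

and $\tilde{X}_2=\tilde{x}_3\sim\tilde{N}_1(3i,3)$. Let the densities of $\tilde{X}_1$ and $\tilde{X}_2=\tilde{x}_3$ be denoted by $\tilde{f}_1(\tilde{X}_1)$ and $\tilde{f}_2(\tilde{x}_3)$, respectively. Then

$$\tilde{f}_2(\tilde{x}_3)=\frac{1}{(\pi)(3)}\,e^{-\frac{1}{3}(\tilde{x}_3-3i)^{*}(\tilde{x}_3-3i)};$$

$$\tilde{f}_1(\tilde{X}_1)=\frac{1}{(\pi^2)(4)}\,e^{-Q_1},$$

$$\begin{aligned}
Q_1=\frac{1}{4}\big[&2(\tilde{x}_1-(1+i))^{*}(\tilde{x}_1-(1+i))-(1+i)(\tilde{x}_1-(1+i))^{*}(\tilde{x}_2-(2-i))\\
&-(1-i)(\tilde{x}_2-(2-i))^{*}(\tilde{x}_1-(1+i))+3(\tilde{x}_2-(2-i))^{*}(\tilde{x}_2-(2-i))\big].
\end{aligned}$$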


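The conditional densities, denoted by $\tilde{g}_1(\tilde{X}_1|\tilde{X}_2)$ and $\tilde{g}_2(\tilde{X}_2|\tilde{X}_1)$, are the following:

$$\tilde{g}_1(\tilde{X}_1|\tilde{X}_2)=\frac{1}{(\pi^2)(3)}\,e^{-Q_2},$$

$$\begin{aligned}
Q_2=\frac{1}{3}\Big[&\frac{5}{3}(\tilde{x}_1-(1+i))^{*}(\tilde{x}_1-(1+i))\\
&-(1+i)(\tilde{x}_1-(1+i))^{*}\Big(\tilde{x}_2-(2-i)-\frac{i}{3}(\tilde{x}_3-3i)\Big)\\
&-(1-i)\Big(\tilde{x}_2-(2-i)-\frac{i}{3}(\tilde{x}_3-3i)\Big)^{*}(\tilde{x}_1-(1+i))\\
&+3\Big(\tilde{x}_2-(2-i)-\frac{i}{3}(\tilde{x}_3-3i)\Big)^{*}\Big(\tilde{x}_2-(2-i)-\frac{i}{3}(\tilde{x}_3-3i)\Big)\Big];
\end{aligned}$$

$$\tilde{g}_2(\tilde{X}_2|\tilde{X}_1)=\frac{1}{(\pi)(9/4)}\,e^{-Q_3},$$

$$Q_3=\frac{4}{9}(\tilde{x}_3-M_3)^{*}(\tilde{x}_3-M_3)\ \text{ where}$$

$$\begin{aligned}
M_3&=3i+\frac{1}{4}\{(1+i)[\tilde{x}_1-(1+i)]-3i[\tilde{x}_2-(2-i)]\}\\
&=3i+\frac{1}{4}\{(1+i)\tilde{x}_1-3i\tilde{x}_2+3+4i\}.
\end{aligned}$$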
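The block computations of Example 3.3a.1 can be verified numerically with Python's built-in complex type. The sketch below is only a check under the partitioning of the example; the helper names `matmul`, `inv2` and `mean_x3_given` are hypothetical, and floating-point results are meant to be compared up to rounding error.

```python
def matmul(A, B):
    """Product of two matrices stored as lists of rows (entries may be complex)."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]


def inv2(M):
    """Inverse of a 2 x 2 matrix via the adjugate formula."""
    (a, b), (c, d) = M
    det = a * d - b * c
    return [[d / det, -b / det], [-c / det, a / det]]


# Blocks of the Hermitian covariance matrix in Example 3.3a.1.
S11 = [[3, 1 + 1j], [1 - 1j, 2]]
S12 = [[0], [1j]]
S21 = [[0, -1j]]
S22 = [[3]]

# Regression of x3 on (x1, x2) and the conditional variance of x3.
reg_2_on_1 = matmul(S21, inv2(S11))                      # Sigma21 Sigma11^{-1}
cov_2_given_1 = S22[0][0] - matmul(reg_2_on_1, S12)[0][0]

# Regression of (x1, x2) on x3 and the conditional covariance matrix.
reg_1_on_2 = [[row[0] / S22[0][0]] for row in S12]       # Sigma12 Sigma22^{-1}
outer = matmul(reg_1_on_2, S21)
cov_1_given_2 = [[S11[i][j] - outer[i][j] for j in range(2)] for i in range(2)]


def mean_x3_given(x1, x2):
    """E[x3 | x1, x2] = 3i + Sigma21 Sigma11^{-1} (X1 - mu_(1)), with mu_(1) = (1+i, 2-i)'."""
    b1, b2 = reg_2_on_1[0]
    return 3j + b1 * (x1 - (1 + 1j)) + b2 * (x2 - (2 - 1j))
```

Evaluating `mean_x3_given` at the mean point $(1+i,\ 2-i)$ returns $3i$, the marginal mean of $\tilde{x}_3$.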


The bivariate complex Gaussian case 
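Letting $\rho$ denote the correlation between $\tilde{x}_1$ and $\tilde{x}_2$, it is seen from (3.3a.5) that for $p=2$,

$$\begin{aligned}
E[\tilde{X}_1|\tilde{X}_2]&=\tilde{\mu}_{(1)}+\Sigma_{12}\Sigma_{22}^{-1}(\tilde{X}_2-\tilde{\mu}_{(2)})=\tilde{\mu}_1+\frac{\sigma_{12}}{\sigma_2^2}(\tilde{x}_2-\tilde{\mu}_2)\\
&=\tilde{\mu}_1+\frac{\sigma_1\sigma_2\rho}{\sigma_2^2}(\tilde{x}_2-\tilde{\mu}_2)=\tilde{\mu}_1+\frac{\sigma_1}{\sigma_2}\rho\,(\tilde{x}_2-\tilde{\mu}_2)=E[\tilde{x}_1|\tilde{x}_2]\quad(\text{linear in } \tilde{x}_2).
\end{aligned}\tag{3.3a.8}$$

Similarly,

$$E[\tilde{x}_2|\tilde{x}_1]=\tilde{\mu}_2+\frac{\sigma_2}{\sigma_1}\rho\,(\tilde{x}_1-\tilde{\mu}_1)\quad(\text{linear in } \tilde{x}_1).\tag{3.3a.9}$$

Incidentally, $\sigma_{12}/\sigma_2^2$ and $\sigma_{12}/\sigma_1^2$ are referred to as the regression coefficients.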


Exercises 3.3 


3.3.1. Let the real $p\times 1$ vector $X$ have a $p$-variate nonsingular normal density $X\sim N_p(\mu,\Sigma)$, $\Sigma>O$. Let $u=X'\Sigma^{-1}X$. Make use of the mgf to derive the density of $u$ for (1) $\mu=O$, (2) $\mu\ne O$.

3.3.2. Repeat Exercise 3.3.1 for $\tilde{\mu}\ne O$ for the complex nonsingular Gaussian case.


3.3.3. Observing that the density coming from Exercise 3.3.1 is a noncentral chisquare density, coming from the real $p$-variate Gaussian, derive the noncentral $F$ density with $m$ and $n$ degrees of freedom, where the numerator chisquare is noncentral, the denominator chisquare is central, and the two chisquares are independently distributed.


3.3.4. Repeat Exercise 3.3.3 for the complex Gaussian case. 


3.3.5. Taking the density of $u$ in Exercise 3.3.1 as a real noncentral chisquare density, derive the density of a real doubly noncentral $F$.

3.3.6. Repeat Exercise 3.3.5 for the corresponding complex case.


3.3.7. Construct a $3\times 3$ Hermitian positive definite matrix $V$. Let this be the covariance matrix of a $3\times 1$ vector variable $\tilde{X}$. Compute $V^{-1}$. Then construct a Gaussian density for this $\tilde{X}$. Derive the marginal joint densities of (1) $\tilde{x}_1$ and $\tilde{x}_2$, (2) $\tilde{x}_1$ and $\tilde{x}_3$, (3) $\tilde{x}_2$ and $\tilde{x}_3$, where $\tilde{x}_1,\tilde{x}_2,\tilde{x}_3$ are the components of $\tilde{X}$. Take $E[\tilde{X}]=O$.


3.3.8. In Exercise 3.3.7, compute (1) $E[\tilde{x}_1|\tilde{x}_2]$, (2) the conditional joint density of $\tilde{x}_1,\tilde{x}_2$, given $\tilde{x}_3$. Take $E[\tilde{X}]=\tilde{\mu}=O$.


3.3.9. In Exercise 3.3.8, compute the mgf in the conditional space of $\tilde{x}_1$ given $\tilde{x}_2,\tilde{x}_3$, that is, $E[e^{\Re(\tilde{t}_1^{*}\tilde{x}_1)}\,|\,\tilde{x}_2,\tilde{x}_3]$.


3.3.10. In Exercise 3.3.9, compute the mgf in the marginal space of x2, x3. What is the 
connection of the results obtained in Exercises 3.3.9 and 3.3.10 with the mgf of X? 


3.4. Chisquaredness and Independence of Quadratic Forms in the Real Case 


Let the $p\times 1$ vector $X$ have a $p$-variate real Gaussian density with a null vector as its mean value and the identity matrix as its covariance matrix, that is, $X\sim N_p(O,I)$, so that the components of $X$ are mutually independently distributed real scalar standard normal variables. Let $u=X'AX$, $A=A'$ be a real quadratic form in this $X$. The chisquaredness of a quadratic form such as $u$ has already been discussed in Chap. 2. In this section, we will start with such a $u$ and then consider its generalizations. When $A=A'$, there exists an orthonormal matrix $P$, that is, $PP'=I$, $P'P=I$, such that $P'AP=\mathrm{diag}(\lambda_1,\ldots,\lambda_p)$ where $\lambda_1,\ldots,\lambda_p$ are the eigenvalues of $A$. Letting $Y=P'X$, $E[Y]=P'O=O$ and $\mathrm{Cov}(Y)=P'IP=I$. But $Y$ is a linear function of $X$ and hence, $Y$ is also real Gaussian distributed; thus, $Y\sim N_p(O,I)$. Then, $y_j^2\overset{iid}{\sim}\chi_1^2$, $j=1,\ldots,p$, or the $y_j^2$ are independently distributed chisquares, each having one degree of freedom. Note that

$$u=X'AX=Y'P'APY=\lambda_1y_1^2+\cdots+\lambda_py_p^2.\tag{3.4.1}$$

We have the following result on the chisquaredness of quadratic forms in the real $p$-variate Gaussian case, which corresponds to Theorem 2.2.1.


Theorem 3.4.1. Let the $p\times 1$ vector $X$ be real Gaussian with the parameters $\mu=O$ and $\Sigma=I$ or $X\sim N_p(O,I)$. Let $u=X'AX$, $A=A'$ be a quadratic form in this $X$. Then $u=X'AX\sim\chi_r^2$, that is, a real chisquare with $r$ degrees of freedom, if and only if $A=A^2$ and the rank of $A$ is $r$.

Proof: When $A=A'$, we have the representation of the quadratic form given in (3.4.1). When $A=A^2$, all the eigenvalues of $A$ are 1's and 0's. Then $r$ of the $\lambda_j$'s are unities and the remaining ones are zeros and then (3.4.1) becomes the sum of $r$ independently distributed real chisquares having one degree of freedom each, and hence the sum is a real chisquare with $r$ degrees of freedom. For proving the second part, we will assume that $u=X'AX\sim\chi_r^2$. Then the mgf of $u$ is $M_u(t)=(1-2t)^{-r/2}$ for $1-2t>0$. The representation in (3.4.1) holds in general. The mgf of $y_j^2$, $\lambda_jy_j^2$ and the sum of the $\lambda_jy_j^2$ are the following:

$$M_{y_j^2}(t)=(1-2t)^{-\frac{1}{2}},\quad M_{\lambda_jy_j^2}(t)=(1-2\lambda_jt)^{-\frac{1}{2}},\quad M_u(t)=\prod_{j=1}^{p}(1-2\lambda_jt)^{-\frac{1}{2}}$$

for $1-2\lambda_jt>0$, $j=1,\ldots,p$. Hence, we have the following identity:

$$(1-2t)^{-\frac{r}{2}}=\prod_{j=1}^{p}(1-2\lambda_jt)^{-\frac{1}{2}},\quad 1-2t>0,\ 1-2\lambda_jt>0,\ j=1,\ldots,p.\tag{3.4.2}$$

Taking natural logarithm on both sides of (3.4.2), expanding and then comparing the coefficients of $2t,\ \frac{(2t)^2}{2},\ldots$, we have

$$r=\sum_{j=1}^{p}\lambda_j=\sum_{j=1}^{p}\lambda_j^2=\sum_{j=1}^{p}\lambda_j^3=\cdots\tag{3.4.3}$$

The only solution (3.4.3) can have is that $r$ of the $\lambda_j$'s are unities and the remaining ones zeros. This property alone will not guarantee that $A$ is idempotent. However, having eigenvalues that are equal to zero or one combined with the property that $A=A'$ will ensure that $A=A^2$. This completes the proof.
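Since $\sum_j\lambda_j^k=\mathrm{tr}(A^k)$ for a symmetric matrix $A$, the conditions in (3.4.3) can be checked through traces of powers of $A$, without computing any eigenvalues. The following Python sketch does so for an illustrative matrix, $A=I-\frac{1}{3}JJ'$ with $J'=[1,1,1]$, which is chosen here only as an example; the helper names are hypothetical.

```python
from fractions import Fraction as F


def matmul(A, B):
    """Product of two matrices stored as lists of rows."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]


def trace(M):
    """Sum of the diagonal elements of a square matrix."""
    return sum(M[i][i] for i in range(len(M)))


# Illustrative symmetric matrix: I - (1/3) J J' for p = 3.
A = [[F(2, 3), F(-1, 3), F(-1, 3)],
     [F(-1, 3), F(2, 3), F(-1, 3)],
     [F(-1, 3), F(-1, 3), F(2, 3)]]

A2 = matmul(A, A)
A3 = matmul(A2, A)

# tr(A^k) equals the sum of the k-th powers of the eigenvalues of A.
power_sums = [trace(A), trace(A2), trace(A3)]
is_idempotent = (A2 == A)
```

For this matrix the three power sums coincide with $r=2$, in agreement with $A$ being idempotent of rank 2, so that $X'AX\sim\chi_2^2$ when $X\sim N_3(O,I)$.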


Let us look into some generalizations of Theorem 3.4.1. Let the $p\times 1$ vector have a real Gaussian distribution $X\sim N_p(O,\Sigma)$, $\Sigma>O$, that is, $X$ is a Gaussian vector with the null vector as its mean value and a real positive definite matrix as its covariance matrix. When $\Sigma$ is positive definite, we can define $\Sigma^{\frac{1}{2}}$. Letting $Z=\Sigma^{-\frac{1}{2}}X$, $Z$ will be distributed as a standard Gaussian vector, that is, $Z\sim N_p(O,I)$, since $Z$ is a linear function of $X$ with $E[Z]=O$ and $\mathrm{Cov}(Z)=I$. Now, Theorem 3.4.1 is applicable to $Z$. Then $u=X'AX$, $A=A'$, becomes

$$u=Z'\Sigma^{\frac{1}{2}}A\Sigma^{\frac{1}{2}}Z,\quad \Sigma^{\frac{1}{2}}A\Sigma^{\frac{1}{2}}=(\Sigma^{\frac{1}{2}}A\Sigma^{\frac{1}{2}})',$$

and it follows from Theorem 3.4.1 that the next result holds:

Theorem 3.4.2. Let the $p\times 1$ vector $X$ have a real $p$-variate Gaussian density $X\sim N_p(O,\Sigma)$, $\Sigma>O$. Then $u=X'AX$, $A=A'$, is a real chisquare with $r$ degrees of freedom if and only if $\Sigma^{\frac{1}{2}}A\Sigma^{\frac{1}{2}}$ is idempotent and of rank $r$ or, equivalently, if and only if $A=A\Sigma A$ and the rank of $A$ is $r$.


Now, let us consider the general case. Let $X\sim N_p(\mu,\Sigma)$, $\Sigma>O$. Let $q=X'AX$, $A=A'$. Then, referring to representation (2.2.1), we can express $q$ as

$$q=\lambda_1(u_1+b_1)^2+\cdots+\lambda_p(u_p+b_p)^2=\lambda_1w_1^2+\cdots+\lambda_pw_p^2\tag{3.4.4}$$

where $U=(u_1,\ldots,u_p)'\sim N_p(O,I)$, the $\lambda_j$'s, $j=1,\ldots,p$, are the eigenvalues of $\Sigma^{\frac{1}{2}}A\Sigma^{\frac{1}{2}}$ and $b_j$ is the $j$-th component of $P'\Sigma^{-\frac{1}{2}}\mu$, $P$ being a $p\times p$ orthonormal matrix whose $j$-th column consists of the normalized eigenvector corresponding to $\lambda_j$, $j=1,\ldots,p$. When $\mu=O$, $w_j^2$ is a real central chisquare random variable having one degree of freedom; otherwise, it is a real noncentral chisquare random variable with one degree of freedom and noncentrality parameter $\frac{1}{2}b_j^2$. Thus, in general, (3.4.4) is a linear function of independently distributed real noncentral chisquare random variables having one degree of freedom each.


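Example 3.4.1. Let $X\sim N_3(O,\Sigma)$, $q=X'AX$ where

$$X=\begin{bmatrix}x_1\\x_2\\x_3\end{bmatrix},\quad \Sigma=\frac{1}{3}\begin{bmatrix}2&0&-1\\0&2&-1\\-1&-1&3\end{bmatrix},\quad A=\begin{bmatrix}1&1&1\\1&1&1\\1&1&1\end{bmatrix}.$$

(1) Show that $q\sim\chi_1^2$ by applying Theorem 3.4.2 as well as independently; (2) If the mean value vector $\mu'=[-1,1,-2]$, what is then the distribution of $q$?


Solution 3.4.1. In (1) $\mu=O$ and

$$X'AX=x_1^2+x_2^2+x_3^2+2(x_1x_2+x_1x_3+x_2x_3)=(x_1+x_2+x_3)^2.$$

Let $y_1=x_1+x_2+x_3$. Then $E[y_1]=0$ and

$$\begin{aligned}
\mathrm{Var}(y_1)&=\mathrm{Var}(x_1)+\mathrm{Var}(x_2)+\mathrm{Var}(x_3)+2[\mathrm{Cov}(x_1,x_2)+\mathrm{Cov}(x_1,x_3)+\mathrm{Cov}(x_2,x_3)]\\
&=\frac{1}{3}[2+2+3+0-2-2]=\frac{3}{3}=1.
\end{aligned}$$

Hence, $y_1=x_1+x_2+x_3$ has $E[y_1]=0$ and $\mathrm{Var}(y_1)=1$, and since it is a linear function of the real normal vector $X$, $y_1$ is a standard normal. Accordingly, $q=y_1^2\sim\chi_1^2$. In order to apply Theorem 3.4.2, consider $A\Sigma A$:

$$A\Sigma A=\begin{bmatrix}1&1&1\\1&1&1\\1&1&1\end{bmatrix}\left\{\frac{1}{3}\begin{bmatrix}2&0&-1\\0&2&-1\\-1&-1&3\end{bmatrix}\right\}\begin{bmatrix}1&1&1\\1&1&1\\1&1&1\end{bmatrix}=\begin{bmatrix}1&1&1\\1&1&1\\1&1&1\end{bmatrix}=A.$$

Then, by Theorem 3.4.2, $q=X'AX\sim\chi_r^2$ where $r$ is the rank of $A$. In this case, the rank of $A$ is 1 and hence $q\sim\chi_1^2$. This completes the calculations in connection with (1). When $\mu\ne O$, $q\sim\chi_1^2(\lambda)$, a noncentral chisquare with noncentrality parameter $\lambda=\frac{1}{2}\mu'\Sigma^{-1}\mu$. Let us compute $\Sigma^{-1}$ by making use of the formula $\Sigma^{-1}=\frac{1}{|\Sigma|}[\mathrm{Cof}(\Sigma)]'$ where $\mathrm{Cof}(\Sigma)$ is the matrix of cofactors of $\Sigma$ wherein each of its elements is replaced by its cofactor. Now,

$$\Sigma^{-1}=\left[\frac{1}{3}\begin{bmatrix}2&0&-1\\0&2&-1\\-1&-1&3\end{bmatrix}\right]^{-1}=\frac{3}{8}\begin{bmatrix}5&1&2\\1&5&2\\2&2&4\end{bmatrix}.$$

Then,

$$\lambda=\frac{1}{2}\mu'\Sigma^{-1}\mu=\frac{3}{16}[-1,\ 1,\ -2]\begin{bmatrix}5&1&2\\1&5&2\\2&2&4\end{bmatrix}\begin{bmatrix}-1\\1\\-2\end{bmatrix}=\frac{3}{16}(24)=\frac{9}{2}.$$

This completes the computations for the second part.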
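The matrix identities used in Solution 3.4.1 can be confirmed with exact rational arithmetic. The sketch below, in which the helper `matmul` and all variable names are introduced only for illustration, checks that $A\Sigma A=A$, that $\mathrm{Var}(y_1)=J'\Sigma J=1$ with $J'=[1,1,1]$, and that the matrix $\frac{3}{8}[\cdot]$ obtained from the cofactors is indeed the inverse of $\Sigma$.

```python
from fractions import Fraction as F


def matmul(A, B):
    """Product of two matrices stored as lists of rows."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]


# Sigma and A of Example 3.4.1.
Sigma = [[F(2, 3), F(0), F(-1, 3)],
         [F(0), F(2, 3), F(-1, 3)],
         [F(-1, 3), F(-1, 3), F(1)]]
A = [[F(1)] * 3 for _ in range(3)]
J = [[F(1)], [F(1)], [F(1)]]
Jt = [[F(1), F(1), F(1)]]

ASA = matmul(matmul(A, Sigma), A)               # should reproduce A
var_y1 = matmul(matmul(Jt, Sigma), J)[0][0]     # variance of y1 = x1 + x2 + x3

# Candidate inverse obtained from the cofactor formula.
Sigma_inv = [[F(3, 8) * v for v in row] for row in ([5, 1, 2], [1, 5, 2], [2, 2, 4])]
identity = [[F(int(i == j)) for j in range(3)] for i in range(3)]
product = matmul(Sigma, Sigma_inv)              # should be the identity matrix
```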


3.4.1. Independence of quadratic forms 


Another relevant result in the real case pertains to the independence of quadratic forms. 
The concept of chisquaredness and the independence of quadratic forms are prominently 
encountered in the theoretical underpinnings of statistical techniques such as the Anal- 
ysis of Variance, Regression and Model Building when it is assumed that the errors are 
normally distributed. First, we state a result on the independence of quadratic forms in 
Gaussian vectors whose components are independently distributed. 


Theorem 3.4.3. Let $u_1=X'AX$, $A=A'$, and $u_2=X'BX$, $B=B'$, be two quadratic forms in $X\sim N_p(\mu,I)$. Then $u_1$ and $u_2$ are independently distributed if and only if $AB=O$.

Note that the independence property holds whether $\mu=O$ or $\mu\ne O$. The result will still be valid if the covariance matrix is $\sigma^2I$ where $\sigma^2$ is a positive real scalar quantity. If the covariance matrix is $\Sigma>O$, the statement of Theorem 3.4.3 needs modification.

Proof: Let us first assume that $AB=O$. Then $AB=O=O'=(AB)'=B'A'=BA$, which means that $A$ and $B$ commute. Then there exists an orthonormal matrix $P$, $PP'=I$, $P'P=I$, that diagonalizes both $A$ and $B$, and

$$\begin{aligned}
AB=O&\Rightarrow P'ABP=O\Rightarrow P'APP'BP=D_1D_2=O,\\
D_1&=\mathrm{diag}(\lambda_1,\ldots,\lambda_p),\quad D_2=\mathrm{diag}(\nu_1,\ldots,\nu_p),
\end{aligned}\tag{3.4.5}$$

where $\lambda_1,\ldots,\lambda_p$ are the eigenvalues of $A$ and $\nu_1,\ldots,\nu_p$ are the eigenvalues of $B$. Let $Y=P'X$, then the canonical representations of $u_1$ and $u_2$ are the following:

$$u_1=\lambda_1y_1^2+\cdots+\lambda_py_p^2\tag{3.4.6}$$

$$u_2=\nu_1y_1^2+\cdots+\nu_py_p^2\tag{3.4.7}$$

where the $y_j$'s are real and independently distributed. But $D_1D_2=O$ means that whenever a $\lambda_j\ne 0$ then the corresponding $\nu_j=0$ and vice versa. In other words, whenever a $y_j$ is present in (3.4.6), it is absent in (3.4.7) and vice versa, or the independent variables $y_j$'s are separated in (3.4.6) and (3.4.7), which implies that $u_1$ and $u_2$ are independently distributed.


The necessity part of the proof, which consists in showing that $AB=O$ given that $A=A'$, $B=B'$ and $u_1$ and $u_2$ are independently distributed, cannot be established by retracing the steps utilized for proving the sufficiency as it requires more matrix manipulations. We note that there are several incorrect or incomplete proofs of Theorem 3.4.3 in the statistical literature. A correct proof for the central case is given in Mathai and Provost (1992).

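If $X\sim N_p(\mu,\Sigma)$, $\Sigma>O$, consider the transformation $Y=\Sigma^{-\frac{1}{2}}X\sim N_p(\Sigma^{-\frac{1}{2}}\mu,I)$. Then, $u_1=X'AX=Y'\Sigma^{\frac{1}{2}}A\Sigma^{\frac{1}{2}}Y$, $u_2=X'BX=Y'\Sigma^{\frac{1}{2}}B\Sigma^{\frac{1}{2}}Y$, and we can apply Theorem 3.4.3. In that case, the matrices being orthogonal means

$$\Sigma^{\frac{1}{2}}A\Sigma^{\frac{1}{2}}\Sigma^{\frac{1}{2}}B\Sigma^{\frac{1}{2}}=O\Rightarrow A\Sigma B=O.$$

Thus we have the following result:

Theorem 3.4.4. Let $u_1=X'AX$, $A=A'$ and $u_2=X'BX$, $B=B'$ where $X\sim N_p(\mu,\Sigma)$, $\Sigma>O$. Then $u_1$ and $u_2$ are independently distributed if and only if $A\Sigma B=O$.

What about the distribution of the quadratic form $y=(X-\mu)'\Sigma^{-1}(X-\mu)$ that is present in the exponent of the $p$-variate real Gaussian density? Let us first determine the mgf of $y$, that is,

$$\begin{aligned}
M_y(t)=E[e^{ty}]&=\frac{1}{(2\pi)^{\frac{p}{2}}|\Sigma|^{\frac{1}{2}}}\int_X e^{t(X-\mu)'\Sigma^{-1}(X-\mu)-\frac{1}{2}(X-\mu)'\Sigma^{-1}(X-\mu)}\,\mathrm{d}X\\
&=\frac{1}{(2\pi)^{\frac{p}{2}}|\Sigma|^{\frac{1}{2}}}\int_X e^{-\frac{1}{2}(1-2t)(X-\mu)'\Sigma^{-1}(X-\mu)}\,\mathrm{d}X\\
&=(1-2t)^{-\frac{p}{2}}\ \text{ for } (1-2t)>0.
\end{aligned}\tag{3.4.8}$$

This is the mgf of a real chisquare random variable having $p$ degrees of freedom. Hence we have the following result:


Theorem 3.4.5. When $X\sim N_p(\mu,\Sigma)$, $\Sigma>O$,

$$y=(X-\mu)'\Sigma^{-1}(X-\mu)\sim\chi_p^2,\tag{3.4.9}$$

and if $y_1=X'\Sigma^{-1}X$, then $y_1\sim\chi_p^2(\lambda)$, that is, a real noncentral chisquare with $p$ degrees of freedom and noncentrality parameter $\lambda=\frac{1}{2}\mu'\Sigma^{-1}\mu$.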


Example 3.4.2. Let $X\sim N_3(\mu,\Sigma)$ and consider the quadratic forms $u_1=X'AX$ and $u_2=X'BX$ where

$$X=\begin{bmatrix}x_1\\x_2\\x_3\end{bmatrix},\quad \Sigma=\frac{1}{3}\begin{bmatrix}2&0&-1\\0&2&-1\\-1&-1&3\end{bmatrix},\quad A=\begin{bmatrix}1&1&1\\1&1&1\\1&1&1\end{bmatrix},$$

$$B=\frac{1}{3}\begin{bmatrix}2&-1&-1\\-1&2&-1\\-1&-1&2\end{bmatrix},\quad \mu=\begin{bmatrix}1\\2\\3\end{bmatrix}.$$

Show that $u_1$ and $u_2$ are independently distributed.


Solution 3.4.2. Let $J$ be a $3\times 1$ column vector of unities or 1's as its elements. Then observe that $A=JJ'$ and $B=I-\frac{1}{3}JJ'$. Further, $J'J=3$, $J'\Sigma=\frac{1}{3}J'$ and hence $A\Sigma=JJ'\Sigma=\frac{1}{3}JJ'$. Then $A\Sigma B=\frac{1}{3}JJ'[I-\frac{1}{3}JJ']=\frac{1}{3}[JJ'-JJ']=O$. It then follows from Theorem 3.4.4 that $u_1$ and $u_2$ are independently distributed. Now, let us prove the result independently without resorting to Theorem 3.4.4. Note that $u_3=x_1+x_2+x_3=J'X$ is a real normal variable with unit variance, as can be seen from Example 3.4.1. Consider the $B$ in $BX$, namely $I-\frac{1}{3}JJ'$. The first component of $BX$ is of the form $\frac{1}{3}[2,-1,-1]X=\frac{1}{3}[2x_1-x_2-x_3]$, which shall be denoted by $u_4$. Then $u_3$ and $u_4$ are linear functions of the same real normal vector $X$, and hence $u_3$ and $u_4$ are real normal variables. Let us compute the covariance between $u_3$ and $u_4$, observing that $J'\Sigma=\frac{1}{3}J'=J'\mathrm{Cov}(X)$:

$$\mathrm{Cov}(u_3,u_4)=\frac{1}{3}[1,1,1]\,\mathrm{Cov}(X)\begin{bmatrix}2\\-1\\-1\end{bmatrix}=\frac{1}{9}[1,1,1]\begin{bmatrix}2\\-1\\-1\end{bmatrix}=0.$$

Thus, $u_3$ and $u_4$ are uncorrelated and, being jointly normal, independently distributed. As a similar result can be established with respect to the second and third components of $BX$, $u_3$ and $BX$ are indeed independently distributed. This implies that $u_3^2=(J'X)^2=X'JJ'X=X'AX$ and $(BX)'(BX)=X'B'BX=X'BX$ are independently distributed. Observe that since $B$ is symmetric and idempotent, $B'B=B$. This solution makes use of the following property: if $Y_1$ and $Y_2$ are real vectors or matrices that are independently distributed, then $Y_1'Y_1$ and $Y_2'Y_2$ are also independently distributed. It should be noted that the converse does not necessarily hold.
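The condition $A\Sigma B=O$ of Theorem 3.4.4 and the zero covariances between $J'X$ and the components of $BX$ can both be checked for Example 3.4.2 with exact arithmetic. The sketch below uses a hypothetical helper `matmul`; the matrices are those of the example, and $B$ being symmetric, $\mathrm{Cov}(J'X,BX)=J'\Sigma B$.

```python
from fractions import Fraction as F


def matmul(A, B):
    """Product of two matrices stored as lists of rows."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]


# Sigma, A and B of Example 3.4.2.
Sigma = [[F(2, 3), F(0), F(-1, 3)],
         [F(0), F(2, 3), F(-1, 3)],
         [F(-1, 3), F(-1, 3), F(1)]]
A = [[F(1)] * 3 for _ in range(3)]
B = [[F(2, 3), F(-1, 3), F(-1, 3)],
     [F(-1, 3), F(2, 3), F(-1, 3)],
     [F(-1, 3), F(-1, 3), F(2, 3)]]
Jt = [[F(1), F(1), F(1)]]

ASB = matmul(matmul(A, Sigma), B)             # condition of Theorem 3.4.4
cov_u3_BX = matmul(matmul(Jt, Sigma), B)      # covariances between J'X and each component of BX
B_is_idempotent = (matmul(B, B) == B)         # so that (BX)'(BX) = X'BX
```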


3.4a. Chisquaredness and Independence in the Complex Gaussian Case 


Let the $p\times 1$ vector $\tilde{X}$ in the complex domain have a $p$-variate complex Gaussian density $\tilde{X}\sim\tilde{N}_p(O,I)$. Let $\tilde{u}=\tilde{X}^{*}A\tilde{X}$ be a Hermitian form, $A=A^{*}$ where $A^{*}$ denotes the conjugate transpose of $A$. Then there exists a unitary matrix $Q$, $QQ^{*}=I$, $Q^{*}Q=I$, such that $Q^{*}AQ=\mathrm{diag}(\lambda_1,\ldots,\lambda_p)$ where $\lambda_1,\ldots,\lambda_p$ are the eigenvalues of $A$. It can be shown that when $A$ is Hermitian, which means in the real case that $A=A'$ (symmetric), all the eigenvalues of $A$ are real. Let $\tilde{Y}=Q^{*}\tilde{X}$; then

$$\tilde{u}=\tilde{X}^{*}A\tilde{X}=\tilde{Y}^{*}Q^{*}AQ\tilde{Y}=\lambda_1|\tilde{y}_1|^2+\cdots+\lambda_p|\tilde{y}_p|^2\tag{3.4a.1}$$

where $|\tilde{y}_j|$ denotes the absolute value or modulus of $\tilde{y}_j$. If $\tilde{y}_j=y_{j1}+iy_{j2}$ where $y_{j1}$ and $y_{j2}$ are real, $i=\sqrt{(-1)}$, then $|\tilde{y}_j|^2=y_{j1}^2+y_{j2}^2$. We can obtain the following result which is the counterpart of Theorem 3.4.1:


Theorem 3.4a.1. Let $\tilde{X}\sim\tilde{N}_p(O,I)$ and $\tilde{u}=\tilde{X}^{*}A\tilde{X}$, $A=A^{*}$. Then $\tilde{u}\sim\tilde{\chi}_r^2$, a chisquare random variable having $r$ degrees of freedom in the complex domain, if and only if $A=A^2$ (idempotent) and $A$ is of rank $r$.


Proof: The definition of an idempotent matrix $A$ as $A=A^2$ holds whether the elements of $A$ are real or complex. Let $A$ be idempotent and of rank $r$. Then $r$ of the eigenvalues of $A$ are unities and the remaining ones are zeros. Then the representation given in (3.4a.1) becomes

$$\tilde{u}=|\tilde{y}_1|^2+\cdots+|\tilde{y}_r|^2\sim\tilde{\chi}_r^2,$$

a chisquare with $r$ degrees of freedom in the complex domain, that is, a real gamma with the parameters $(\alpha=r,\ \beta=1)$ whose mgf is $(1-t)^{-r}$, $1-t>0$. For proving the necessity, let us assume that $\tilde{u}\sim\tilde{\chi}_r^2$, its mgf being $M_{\tilde{u}}(t)=(1-t)^{-r}$ for $1-t>0$. But from (3.4a.1), $|\tilde{y}_j|^2\sim\tilde{\chi}_1^2$ and its mgf is $(1-t)^{-1}$ for $1-t>0$. Hence the mgf of $\lambda_j|\tilde{y}_j|^2$ is $M_{\lambda_j|\tilde{y}_j|^2}(t)=(1-\lambda_jt)^{-1}$ for $1-\lambda_jt>0$, and we have the following identity:

$$(1-t)^{-r}=\prod_{j=1}^{p}(1-\lambda_jt)^{-1}.\tag{3.4a.2}$$

Take the natural logarithm on both sides of (3.4a.2), expand and compare the coefficients of $t,\ \frac{t^2}{2},\ldots$ to obtain

$$r=\sum_{j=1}^{p}\lambda_j=\sum_{j=1}^{p}\lambda_j^2=\cdots\tag{3.4a.3}$$

The only possibility for the $\lambda_j$'s in (3.4a.3) is that $r$ of them are unities and the remaining ones, zeros. This property, combined with $A=A^{*}$, guarantees that $A=A^2$ and $A$ is of rank $r$. This completes the proof.


An extension of Theorem 3.4a.1 which is the counterpart of Theorem 3.4.2 can also be 
obtained. We will simply state it as the proof is parallel to that provided in the real case. 


Theorem 3.4a.2. Let $\tilde{X}\sim\tilde{N}_p(O,\Sigma)$, $\Sigma>O$ and $\tilde{u}=\tilde{X}^{*}A\tilde{X}$, $A=A^{*}$, be a Hermitian form. Then $\tilde{u}\sim\tilde{\chi}_r^2$, a chisquare random variable having $r$ degrees of freedom in the complex domain, if and only if $A=A\Sigma A$ and $A$ is of rank $r$.


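Example 3.4a.1. Let $\tilde{X}\sim\tilde{N}_3(\tilde{\mu},\Sigma)$, $\tilde{u}=\tilde{X}^{*}A\tilde{X}$ where

$$\tilde{X}=\begin{bmatrix}\tilde{x}_1\\\tilde{x}_2\\\tilde{x}_3\end{bmatrix},\quad \Sigma=\frac{1}{3}\begin{bmatrix}3&-(1+i)&-(1-i)\\-(1-i)&3&-(1+i)\\-(1+i)&-(1-i)&3\end{bmatrix},$$

$$A=\begin{bmatrix}1&1&1\\1&1&1\\1&1&1\end{bmatrix},\quad \tilde{\mu}=\begin{bmatrix}2+i\\-i\\2i\end{bmatrix}.$$

First determine whether $\Sigma$ can be a covariance matrix. Then determine the distribution of $\tilde{u}$ by making use of Theorem 3.4a.2 as well as independently, that is, without using Theorem 3.4a.2, for the cases (1) $\tilde{\mu}=O$; (2) $\tilde{\mu}$ as given above.

Solution 3.4a.1. Note that $\Sigma=\Sigma^{*}$, that is, $\Sigma$ is Hermitian. Let us verify that $\Sigma$ is a Hermitian positive definite matrix. Note that $\Sigma$ must be either positive definite or positive semi-definite to be a covariance matrix. In the semi-definite case, the density of $\tilde{X}$ does not exist. Let us check the leading minors of $3\Sigma$, whose signs agree with those of the leading minors of $\Sigma$: $\det((3))=3>0$, $\det\begin{bmatrix}3&-(1+i)\\-(1-i)&3\end{bmatrix}=9-2=7>0$, $\det(3\Sigma)=13>0$ [evaluated by using the cofactor expansion which is the same in the complex case]. Hence $\Sigma$ is Hermitian positive definite. In order to apply Theorem 3.4a.2, we must now verify that $A\Sigma A=A$ when $\tilde{\mu}=O$. Observe the following: $A=JJ'$, $J'A=3J'$, $J'J=3$ where $J'=[1,1,1]$, and $J'\Sigma=\frac{1}{3}J'$. Hence $A\Sigma A=(JJ')\Sigma(JJ')=J(J'\Sigma)JJ'=\frac{1}{3}(JJ')(JJ')=\frac{1}{3}J(J'J)J'=\frac{1}{3}J(3)J'=JJ'=A$. Thus the condition holds and by Theorem 3.4a.2, $\tilde{u}\sim\tilde{\chi}_1^2$ in the complex domain, that is, $\tilde{u}$ is a real gamma random variable with parameters $(\alpha=1,\ \beta=1)$ when $\tilde{\mu}=O$. Now, let us derive this result without using Theorem 3.4a.2. Let $\tilde{u}_1=\tilde{x}_1+\tilde{x}_2+\tilde{x}_3$ and $A_1=(1,1,1)'$. Note that $A_1'\tilde{X}=\tilde{u}_1$, the sum of the components of $\tilde{X}$. Hence $\tilde{u}_1^{*}\tilde{u}_1=\tilde{X}^{*}A_1A_1'\tilde{X}=\tilde{X}^{*}A\tilde{X}$. For $\tilde{\mu}=O$, we have $E[\tilde{u}_1]=0$ and

$$\begin{aligned}
\mathrm{Var}(\tilde{u}_1)&=\mathrm{Var}(\tilde{x}_1)+\mathrm{Var}(\tilde{x}_2)+\mathrm{Var}(\tilde{x}_3)+[\mathrm{Cov}(\tilde{x}_1,\tilde{x}_2)+\mathrm{Cov}(\tilde{x}_2,\tilde{x}_1)]\\
&\quad+[\mathrm{Cov}(\tilde{x}_1,\tilde{x}_3)+\mathrm{Cov}(\tilde{x}_3,\tilde{x}_1)]+[\mathrm{Cov}(\tilde{x}_2,\tilde{x}_3)+\mathrm{Cov}(\tilde{x}_3,\tilde{x}_2)]\\
&=\frac{1}{3}\{3+3+3+[-(1+i)-(1-i)]+[-(1-i)-(1+i)]+[-(1+i)-(1-i)]\}\\
&=\frac{1}{3}[9-6]=1.
\end{aligned}$$

Thus, $\tilde{u}_1$ is a standard normal random variable in the complex domain and $\tilde{u}_1^{*}\tilde{u}_1\sim\tilde{\chi}_1^2$, a chisquare random variable with one degree of freedom in the complex domain, that is, a real gamma random variable with parameters $(\alpha=1,\ \beta=1)$.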

For $\tilde{\mu}=(2+i,-i,2i)'$, this chisquare random variable is noncentral with noncentrality parameter $\lambda=\tilde{\mu}^{*}\Sigma^{-1}\tilde{\mu}$. Hence, the inverse of $\Sigma$ has to be evaluated. To do so, we will employ the formula $\Sigma^{-1}=\frac{1}{|\Sigma|}[\mathrm{Cof}(\Sigma)]'$, which also holds for the complex case. The determinant of $3\Sigma$ being equal to 13, that of $\Sigma$ is $\frac{13}{3^3}$ and

$$\mathrm{Cof}(\Sigma)=\frac{1}{3^2}\begin{bmatrix}7&3-i&3+i\\3+i&7&3-i\\3-i&3+i&7\end{bmatrix};\ \text{ then}$$

$$\Sigma^{-1}=\frac{1}{|\Sigma|}[\mathrm{Cof}(\Sigma)]'=\frac{3}{13}\begin{bmatrix}7&3+i&3-i\\3-i&7&3+i\\3+i&3-i&7\end{bmatrix}$$

and

$$\lambda=\tilde{\mu}^{*}\Sigma^{-1}\tilde{\mu}=\frac{3}{13}[2-i,\ i,\ -2i]\begin{bmatrix}7&3+i&3-i\\3-i&7&3+i\\3+i&3-i&7\end{bmatrix}\begin{bmatrix}2+i\\-i\\2i\end{bmatrix}=\frac{(76)(3)}{13}=\frac{228}{13}\approx 17.54.$$

This completes the computations.
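The matrix algebra of Example 3.4a.1 can be checked with Python's built-in complex numbers. In the sketch below, the matrix `M` stands for $3\Sigma$, whose entries are Gaussian integers so that the floating-point products involved are exact; `N` is the transposed cofactor matrix of `M`. The names `M`, `N` and `matmul` are introduced only for this illustration. Since $\Sigma=\frac{1}{3}M$, one has $\Sigma^{-1}=3M^{-1}=\frac{3}{13}N$ whenever $MN=13I$.

```python
def matmul(A, B):
    """Product of two matrices stored as lists of rows (entries may be complex)."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]


a = -(1 + 1j)
M = [[3, a, a.conjugate()],
     [a.conjugate(), 3, a],
     [a, a.conjugate(), 3]]              # M = 3 * Sigma
A = [[1] * 3 for _ in range(3)]
N = [[7, 3 + 1j, 3 - 1j],
     [3 - 1j, 7, 3 + 1j],
     [3 + 1j, 3 - 1j, 7]]                # transpose of the cofactor matrix of M

AMA = matmul(matmul(A, M), A)            # equals 3 * A, which is equivalent to A Sigma A = A
MN = matmul(M, N)                        # equals 13 * I, so that M^{-1} = N / 13

mu = [[2 + 1j], [-1j], [2j]]
mu_star = [[z[0].conjugate() for z in mu]]
quad_N = matmul(matmul(mu_star, N), mu)[0][0]   # mu* N mu
quad_Sigma_inv = 3 * quad_N / 13                # mu* Sigma^{-1} mu
```

The value of `quad_N` is 76, so that $\tilde{\mu}^{*}\Sigma^{-1}\tilde{\mu}=\frac{(3)(76)}{13}$.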


3.4a.1. Independence of Hermitian forms 


We shall mainly state certain results in connection with Hermitian forms in this section 
since they parallel those pertaining to the real case. 


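Theorem 3.4a.3. Let $\tilde{u}_1=\tilde{X}^{*}A\tilde{X}$, $A=A^{*}$, and $\tilde{u}_2=\tilde{X}^{*}B\tilde{X}$, $B=B^{*}$, where $\tilde{X}\sim\tilde{N}_p(\tilde{\mu},I)$. Then, $\tilde{u}_1$ and $\tilde{u}_2$ are independently distributed if and only if $AB=O$.


Proof: Let us assume that $AB=O$. Then

$$AB=O=O^{*}=(AB)^{*}=B^{*}A^{*}=BA.\tag{3.4a.4}$$

This means that there exists a unitary matrix $Q$, $QQ^{*}=I$, $Q^{*}Q=I$, that will diagonalize both $A$ and $B$. That is, $Q^{*}AQ=\mathrm{diag}(\lambda_1,\ldots,\lambda_p)=D_1$, $Q^{*}BQ=\mathrm{diag}(\nu_1,\ldots,\nu_p)=D_2$ where $\lambda_1,\ldots,\lambda_p$ are the eigenvalues of $A$ and $\nu_1,\ldots,\nu_p$ are the eigenvalues of $B$. But $AB=O$ implies that $D_1D_2=O$. As well, letting $\tilde{Y}=Q^{*}\tilde{X}$,

$$\tilde{u}_1=\tilde{X}^{*}A\tilde{X}=\tilde{Y}^{*}Q^{*}AQ\tilde{Y}=\lambda_1|\tilde{y}_1|^2+\cdots+\lambda_p|\tilde{y}_p|^2,\tag{3.4a.5}$$

$$\tilde{u}_2=\tilde{X}^{*}B\tilde{X}=\tilde{Y}^{*}Q^{*}BQ\tilde{Y}=\nu_1|\tilde{y}_1|^2+\cdots+\nu_p|\tilde{y}_p|^2.\tag{3.4a.6}$$

Since $D_1D_2=O$, whenever a $\lambda_j\ne 0$, the corresponding $\nu_j=0$ and vice versa. Thus the independent variables $\tilde{y}_j$'s are separated in (3.4a.5) and (3.4a.6) and accordingly, $\tilde{u}_1$ and $\tilde{u}_2$ are independently distributed. The proof of the necessity, which requires more matrix algebra, will not be provided herein. The general result can be stated as follows:


Theorem 3.4a.4. Letting $\tilde{X}\sim\tilde{N}_p(\tilde{\mu},\Sigma)$, $\Sigma>O$, the Hermitian forms $\tilde{u}_1=\tilde{X}^{*}A\tilde{X}$, $A=A^{*}$, and $\tilde{u}_2=\tilde{X}^{*}B\tilde{X}$, $B=B^{*}$, are independently distributed if and only if $A\Sigma B=O$.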


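Now, consider the density of the exponent in the $p$-variate complex Gaussian density. What will then be the density of $\tilde{y}=(\tilde{X}-\tilde{\mu})^{*}\Sigma^{-1}(\tilde{X}-\tilde{\mu})$? Let us evaluate the mgf of $\tilde{y}$. Observing that $\tilde{y}$ is real so that we may take $E[e^{t\tilde{y}}]$ where $t$ is a real parameter, we have

$$\begin{aligned}
M_{\tilde{y}}(t)=E[e^{t\tilde{y}}]&=\frac{1}{\pi^{p}|\det(\Sigma)|}\int_{\tilde{X}}e^{t(\tilde{X}-\tilde{\mu})^{*}\Sigma^{-1}(\tilde{X}-\tilde{\mu})-(\tilde{X}-\tilde{\mu})^{*}\Sigma^{-1}(\tilde{X}-\tilde{\mu})}\,\mathrm{d}\tilde{X}\\
&=\frac{1}{\pi^{p}|\det(\Sigma)|}\int_{\tilde{X}}e^{-(1-t)(\tilde{X}-\tilde{\mu})^{*}\Sigma^{-1}(\tilde{X}-\tilde{\mu})}\,\mathrm{d}\tilde{X}\\
&=(1-t)^{-p}\ \text{ for } 1-t>0.
\end{aligned}\tag{3.4a.7}$$

This is the mgf of a real gamma random variable with the parameters $(\alpha=p,\ \beta=1)$ or a chisquare random variable in the complex domain with $p$ degrees of freedom. Hence we have the following result:


Theorem 3.4a.5. When $\tilde{X}\sim\tilde{N}_p(\tilde{\mu},\Sigma)$, $\Sigma>O$, then $\tilde{y}=(\tilde{X}-\tilde{\mu})^{*}\Sigma^{-1}(\tilde{X}-\tilde{\mu})$ is distributed as a real gamma random variable with the parameters $(\alpha=p,\ \beta=1)$ or a chisquare random variable in the complex domain with $p$ degrees of freedom, that is,

$$\tilde{y}\sim\mathrm{gamma}(\alpha=p,\ \beta=1)\ \text{ or }\ \tilde{y}\sim\tilde{\chi}_p^2.\tag{3.4a.8}$$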


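Example 3.4a.2. Let $\tilde{X}\sim\tilde{N}_3(\tilde{\mu},\Sigma)$, $\tilde{u}_1=\tilde{X}^{*}A\tilde{X}$, $\tilde{u}_2=\tilde{X}^{*}B\tilde{X}$ where

$$\tilde{X}=\begin{bmatrix}\tilde{x}_1\\\tilde{x}_2\\\tilde{x}_3\end{bmatrix},\quad \tilde{\mu}=\begin{bmatrix}2-i\\3+2i\\1-i\end{bmatrix},\quad \Sigma=\frac{1}{3}\begin{bmatrix}3&-(1+i)&-(1-i)\\-(1-i)&3&-(1+i)\\-(1+i)&-(1-i)&3\end{bmatrix},$$

$$A=\begin{bmatrix}1&1&1\\1&1&1\\1&1&1\end{bmatrix},\quad B=\frac{1}{3}\begin{bmatrix}2&-1&-1\\-1&2&-1\\-1&-1&2\end{bmatrix}.$$

(1) By making use of Theorem 3.4a.4, show that $\tilde{u}_1$ and $\tilde{u}_2$ are independently distributed. (2) Show the independence of $\tilde{u}_1$ and $\tilde{u}_2$ without using Theorem 3.4a.4.


Solution 3.4a.2. In order to use Theorem 3.4a.4, we have to show that $A\Sigma B=O$ irrespective of $\tilde{\mu}$. Note that $A=JJ'$, $J'=[1,1,1]$, $J'J=3$, $J'\Sigma=\frac{1}{3}J'$, $J'B=O$. Hence $A\Sigma=JJ'\Sigma=J(J'\Sigma)=\frac{1}{3}JJ'\Rightarrow A\Sigma B=\frac{1}{3}JJ'B=\frac{1}{3}J(J'B)=O$. This proves the result that $\tilde{u}_1$ and $\tilde{u}_2$ are independently distributed through Theorem 3.4a.4. This will now be established without resorting to Theorem 3.4a.4. Let $\tilde{u}_3=\tilde{x}_1+\tilde{x}_2+\tilde{x}_3=J'\tilde{X}$ and $\tilde{u}_4=\frac{1}{3}[2\tilde{x}_1-\tilde{x}_2-\tilde{x}_3]$ or the first row of $B\tilde{X}$. Since independence is not affected by the relocation of the variables, we may assume, without any loss of generality, that $\tilde{\mu}=O$ when considering the independence of $\tilde{u}_3$ and $\tilde{u}_4$. Let us compute the covariance between $\tilde{u}_3$ and $\tilde{u}_4$:

$$\mathrm{Cov}(\tilde{u}_3,\tilde{u}_4)=\frac{1}{3}[1,1,1]\,\Sigma\begin{bmatrix}2\\-1\\-1\end{bmatrix}=\frac{1}{3^2}[1,1,1]\begin{bmatrix}2\\-1\\-1\end{bmatrix}=0.$$

Thus, $\tilde{u}_3$ and $\tilde{u}_4$ are uncorrelated and hence independently distributed since both are linear functions of the normal vector $\tilde{X}$. This property holds for each row of $B\tilde{X}$ and therefore $\tilde{u}_3$ and $B\tilde{X}$ are independently distributed. However, $\tilde{u}_1=\tilde{X}^{*}A\tilde{X}=\tilde{u}_3^{*}\tilde{u}_3$ and hence $\tilde{u}_1$ and $(B\tilde{X})^{*}(B\tilde{X})=\tilde{X}^{*}B^{*}B\tilde{X}=\tilde{X}^{*}B\tilde{X}=\tilde{u}_2$ are independently distributed. This completes the computations. The following property was utilized: Let $\tilde{U}$ and $\tilde{V}$ be vectors or matrices that are independently distributed. Then, all the pairs $(\tilde{U},\tilde{V}^{*})$, $(\tilde{U},\tilde{V}\tilde{V}^{*})$, $(\tilde{U},\tilde{V}^{*}\tilde{V}),\ldots,(\tilde{U}\tilde{U}^{*},\tilde{V}\tilde{V}^{*})$, are independently distributed whenever the quantities are defined. The converses need not hold when quadratic terms are involved; for instance, $(\tilde{U}\tilde{U}^{*},\tilde{V}\tilde{V}^{*})$ being independently distributed need not imply that $(\tilde{U},\tilde{V})$ are independently distributed.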
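The condition $A\Sigma B=O$ used in Solution 3.4a.2 can be checked with integer-valued complex arithmetic by working with $3\Sigma$ and $3B$ instead of $\Sigma$ and $B$, which only rescales the products. The sketch below is an illustration under that rescaling; the names `M3`, `B3` and `matmul` are hypothetical.

```python
def matmul(A, B):
    """Product of two matrices stored as lists of rows (entries may be complex)."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]


a = -(1 + 1j)
M3 = [[3, a, a.conjugate()],
      [a.conjugate(), 3, a],
      [a, a.conjugate(), 3]]             # 3 * Sigma
B3 = [[2, -1, -1],
      [-1, 2, -1],
      [-1, -1, 2]]                       # 3 * B
A = [[1] * 3 for _ in range(3)]
Jt = [[1, 1, 1]]

ASB_scaled = matmul(matmul(A, M3), B3)       # equals 9 * A Sigma B
cov_scaled = matmul(matmul(Jt, M3), B3)      # 9 times the covariances between J'X and the rows of BX
```

Both products vanish, which agrees with $A\Sigma B=O$ and with $J'\tilde{X}$ being uncorrelated with every component of $B\tilde{X}$.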


Exercises 3.4 


3.4.1. In the real case on the right side of (3.4.4), compute the densities of the following items: (i) $z_1^2$, (ii) $\lambda_1z_1^2$, (iii) $\lambda_1z_1^2+\lambda_2z_2^2$, (iv) $\lambda_1z_1^2+\cdots+\lambda_4z_4^2$ if $\lambda_1=\lambda_2$, $\lambda_3=\lambda_4$ for $\mu=O$.


3.4.2. Compute the density of $u=X'AX$, $A=A'$ in the real case when (i) $X\sim N_p(O,\Sigma)$, $\Sigma>O$, (ii) $X\sim N_p(\mu,\Sigma)$, $\Sigma>O$.


3.4.3. Modify the statement in Theorem 3.4.1 if (i) $X\sim N_p(O,\sigma^2I)$, $\sigma^2>0$, (ii) $X\sim N_p(\mu,\sigma^2I)$, $\mu\ne O$.


3.4.4. Prove the only if part in Theorem 3.4.3.


3.4.5. Establish the cases (i), (ii), (iii) of Exercise 3.4.1 in the corresponding complex domain.


3.4.6. Supply the proof for the only if part in Theorem 3.4a.3. 


3.4.7. Can a matrix A having at least one complex element be Hermitian and idempotent 
at the same time? Prove your statement. 


3.4.8. Let the p x 1 vector X have a real Gaussian density N,(O, X), X > О. Let 
u — X'AX, A — A'. Evaluate the density of u for p — 2 and show that this density can 
be written in terms of a hypergeometric series of the | F type. 


3.4.9. Repeat Exercise 3.4.8 if X is in the complex domain, Х ~ М0, У), X > О. 
3.4.10. Supply the proofs for ће only if part in Theorems 3.4.4 and 3.4а.4. 




3.5. Samples from a p-variate Real Gaussian Population 


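Let the $p \times 1$ real vectors $X_1, \ldots, X_n$ be iid as $N_p(\mu, \Sigma)$, $\Sigma > O$. Then, the collection
$X_1, \ldots, X_n$ is called a simple random sample of size $n$ from this $N_p(\mu, \Sigma)$, $\Sigma > O$. Then
the joint density of $X_1, \ldots, X_n$ is the following:

$$L = \prod_{j=1}^{n} f(X_j) = \prod_{j=1}^{n} \frac{e^{-\frac{1}{2}(X_j - \mu)'\Sigma^{-1}(X_j - \mu)}}{(2\pi)^{\frac{p}{2}}|\Sigma|^{\frac{1}{2}}}
= \left[(2\pi)^{\frac{np}{2}}|\Sigma|^{\frac{n}{2}}\right]^{-1} e^{-\frac{1}{2}\sum_{j=1}^{n}(X_j - \mu)'\Sigma^{-1}(X_j - \mu)}. \tag{3.5.1}$$
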
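This $L$ at an observed set of $X_1, \ldots, X_n$ is called the likelihood function. Let the sample
matrix, which is $p \times n$, be denoted by a bold-faced $\mathbf{X}$. In order to avoid too many symbols,
we will use $\mathbf{X}$ to denote the $p \times n$ matrix in this section. In earlier sections, we had used
$X$ to denote a $p \times 1$ vector. Then

$$\mathbf{X} = [X_1, \ldots, X_n] = \begin{bmatrix} x_{11} & x_{12} & \cdots & x_{1n} \\ x_{21} & x_{22} & \cdots & x_{2n} \\ \vdots & \vdots & \ddots & \vdots \\ x_{p1} & x_{p2} & \cdots & x_{pn} \end{bmatrix}, \quad
X_k = \begin{bmatrix} x_{1k} \\ x_{2k} \\ \vdots \\ x_{pk} \end{bmatrix}, \ k = 1, \ldots, n. \tag{i}$$

Let the sample average be denoted by $\bar{X} = \frac{1}{n}(X_1 + \cdots + X_n)$. Then $\bar{X}$ will be of the
following form:

$$\bar{X} = \begin{bmatrix} \bar{x}_1 \\ \vdots \\ \bar{x}_p \end{bmatrix}, \quad \bar{x}_i = \frac{1}{n}\sum_{k=1}^{n} x_{ik} = \text{average on the } i\text{-th component of any } X_j. \tag{ii}$$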


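Let the bold-faced $\bar{\mathbf{X}}$ be defined as follows:

$$\bar{\mathbf{X}} = [\bar{X}, \bar{X}, \ldots, \bar{X}] = \begin{bmatrix} \bar{x}_1 & \bar{x}_1 & \cdots & \bar{x}_1 \\ \bar{x}_2 & \bar{x}_2 & \cdots & \bar{x}_2 \\ \vdots & \vdots & \ddots & \vdots \\ \bar{x}_p & \bar{x}_p & \cdots & \bar{x}_p \end{bmatrix}.$$

Then,

$$\mathbf{X} - \bar{\mathbf{X}} = \begin{bmatrix} x_{11} - \bar{x}_1 & x_{12} - \bar{x}_1 & \cdots & x_{1n} - \bar{x}_1 \\ x_{21} - \bar{x}_2 & x_{22} - \bar{x}_2 & \cdots & x_{2n} - \bar{x}_2 \\ \vdots & \vdots & \ddots & \vdots \\ x_{p1} - \bar{x}_p & x_{p2} - \bar{x}_p & \cdots & x_{pn} - \bar{x}_p \end{bmatrix}$$

and

$$S = (\mathbf{X} - \bar{\mathbf{X}})(\mathbf{X} - \bar{\mathbf{X}})' = (s_{ij}),$$

so that

$$s_{ij} = \sum_{k=1}^{n} (x_{ik} - \bar{x}_i)(x_{jk} - \bar{x}_j).$$
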

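$S$ is called the sample sum of products matrix or the corrected sample sum of products
matrix, corrected in the sense that the averages are deducted from the observations. As
well, $\frac{1}{n}s_{ii}$ is called the sample variance on the component $x_i$ of any vector $X_k$, referring to
(i) above, and $\frac{1}{n}s_{ij}$, $i \ne j$, is called the sample covariance on the components $x_i$ and $x_j$
of any $X_k$, $\frac{1}{n}S$ being referred to as the sample covariance matrix. The exponent in $L$ can
be simplified by making use of the following properties: (1) When $u$ is a $1 \times 1$ matrix or
a scalar quantity, then $\mathrm{tr}(u) = \mathrm{tr}(u') = u = u'$. (2) For two matrices $A$ and $B$, whenever
$AB$ and $BA$ are defined, $\mathrm{tr}(AB) = \mathrm{tr}(BA)$ where $AB$ need not be equal to $BA$. Observe
that the following quantity is real scalar and hence, it is equal to its trace:

$$\begin{aligned}
\sum_{j=1}^{n} (X_j - \mu)'\Sigma^{-1}(X_j - \mu) &= \mathrm{tr}\Big[\sum_{j=1}^{n} (X_j - \mu)'\Sigma^{-1}(X_j - \mu)\Big] \\
&= \mathrm{tr}\Big[\Sigma^{-1}\sum_{j=1}^{n} (X_j - \mu)(X_j - \mu)'\Big] \\
&= \mathrm{tr}\Big[\Sigma^{-1}\sum_{j=1}^{n} (X_j - \bar{X} + \bar{X} - \mu)(X_j - \bar{X} + \bar{X} - \mu)'\Big] \\
&= \mathrm{tr}[\Sigma^{-1}(\mathbf{X} - \bar{\mathbf{X}})(\mathbf{X} - \bar{\mathbf{X}})'] + n\,\mathrm{tr}[\Sigma^{-1}(\bar{X} - \mu)(\bar{X} - \mu)'] \\
&= \mathrm{tr}(\Sigma^{-1}S) + n(\bar{X} - \mu)'\Sigma^{-1}(\bar{X} - \mu)
\end{aligned}$$

because

$$\mathrm{tr}[\Sigma^{-1}(\bar{X} - \mu)(\bar{X} - \mu)'] = \mathrm{tr}[(\bar{X} - \mu)'\Sigma^{-1}(\bar{X} - \mu)] = (\bar{X} - \mu)'\Sigma^{-1}(\bar{X} - \mu).$$

The cross-product terms vanish since $\sum_{j=1}^{n}(X_j - \bar{X}) = O$. The right-hand side expression being $1 \times 1$, it is equal to its trace, and $L$ can be written as

$$L = \frac{1}{(2\pi)^{\frac{np}{2}}|\Sigma|^{\frac{n}{2}}}\, e^{-\frac{1}{2}\mathrm{tr}(\Sigma^{-1}S) - \frac{n}{2}(\bar{X} - \mu)'\Sigma^{-1}(\bar{X} - \mu)}. \tag{3.5.2}$$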
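The decomposition of the exponent used to reach (3.5.2) can be checked numerically. The following is a minimal sketch, assuming NumPy is available; the dimensions, the mean vector and the positive definite matrix are arbitrary illustrative choices and are not taken from the text.

```python
import numpy as np

rng = np.random.default_rng(0)
p, n = 3, 6
X = rng.normal(size=(p, n))            # columns play the role of X_1, ..., X_n
mu = rng.normal(size=(p, 1))           # an arbitrary (hypothetical) mean vector
G = rng.normal(size=(p, p))
Sigma = G @ G.T + p * np.eye(p)        # an arbitrary positive definite matrix
Sigma_inv = np.linalg.inv(Sigma)

xbar = X.mean(axis=1, keepdims=True)   # sample average, p x 1
S = (X - xbar) @ (X - xbar).T          # sample sum of products matrix

# Left side: sum_j (X_j - mu)' Sigma^{-1} (X_j - mu)
D = X - mu
lhs = np.einsum("ij,ik,kj->", D, Sigma_inv, D)

# Right side: tr(Sigma^{-1} S) + n (xbar - mu)' Sigma^{-1} (xbar - mu)
d = xbar - mu
rhs = np.trace(Sigma_inv @ S) + n * (d.T @ Sigma_inv @ d).item()

print(lhs, rhs)                        # the two values agree up to rounding
```

The agreement does not depend on the data being Gaussian, since the identity is purely algebraic.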


If we wish to estimate the parameters $\mu$ and $\Sigma$ from a set of observation vectors
corresponding to $X_1, \ldots, X_n$, one method consists in maximizing $L$ with respect to $\mu$ and
$\Sigma$ given those observations and estimating the parameters. By resorting to calculus, $L$ is
differentiated partially with respect to $\mu$ and $\Sigma$, the resulting expressions are equated to
null vectors and matrices, respectively, and these equations are then solved to obtain the
solutions for $\mu$ and $\Sigma$. Those estimates will be called the maximum likelihood estimates
or MLE's. We will explore this aspect later.


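Example 3.5.1. Let the $3 \times 1$ vector $X_1$ be real Gaussian, $X_1 \sim N_3(\mu, \Sigma)$, $\Sigma > O$. Let
$X_j$, $j = 1, 2, 3, 4$, be iid as $X_1$. Compute the $3 \times 4$ sample matrix $\mathbf{X}$, the sample average
$\bar{X}$, the matrix of sample means $\bar{\mathbf{X}}$, the sample sum of products matrix $S$, and the maximum
likelihood estimates of $\mu$ and $\Sigma$, based on the following set of observations on $X_j$, $j = 1, 2, 3, 4$:

$$X_1 = \begin{bmatrix} 2 \\ 0 \\ -1 \end{bmatrix}, \quad X_2 = \begin{bmatrix} 1 \\ -1 \\ 2 \end{bmatrix}, \quad X_3 = \begin{bmatrix} 1 \\ 0 \\ 4 \end{bmatrix}, \quad X_4 = \begin{bmatrix} 0 \\ 1 \\ 3 \end{bmatrix}.$$


Solution 3.5.1. The $3 \times 4$ sample matrix and the sample average are

$$\mathbf{X} = \begin{bmatrix} 2 & 1 & 1 & 0 \\ 0 & -1 & 0 & 1 \\ -1 & 2 & 4 & 3 \end{bmatrix}, \quad
\bar{X} = \frac{1}{4}\begin{bmatrix} 2 + 1 + 1 + 0 \\ 0 - 1 + 0 + 1 \\ -1 + 2 + 4 + 3 \end{bmatrix} = \begin{bmatrix} 1 \\ 0 \\ 2 \end{bmatrix}.$$

Then $\bar{\mathbf{X}}$ and $\mathbf{X} - \bar{\mathbf{X}}$ are the following:

$$\bar{\mathbf{X}} = [\bar{X}, \bar{X}, \bar{X}, \bar{X}] = \begin{bmatrix} 1 & 1 & 1 & 1 \\ 0 & 0 & 0 & 0 \\ 2 & 2 & 2 & 2 \end{bmatrix}, \quad
\mathbf{X} - \bar{\mathbf{X}} = \begin{bmatrix} 1 & 0 & 0 & -1 \\ 0 & -1 & 0 & 1 \\ -3 & 0 & 2 & 1 \end{bmatrix},$$

and the sample sum of products matrix $S$ is the following:

$$S = [\mathbf{X} - \bar{\mathbf{X}}][\mathbf{X} - \bar{\mathbf{X}}]' = \begin{bmatrix} 1 & 0 & 0 & -1 \\ 0 & -1 & 0 & 1 \\ -3 & 0 & 2 & 1 \end{bmatrix}
\begin{bmatrix} 1 & 0 & -3 \\ 0 & -1 & 0 \\ 0 & 0 & 2 \\ -1 & 1 & 1 \end{bmatrix} = \begin{bmatrix} 2 & -1 & -4 \\ -1 & 2 & 1 \\ -4 & 1 & 14 \end{bmatrix}.$$

Then, the maximum likelihood estimates of $\mu$ and $\Sigma$, denoted with a hat, are

$$\hat{\mu} = \bar{X} = \begin{bmatrix} 1 \\ 0 \\ 2 \end{bmatrix}, \quad \hat{\Sigma} = \frac{1}{n}S = \frac{1}{4}\begin{bmatrix} 2 & -1 & -4 \\ -1 & 2 & 1 \\ -4 & 1 & 14 \end{bmatrix}.$$

This completes the computations.


3.5a. Simple Random Sample from a p-variate Complex Gaussian Population

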
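Our population density is given by the following:

$$\tilde{f}(\tilde{X}_j) = \frac{e^{-(\tilde{X}_j - \tilde{\mu})^*\tilde{\Sigma}^{-1}(\tilde{X}_j - \tilde{\mu})}}{\pi^p\,|\det(\tilde{\Sigma})|}, \quad \tilde{X}_j \sim \tilde{N}_p(\tilde{\mu}, \tilde{\Sigma}), \ \tilde{\Sigma} = \tilde{\Sigma}^* > O.$$

Let $\tilde{X}_1, \ldots, \tilde{X}_n$ be a collection of complex vector random variables iid as $\tilde{X}_j \sim
\tilde{N}_p(\tilde{\mu}, \tilde{\Sigma})$, $\tilde{\Sigma} > O$. This collection is called a simple random sample of size $n$ from
this complex Gaussian population $\tilde{f}(\tilde{X}_j)$. We will use notations parallel to those utilized
in the real case. Let $\tilde{\mathbf{X}} = [\tilde{X}_1, \ldots, \tilde{X}_n]$, $\bar{\tilde{X}} = \frac{1}{n}(\tilde{X}_1 + \cdots + \tilde{X}_n)$, $\bar{\tilde{\mathbf{X}}} = (\bar{\tilde{X}}, \ldots, \bar{\tilde{X}})$, and
$\tilde{S} = (\tilde{\mathbf{X}} - \bar{\tilde{\mathbf{X}}})(\tilde{\mathbf{X}} - \bar{\tilde{\mathbf{X}}})^* = (\tilde{s}_{ij})$. Then

$$\tilde{s}_{ij} = \sum_{k=1}^{n} (\tilde{x}_{ik} - \bar{\tilde{x}}_i)(\tilde{x}_{jk} - \bar{\tilde{x}}_j)^*$$

with $\frac{1}{n}\tilde{s}_{ij}$ being the sample covariance between the components $\tilde{x}_i$ and $\tilde{x}_j$, $i \ne j$, of any
$\tilde{X}_k$, $k = 1, \ldots, n$, $\frac{1}{n}\tilde{s}_{ii}$ being the sample variance on the component $\tilde{x}_i$. The joint density
of $\tilde{X}_1, \ldots, \tilde{X}_n$, denoted by $\tilde{L}$, is given by

$$\tilde{L} = \prod_{j=1}^{n} \frac{e^{-(\tilde{X}_j - \tilde{\mu})^*\tilde{\Sigma}^{-1}(\tilde{X}_j - \tilde{\mu})}}{\pi^p\,|\det(\tilde{\Sigma})|}
= \frac{e^{-\sum_{j=1}^{n}(\tilde{X}_j - \tilde{\mu})^*\tilde{\Sigma}^{-1}(\tilde{X}_j - \tilde{\mu})}}{\pi^{np}\,|\det(\tilde{\Sigma})|^n}, \tag{3.5a.1}$$

which can be simplified to the following expression by making use of steps parallel to
those utilized in the real case:

$$\tilde{L} = \frac{e^{-\mathrm{tr}(\tilde{\Sigma}^{-1}\tilde{S}) - n(\bar{\tilde{X}} - \tilde{\mu})^*\tilde{\Sigma}^{-1}(\bar{\tilde{X}} - \tilde{\mu})}}{\pi^{np}\,|\det(\tilde{\Sigma})|^n}. \tag{3.5a.2}$$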


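Example 3.5a.1. Let the $3 \times 1$ vector $\tilde{X}_1$ in the complex domain have a complex trivariate
Gaussian distribution $\tilde{X}_1 \sim \tilde{N}_3(\tilde{\mu}, \tilde{\Sigma})$, $\tilde{\Sigma} > O$. Let $\tilde{X}_j$, $j = 1, 2, 3, 4$, be iid as $\tilde{X}_1$.
With our usual notations, compute the $3 \times 4$ sample matrix $\tilde{\mathbf{X}}$, the sample average $\bar{\tilde{X}}$,
the $3 \times 4$ matrix of sample averages $\bar{\tilde{\mathbf{X}}}$, the sample sum of products matrix $\tilde{S}$ and the
maximum likelihood estimates of $\tilde{\mu}$ and $\tilde{\Sigma}$ based on the following set of observations on
$\tilde{X}_j$, $j = 1, 2, 3, 4$:

$$\tilde{X}_1 = \begin{bmatrix} 1 + i \\ 2 - i \\ 1 - i \end{bmatrix}, \quad \tilde{X}_2 = \begin{bmatrix} -1 + 2i \\ 3i \\ -1 + i \end{bmatrix}, \quad
\tilde{X}_3 = \begin{bmatrix} -2 + 2i \\ 3 + i \\ 4 + 2i \end{bmatrix}, \quad \tilde{X}_4 = \begin{bmatrix} -2 + 3i \\ 3 + i \\ -4 + 2i \end{bmatrix}.$$


Solution 3.5a.1. The sample matrix $\tilde{\mathbf{X}}$ and the sample average $\bar{\tilde{X}}$ are

$$\tilde{\mathbf{X}} = \begin{bmatrix} 1 + i & -1 + 2i & -2 + 2i & -2 + 3i \\ 2 - i & 3i & 3 + i & 3 + i \\ 1 - i & -1 + i & 4 + 2i & -4 + 2i \end{bmatrix}, \quad
\bar{\tilde{X}} = \begin{bmatrix} -1 + 2i \\ 2 + i \\ i \end{bmatrix}.$$

Then, with our usual notations, $\bar{\tilde{\mathbf{X}}}$ and $\tilde{\mathbf{X}} - \bar{\tilde{\mathbf{X}}}$ are the following:

$$\bar{\tilde{\mathbf{X}}} = \begin{bmatrix} -1 + 2i & -1 + 2i & -1 + 2i & -1 + 2i \\ 2 + i & 2 + i & 2 + i & 2 + i \\ i & i & i & i \end{bmatrix}, \quad
\tilde{\mathbf{X}} - \bar{\tilde{\mathbf{X}}} = \begin{bmatrix} 2 - i & 0 & -1 & -1 + i \\ -2i & -2 + 2i & 1 & 1 \\ 1 - 2i & -1 & 4 + i & -4 + i \end{bmatrix}.$$

Thus, the sample sum of products matrix $\tilde{S}$ is

$$\tilde{S} = [\tilde{\mathbf{X}} - \bar{\tilde{\mathbf{X}}}][\tilde{\mathbf{X}} - \bar{\tilde{\mathbf{X}}}]^*
= \begin{bmatrix} 2 - i & 0 & -1 & -1 + i \\ -2i & -2 + 2i & 1 & 1 \\ 1 - 2i & -1 & 4 + i & -4 + i \end{bmatrix}
\begin{bmatrix} 2 + i & 2i & 1 + 2i \\ 0 & -2 - 2i & -1 \\ -1 & 1 & 4 - i \\ -1 - i & 1 & -4 - i \end{bmatrix}
= \begin{bmatrix} 8 & 5i & 5 + i \\ -5i & 14 & 6 - 6i \\ 5 - i & 6 + 6i & 40 \end{bmatrix}.$$

The maximum likelihood estimates are as follows:

$$\hat{\tilde{\mu}} = \bar{\tilde{X}} = \begin{bmatrix} -1 + 2i \\ 2 + i \\ i \end{bmatrix}, \quad \hat{\tilde{\Sigma}} = \frac{1}{4}\tilde{S}$$

where $\tilde{S}$ is given above. This completes the computations.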
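The arithmetic of Example 3.5a.1 can be reproduced with a few lines of NumPy. This is only a sketch for checking the numbers; the variable names are illustrative and the conjugate transpose is written as `.conj().T`.

```python
import numpy as np

# Observations of Example 3.5a.1, one column per sample vector
X = np.array([
    [1 + 1j, -1 + 2j, -2 + 2j, -2 + 3j],
    [2 - 1j,  0 + 3j,  3 + 1j,  3 + 1j],
    [1 - 1j, -1 + 1j,  4 + 2j, -4 + 2j],
])
n = X.shape[1]

xbar = X.mean(axis=1, keepdims=True)   # sample average (3 x 1)
Xd = X - xbar                          # sample matrix minus the matrix of sample means
S = Xd @ Xd.conj().T                   # sample sum of products matrix (Hermitian)

mu_hat = xbar.ravel()                  # maximum likelihood estimate of the mean vector
Sigma_hat = S / n                      # maximum likelihood estimate of the covariance matrix

print(mu_hat)
print(S)
```

The printed matrix is Hermitian, with real diagonal elements 8, 14 and 40, in agreement with the solution above.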
3.5.1. Some simplifications of the sample matrix in the real Gaussian case 
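The $p \times n$ sample matrix is

$$\mathbf{X} = [X_1, \ldots, X_n] = \begin{bmatrix} x_{11} & x_{12} & \cdots & x_{1n} \\ x_{21} & x_{22} & \cdots & x_{2n} \\ \vdots & \vdots & \ddots & \vdots \\ x_{p1} & x_{p2} & \cdots & x_{pn} \end{bmatrix}, \quad
X_k = \begin{bmatrix} x_{1k} \\ x_{2k} \\ \vdots \\ x_{pk} \end{bmatrix},$$

where the rows are iid variables on the components of the $p$-vector $X_1$. For example,
$(x_{11}, x_{12}, \ldots, x_{1n})$ are iid variables distributed as the first component of $X_1$. Let

$$\bar{X} = \begin{bmatrix} \bar{x}_1 \\ \bar{x}_2 \\ \vdots \\ \bar{x}_p \end{bmatrix}
= \begin{bmatrix} \frac{1}{n}\sum_{k=1}^{n} x_{1k} \\ \vdots \\ \frac{1}{n}\sum_{k=1}^{n} x_{pk} \end{bmatrix}
= \frac{1}{n}\begin{bmatrix} x_{11} & \cdots & x_{1n} \\ \vdots & \ddots & \vdots \\ x_{p1} & \cdots & x_{pn} \end{bmatrix}\begin{bmatrix} 1 \\ \vdots \\ 1 \end{bmatrix}
= \frac{1}{n}\mathbf{X}J, \quad J = \begin{bmatrix} 1 \\ \vdots \\ 1 \end{bmatrix}, \ n \times 1.$$

Consider the matrix

$$\bar{\mathbf{X}} = (\bar{X}, \ldots, \bar{X}) = \frac{1}{n}\mathbf{X}JJ' = \mathbf{X}B, \quad B = \frac{1}{n}JJ'.$$

Then,

$$\mathbf{X} - \bar{\mathbf{X}} = \mathbf{X}A, \quad A = I - B = I - \frac{1}{n}JJ'.$$

Observe that $A = A^2$, $B = B^2$, $AB = O$, $A = A'$ and $B = B'$ where both $A$ and
$B$ are $n \times n$ matrices. Then $\mathbf{X}A$ and $\mathbf{X}B$ are $p \times n$ and, in order to determine the mgf,
we will take the $p \times n$ parameter matrices $T_1$ and $T_2$. Accordingly, the mgf of $\mathbf{X}A$ is
$M_{\mathbf{X}A}(T_1) = E[e^{\mathrm{tr}(T_1'\mathbf{X}A)}]$, that of $\mathbf{X}B$ is $M_{\mathbf{X}B}(T_2) = E[e^{\mathrm{tr}(T_2'\mathbf{X}B)}]$ and the joint mgf is
$E[e^{\mathrm{tr}(T_1'\mathbf{X}A) + \mathrm{tr}(T_2'\mathbf{X}B)}]$. Let us evaluate the joint mgf for $X_j \sim N_p(O, I)$:

$$E[e^{\mathrm{tr}(T_1'\mathbf{X}A) + \mathrm{tr}(T_2'\mathbf{X}B)}] = \int_{\mathbf{X}} \frac{1}{(2\pi)^{\frac{np}{2}}}\, e^{\mathrm{tr}(T_1'\mathbf{X}A) + \mathrm{tr}(T_2'\mathbf{X}B) - \frac{1}{2}\mathrm{tr}(\mathbf{X}\mathbf{X}')}\, \mathrm{d}\mathbf{X}.$$

Let us simplify the exponent,

$$-\frac{1}{2}\{\mathrm{tr}(\mathbf{X}\mathbf{X}') - 2\,\mathrm{tr}[\mathbf{X}(AT_1' + BT_2')]\}. \tag{i}$$

If we expand $\mathrm{tr}[(\mathbf{X} - C)(\mathbf{X} - C)']$ for some $C$, we have

$$\mathrm{tr}(\mathbf{X}\mathbf{X}') - \mathrm{tr}(C\mathbf{X}') - \mathrm{tr}(\mathbf{X}C') + \mathrm{tr}(CC') = \mathrm{tr}(\mathbf{X}\mathbf{X}') - 2\,\mathrm{tr}(\mathbf{X}C') + \mathrm{tr}(CC') \tag{ii}$$

as $\mathrm{tr}(\mathbf{X}C') = \mathrm{tr}(C\mathbf{X}')$ even though $C\mathbf{X}' \ne \mathbf{X}C'$. On comparing (i) and (ii), we have
$C' = AT_1' + BT_2'$, and then

$$\begin{aligned}
\mathrm{tr}(CC') &= \mathrm{tr}[(T_1A' + T_2B')(AT_1' + BT_2')] \\
&= \mathrm{tr}(T_1A'AT_1') + \mathrm{tr}(T_2B'BT_2') + \mathrm{tr}(T_1A'BT_2') + \mathrm{tr}(T_2B'AT_1').
\end{aligned} \tag{iii}$$

Since the integral over $\mathbf{X} - C$ will absorb the normalizing constant and give 1, the joint
mgf is $e^{\frac{1}{2}\mathrm{tr}(CC')}$. Proceeding exactly the same way, it is seen that the mgf's of $\mathbf{X}A$ and $\mathbf{X}B$ are
respectively

$$M_{\mathbf{X}A}(T_1) = e^{\frac{1}{2}\mathrm{tr}(T_1A'AT_1')} \quad \text{and} \quad M_{\mathbf{X}B}(T_2) = e^{\frac{1}{2}\mathrm{tr}(T_2B'BT_2')}.$$

The independence of $\mathbf{X}A$ and $\mathbf{X}B$ implies that the joint mgf should be equal to the product
of the individual mgf's. In this instance, this is the case as $A'B = O$, $B'A = O$. Hence,
the following result: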

Theorem 3.5.1. Assuming that $X_1, \ldots, X_n$ are iid as $X_j \sim N_p(O, I)$, let the $p \times n$
matrix $\mathbf{X} = (X_1, \ldots, X_n)$ and $\bar{X} = \frac{1}{n}\mathbf{X}J$, $J' = (1, 1, \ldots, 1)$. Let $\bar{\mathbf{X}} = \mathbf{X}B$ and $\mathbf{X} - \bar{\mathbf{X}} = \mathbf{X}A$
so that $A = A'$, $B = B'$, $A^2 = A$, $B^2 = B$, $AB = O$. Letting $U_1 = \mathbf{X}B$ and $U_2 = \mathbf{X}A$,
it follows that $U_1$ and $U_2$ are independently distributed.


Now, appealing to a general result to the effect that if $U$ and $V$ are independently
distributed then $U$ and $VV'$ as well as $U$ and $V'V$ are independently distributed whenever
$VV'$ and $V'V$ are defined, the next result follows.


Theorem 3.5.2. For the $p \times n$ matrix $\mathbf{X}$, let $\mathbf{X}A$ and $\mathbf{X}B$ be as defined in Theorem 3.5.1.
Then $\mathbf{X}B$ and $\mathbf{X}AA'\mathbf{X}' = \mathbf{X}A\mathbf{X}' = S$ are independently distributed and, consequently, the
sample mean $\bar{X}$ and the sample sum of products matrix $S$ are independently distributed.
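The algebraic facts behind Theorems 3.5.1 and 3.5.2 are easy to confirm numerically. The following sketch, which assumes NumPy and uses an arbitrary simulated sample matrix, builds the matrices $A$ and $B$ and checks that they are idempotent, that $AB = O$, and that $\mathbf{X}A\mathbf{X}'$ reproduces $S$.

```python
import numpy as np

rng = np.random.default_rng(1)
p, n = 3, 5
X = rng.normal(size=(p, n))        # an arbitrary p x n sample matrix

J = np.ones((n, 1))
B = J @ J.T / n                    # averaging matrix (1/n) J J'
A = np.eye(n) - B                  # centering matrix I - (1/n) J J'

Xbar_mat = X @ B                   # matrix of sample means: every column is the sample average
S = X @ A @ X.T                    # S = X A X'

xbar = X.mean(axis=1, keepdims=True)
S_direct = (X - xbar) @ (X - xbar).T

print(np.allclose(A @ A, A), np.allclose(B @ B, B), np.allclose(A @ B, 0))
print(np.allclose(S, S_direct))
```

These checks only concern the matrix identities; the independence statement itself is a distributional result and is not verified by this sketch.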


As $\mu$ is absent from the previous derivations, the results hold for a $N_p(\mu, I)$
population. If the population is $N_p(\mu, \Sigma)$, $\Sigma > O$, it suffices to make the transformation
$Y_j = \Sigma^{-\frac{1}{2}}X_j$ or $\mathbf{Y} = \Sigma^{-\frac{1}{2}}\mathbf{X}$, in which case $\mathbf{X} = \Sigma^{\frac{1}{2}}\mathbf{Y}$. Then, $\mathrm{tr}(T_1'\mathbf{X}A) =
\mathrm{tr}(T_1'\Sigma^{\frac{1}{2}}\mathbf{Y}A) = \mathrm{tr}[(T_1'\Sigma^{\frac{1}{2}})\mathbf{Y}A]$ so that $\Sigma^{\frac{1}{2}}$ is combined with $T_1$, which does not affect
$\mathbf{Y}A$. Thus, we have the general result that is stated next.


Theorem 3.5.3. Letting the population be $N_p(\mu, \Sigma)$, $\Sigma > O$, and $\mathbf{X}$, $A$, $B$, $S$, and
$\bar{X}$ be as defined in Theorem 3.5.1, it then follows that $U_1 = \mathbf{X}A$ and $U_2 = \mathbf{X}B$ are
independently distributed and thereby, that the sample mean $\bar{X}$ and the sample sum of
products matrix $S$ are independently distributed.


3.5.2. Linear functions of the sample vectors 


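Let the $X_j$'s, $j = 1, \ldots, n$, be iid as $X_j \sim N_p(\mu, \Sigma)$, $\Sigma > O$. Let us consider a
linear function $a_1X_1 + a_2X_2 + \cdots + a_nX_n$ where $a_1, \ldots, a_n$ are real scalar constants. Then
the mgf's of $X_j$, $a_jX_j$, $U = \sum_{j=1}^{n} a_jX_j$ are obtained as follows:

$$M_{X_j}(T) = E[e^{T'X_j}] = e^{T'\mu + \frac{1}{2}T'\Sigma T}, \quad M_{a_jX_j}(T) = e^{T'(a_j\mu) + \frac{1}{2}a_j^2\,T'\Sigma T},$$

$$M_{\sum_{j=1}^{n} a_jX_j}(T) = \prod_{j=1}^{n} M_{a_jX_j}(T) = e^{T'\mu\left(\sum_{j=1}^{n} a_j\right) + \frac{1}{2}\left(\sum_{j=1}^{n} a_j^2\right)T'\Sigma T},$$

which implies that $U = \sum_{j=1}^{n} a_jX_j$ is distributed as a real normal vector
random variable with parameters $\left(\sum_{j=1}^{n} a_j\right)\mu$ and $\left(\sum_{j=1}^{n} a_j^2\right)\Sigma$, that is,
$U \sim N_p\left(\mu\left(\sum_{j=1}^{n} a_j\right), \left(\sum_{j=1}^{n} a_j^2\right)\Sigma\right)$. Thus, the following result:


Theorem 3.5.4. Let the $X_j$'s be iid $N_p(\mu, \Sigma)$, $\Sigma > O$, $j = 1, \ldots, n$, and $U =
a_1X_1 + \cdots + a_nX_n$ be a linear function of the $X_j$'s, $j = 1, \ldots, n$, where $a_1, \ldots, a_n$ are
real scalar constants. Then $U$ is distributed as a $p$-variate real Gaussian vector random
variable with parameters $\left[\left(\sum_{j=1}^{n} a_j\right)\mu, \left(\sum_{j=1}^{n} a_j^2\right)\Sigma\right]$, that is,
$U \sim N_p\left(\left(\sum_{j=1}^{n} a_j\right)\mu, \left(\sum_{j=1}^{n} a_j^2\right)\Sigma\right)$, $\Sigma > O$.


If, in Theorem 3.5.4, $a_j = \frac{1}{n}$, $j = 1, \ldots, n$, then $\sum_{j=1}^{n} a_j = \sum_{j=1}^{n} \frac{1}{n} = 1$ and
$\sum_{j=1}^{n} a_j^2 = \sum_{j=1}^{n} \frac{1}{n^2} = \frac{1}{n}$. However, when $a_j = \frac{1}{n}$, $j = 1, \ldots, n$, $U = \bar{X} =
\frac{1}{n}(X_1 + \cdots + X_n)$. Hence we have the following corollary.


Corollary 3.5.1. Let the $X_j$'s be iid $N_p(\mu, \Sigma)$, $\Sigma > O$, $j = 1, \ldots, n$. Then, the sample
mean $\bar{X} = \frac{1}{n}(X_1 + \cdots + X_n)$ is distributed as a $p$-variate real Gaussian with the
parameters $\mu$ and $\frac{1}{n}\Sigma$, that is, $\bar{X} \sim N_p(\mu, \frac{1}{n}\Sigma)$, $\Sigma > O$.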
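The parameters given in Theorem 3.5.4 can be illustrated by simulation. The sketch below assumes NumPy; the mean vector, covariance matrix and coefficients $a_j$ are hypothetical values chosen only for illustration. It compares the empirical mean and covariance of $U = \sum_j a_jX_j$ with $(\sum_j a_j)\mu$ and $(\sum_j a_j^2)\Sigma$.

```python
import numpy as np

rng = np.random.default_rng(2)
mu = np.array([1.0, -2.0])
Sigma = np.array([[2.0, 0.5],
                  [0.5, 1.0]])
a = np.array([1.0, -1.0, 2.0])          # hypothetical coefficients a_1, ..., a_n
n, reps = a.size, 200_000

# samples[r, j, :] is the vector X_j in replication r
samples = rng.multivariate_normal(mu, Sigma, size=(reps, n))
U = np.einsum("j,rjk->rk", a, samples)  # U = sum_j a_j X_j in each replication

mean_theory = a.sum() * mu              # (sum a_j) mu
cov_theory = (a ** 2).sum() * Sigma     # (sum a_j^2) Sigma

print(U.mean(axis=0), mean_theory)
print(np.cov(U, rowvar=False))
print(cov_theory)
```

Taking all coefficients equal to $1/n$ in the array `a` gives the setting of Corollary 3.5.1, in which case the covariance matrix of the simulated values should be close to $\frac{1}{n}\Sigma$.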


From the representation given in Sect. 3.5.1, let $\mathbf{X}$ be the sample matrix, $\bar{X} = \frac{1}{n}(X_1 +
\cdots + X_n)$, the sample average, and the $p \times n$ matrix $\bar{\mathbf{X}} = (\bar{X}, \ldots, \bar{X})$, $\mathbf{X} - \bar{\mathbf{X}} = \mathbf{X}(I -
\frac{1}{n}JJ') = \mathbf{X}A$, $J' = (1, \ldots, 1)$. Since $A$ is idempotent of rank $n - 1$, there exists an
orthonormal matrix $P$, $PP' = I$, $P'P = I$, such that $P'AP = \mathrm{diag}(1, \ldots, 1, 0) \equiv
D$, $A = PDP'$ and $\mathbf{X}A = \mathbf{X}PDP' = ZDP'$. Note that $A = A'$, $A^2 = A$ and $D^2 = D$.
Thus, the sample sum of products matrix has the following representations:

$$S = (\mathbf{X} - \bar{\mathbf{X}})(\mathbf{X} - \bar{\mathbf{X}})' = \mathbf{X}AA'\mathbf{X}' = \mathbf{X}A\mathbf{X}' = ZDD'Z' = Z_{n-1}Z_{n-1}' \tag{3.5.3}$$

where $Z_{n-1}$ is a $p \times (n - 1)$ matrix consisting of the first $n - 1$ columns of $Z = \mathbf{X}P$. When

$$D = \begin{bmatrix} I_{n-1} & O \\ O & 0 \end{bmatrix}, \quad Z = (Z_{n-1}, Z_{(n)}), \quad ZDZ' = Z_{n-1}Z_{n-1}',$$

where $Z_{(n)}$ denotes the last column of $Z$. For a $p$-variate real normal population wherein
the $X_j$'s are iid $N_p(\mu, \Sigma)$, $\Sigma > O$, $j = 1, \ldots, n$, $X_j - \bar{X} = (X_j - \mu) - (\bar{X} - \mu)$ and
hence the population can be taken to be distributed as $N_p(O, \Sigma)$, $\Sigma > O$ without any
loss of generality. Then the $n - 1$ columns of $Z_{n-1}$ will be iid $N_p(O, \Sigma)$, which reduces to
the standard normal $N_p(O, I)$ when $\Sigma = I$.
After discussing the real matrix-variate gamma distribution in the next chapter, we will
show that whenever $(n - 1) > p$, $Z_{n-1}Z_{n-1}'$ has a real matrix-variate gamma distribution,
or equivalently, that it is Wishart distributed with $n - 1$ degrees of freedom.
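The representation (3.5.3) can be reproduced numerically. In the sketch below, which assumes NumPy and an arbitrary simulated sample matrix, the orthonormal matrix $P$ is obtained from `numpy.linalg.eigh`; this is one convenient way of constructing such a $P$, not a construction prescribed by the text.

```python
import numpy as np

rng = np.random.default_rng(3)
p, n = 2, 6
X = rng.normal(size=(p, n))              # an arbitrary p x n sample matrix

A = np.eye(n) - np.ones((n, n)) / n      # A = I - (1/n) J J'
eigvals, P = np.linalg.eigh(A)           # eigenvalues in ascending order: 0, 1, ..., 1
order = np.argsort(eigvals)[::-1]        # reorder so that the zero eigenvalue comes last
eigvals, P = eigvals[order], P[:, order]

Z = X @ P
Z_head = Z[:, : n - 1]                   # first n - 1 columns of Z
S = X @ A @ X.T                          # sample sum of products matrix

print(np.round(eigvals, 10))             # diagonal of D = diag(1, ..., 1, 0)
print(np.allclose(Z_head @ Z_head.T, S)) # S = Z_{n-1} Z_{n-1}'
```

The matrix $P$ is not unique, since any orthonormal basis of the eigenspace associated with the eigenvalue 1 may be used, but the product $Z_{n-1}Z_{n-1}'$ is the same for every such choice.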


3.5a.1. Some simplifications of the sample matrix in the complex Gaussian case 


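Let the $p \times 1$ vector $\tilde{X}_1$ in the complex domain have a complex Gaussian density
$\tilde{N}_p(\tilde{\mu}, \tilde{\Sigma})$, $\tilde{\Sigma} > O$. Let $\tilde{X}_1, \ldots, \tilde{X}_n$ be iid as $\tilde{X}_j \sim \tilde{N}_p(\tilde{\mu}, \tilde{\Sigma})$, $\tilde{\Sigma} > O$ or
$\tilde{\mathbf{X}} = [\tilde{X}_1, \ldots, \tilde{X}_n]$ is the sample matrix of a simple random sample of size $n$ from
a $\tilde{N}_p(\tilde{\mu}, \tilde{\Sigma})$, $\tilde{\Sigma} > O$. Let the sample mean vector or the sample average be $\bar{\tilde{X}} =
\frac{1}{n}(\tilde{X}_1 + \cdots + \tilde{X}_n)$ and the matrix of sample means be the bold-faced $p \times n$ matrix $\bar{\tilde{\mathbf{X}}}$.
Let $\tilde{S} = (\tilde{\mathbf{X}} - \bar{\tilde{\mathbf{X}}})(\tilde{\mathbf{X}} - \bar{\tilde{\mathbf{X}}})^*$. Then $\bar{\tilde{\mathbf{X}}} = \frac{1}{n}\tilde{\mathbf{X}}JJ' = \tilde{\mathbf{X}}B$, $\tilde{\mathbf{X}} - \bar{\tilde{\mathbf{X}}} = \tilde{\mathbf{X}}(I - \frac{1}{n}JJ') = \tilde{\mathbf{X}}A$.
Then, $A = A^2$, $A = A' = A^*$, $B = B' = B^*$, $B = B^2$, $AB = O$, $BA = O$. Thus,
results parallel to Theorems 3.5.1 and 3.5.2 hold in the complex domain, and we now state
the general result.


Theorem 3.5a.1. Let the population be complex $p$-variate Gaussian $\tilde{N}_p(\tilde{\mu}, \tilde{\Sigma})$, $\tilde{\Sigma} >
O$. Let the $p \times n$ sample matrix be $\tilde{\mathbf{X}} = (\tilde{X}_1, \ldots, \tilde{X}_n)$ where $\tilde{X}_1, \ldots, \tilde{X}_n$ are iid as
$\tilde{N}_p(\tilde{\mu}, \tilde{\Sigma})$, $\tilde{\Sigma} > O$. Let $\tilde{\mathbf{X}}$, $\bar{\tilde{X}}$, $\tilde{S}$, $\tilde{\mathbf{X}}A$, $\tilde{\mathbf{X}}B$ be as defined above. Then, $\tilde{\mathbf{X}}A$ and $\tilde{\mathbf{X}}B$ are
independently distributed, and thereby the sample mean $\bar{\tilde{X}}$ and the sample sum of products
matrix $\tilde{S}$ are independently distributed.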


3.5a.2. Linear functions of the sample vectors in the complex domain 


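Let $\tilde{X}_j \sim \tilde{N}_p(\tilde{\mu}, \tilde{\Sigma})$, $\tilde{\Sigma} = \tilde{\Sigma}^* > O$ be a $p$-variate complex Gaussian vector random
variable. Consider a simple random sample of size $n$ from this population, in which case
the $\tilde{X}_j$'s, $j = 1, \ldots, n$, are iid as $\tilde{N}_p(\tilde{\mu}, \tilde{\Sigma})$, $\tilde{\Sigma} > O$. Let the linear function $\tilde{U} =
a_1\tilde{X}_1 + \cdots + a_n\tilde{X}_n$ where $a_1, \ldots, a_n$ are real or complex scalar constants. Then, following
through steps parallel to those provided in Sect. 3.5.2, we obtain the following mgf:

$$M_{\tilde{U}}(\tilde{T}) = e^{\Re\left(\tilde{T}^*\tilde{\mu}\left(\sum_{j=1}^{n} a_j\right)\right) + \frac{1}{4}\left(\sum_{j=1}^{n} a_ja_j^*\right)\tilde{T}^*\tilde{\Sigma}\tilde{T}}$$

where $\sum_{j=1}^{n} a_ja_j^* = |a_1|^2 + \cdots + |a_n|^2$. For example, if $a_j = \frac{1}{n}$, $j = 1, \ldots, n$, then
$\sum_{j=1}^{n} a_j = 1$ and $\sum_{j=1}^{n} a_ja_j^* = \frac{1}{n}$. Hence, we have the following result and the resulting
corollary.


Theorem 3.5a.2. Let the $p \times 1$ complex vector have a $p$-variate complex Gaussian
distribution $\tilde{N}_p(\tilde{\mu}, \tilde{\Sigma})$, $\tilde{\Sigma} = \tilde{\Sigma}^* > O$. Consider a simple random sample of size $n$
from this population, with the $\tilde{X}_j$'s, $j = 1, \ldots, n$, being iid as this $p$-variate complex
Gaussian. Let $a_1, \ldots, a_n$ be scalar constants, real or complex. Consider the linear function
$\tilde{U} = a_1\tilde{X}_1 + \cdots + a_n\tilde{X}_n$. Then $\tilde{U} \sim \tilde{N}_p\left(\tilde{\mu}\left(\sum_{j=1}^{n} a_j\right), \left(\sum_{j=1}^{n} a_ja_j^*\right)\tilde{\Sigma}\right)$, that is,
$\tilde{U}$ has a $p$-variate complex Gaussian distribution with the parameters $\left(\sum_{j=1}^{n} a_j\right)\tilde{\mu}$ and
$\left(\sum_{j=1}^{n} a_ja_j^*\right)\tilde{\Sigma}$.


Corollary 3.5a.1. Let the population and sample be as defined in Theorem 3.5a.2. Then
the sample mean $\bar{\tilde{X}} = \frac{1}{n}(\tilde{X}_1 + \cdots + \tilde{X}_n)$ is distributed as a $p$-variate complex Gaussian
with the parameters $\tilde{\mu}$ and $\frac{1}{n}\tilde{\Sigma}$.


Proceeding as in the real case, we can show that the sample sum of products matrix $\tilde{S}$
can have a representation of the form

$$\tilde{S} = \tilde{Z}_{n-1}\tilde{Z}_{n-1}^* \tag{3.5a.3}$$

where the columns of $\tilde{Z}_{n-1}$ are iid standard normal vectors in the complex domain if the
population is a $p$-variate Gaussian in the complex domain. In this case, it will be shown
later that $\tilde{S}$ is distributed as a complex Wishart matrix with $(n - 1) > p$ degrees of
freedom.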


3.5.3. Maximum likelihood estimators of the p-variate real Gaussian distribution 


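Letting $L$ denote the joint density of the sample values $X_1, \ldots, X_n$, which are $p \times 1$
iid Gaussian vectors constituting a simple random sample of size $n$, we have

$$L = \prod_{j=1}^{n} \frac{e^{-\frac{1}{2}(X_j - \mu)'\Sigma^{-1}(X_j - \mu)}}{(2\pi)^{\frac{p}{2}}|\Sigma|^{\frac{1}{2}}}
= \frac{e^{-\frac{1}{2}\mathrm{tr}(\Sigma^{-1}S) - \frac{n}{2}(\bar{X} - \mu)'\Sigma^{-1}(\bar{X} - \mu)}}{(2\pi)^{\frac{np}{2}}|\Sigma|^{\frac{n}{2}}} \tag{3.5.4}$$

where, as previously denoted, $\mathbf{X}$ is the $p \times n$ matrix

$$\mathbf{X} = (X_1, \ldots, X_n), \quad \bar{X} = \frac{1}{n}(X_1 + \cdots + X_n), \quad \bar{\mathbf{X}} = (\bar{X}, \ldots, \bar{X}),$$

$$S = (\mathbf{X} - \bar{\mathbf{X}})(\mathbf{X} - \bar{\mathbf{X}})' = (s_{ij}), \quad s_{ij} = \sum_{k=1}^{n} (x_{ik} - \bar{x}_i)(x_{jk} - \bar{x}_j).$$

In this case, the parameters are the $p \times 1$ vector $\mu$ and the $p \times p$ real positive definite
matrix $\Sigma$. If we resort to calculus to maximize $L$, then we would like to differentiate $L$, or
the one-to-one function $\ln L$, with respect to $\mu$ and $\Sigma$ directly, rather than differentiating
with respect to each element comprising $\mu$ and $\Sigma$. For achieving this, we need to further
develop the differential operators introduced in Chap. 1.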


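Definition 3.5.1. Derivative with respect to a matrix. Let $Y = (y_{ij})$ be a $p \times q$ matrix
where the elements $y_{ij}$'s are distinct real scalar variables. The operator $\frac{\partial}{\partial Y}$ will be defined
as $\frac{\partial}{\partial Y} = \left(\frac{\partial}{\partial y_{ij}}\right)$ and this operator applied to a real scalar quantity $f$ will be defined as

$$\frac{\partial}{\partial Y} f = \left(\frac{\partial f}{\partial y_{ij}}\right).$$

For example, if $f = y_{11}^2 + y_{12}^2 + y_{13}^2 - y_{11}y_{12} + y_{21} + y_{22}^2 + y_{23}$ and the $2 \times 3$ matrix $Y$ is

$$Y = \begin{bmatrix} y_{11} & y_{12} & y_{13} \\ y_{21} & y_{22} & y_{23} \end{bmatrix} \ \Rightarrow \
\frac{\partial f}{\partial Y} = \begin{bmatrix} \frac{\partial f}{\partial y_{11}} & \frac{\partial f}{\partial y_{12}} & \frac{\partial f}{\partial y_{13}} \\ \frac{\partial f}{\partial y_{21}} & \frac{\partial f}{\partial y_{22}} & \frac{\partial f}{\partial y_{23}} \end{bmatrix},$$

$$\frac{\partial f}{\partial Y} = \begin{bmatrix} 2y_{11} - y_{12} & 2y_{12} - y_{11} & 2y_{13} \\ 1 & 2y_{22} & 1 \end{bmatrix}.$$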

There are numerous examples of real-valued scalar functions of matrix argument. The
determinant and the trace are two scalar functions of a square matrix $A$. The derivative
with respect to a vector has already been defined in Chap. 1. The loglikelihood function
$\ln L$ which is available from (3.5.4) has to be differentiated with respect to $\mu$ and with
respect to $\Sigma$ and the resulting expressions have to be respectively equated to a null vector
and a null matrix. These equations are then solved to obtain the critical points where
$L$ as well as $\ln L$ may have a local maximum, a local minimum or a saddle point.
However, $\ln L$ contains a determinant and a trace. Hence we need to develop some results
on differentiating a determinant and a trace with respect to a matrix, and the following
results will be helpful in this regard.


Theorem 3.5.5. Let the $p \times p$ matrix $Y = (y_{ij})$ be nonsingular, the $y_{ij}$'s being distinct
real scalar variables. Let $f = |Y|$, the determinant of $Y$. Then,

$$\frac{\partial}{\partial Y}|Y| = \begin{cases} |Y|(Y^{-1})' & \text{for a general } Y \\ |Y|[2Y^{-1} - \mathrm{diag}(Y^{-1})] & \text{for } Y = Y' \end{cases}$$

where $\mathrm{diag}(Y^{-1})$ is a diagonal matrix whose diagonal elements coincide with those of
$Y^{-1}$.


Proof: A determinant can be obtained by expansions along any row (or column), the
resulting sums involving the corresponding elements and their associated cofactors. More
specifically, $|Y| = y_{i1}C_{i1} + \cdots + y_{ip}C_{ip}$ for each $i = 1, \ldots, p$, where $C_{ij}$ is the cofactor
of $y_{ij}$. This expansion holds whether the elements in the matrix are real or complex. Then,

$$\frac{\partial}{\partial y_{ij}}|Y| = \begin{cases} C_{ij} & \text{for a general } Y \\ 2C_{ij} & \text{for } Y = Y', \ i \ne j \\ C_{jj} & \text{for } Y = Y', \ i = j. \end{cases}$$

Thus, $\frac{\partial}{\partial Y}|Y| =$ the matrix of cofactors $= |Y|(Y^{-1})'$ for a general $Y$. When $Y = Y'$, then

$$\frac{\partial}{\partial Y}|Y| = \begin{bmatrix} C_{11} & 2C_{12} & \cdots & 2C_{1p} \\ 2C_{21} & C_{22} & \cdots & 2C_{2p} \\ \vdots & \vdots & \ddots & \vdots \\ 2C_{p1} & 2C_{p2} & \cdots & C_{pp} \end{bmatrix}
= |Y|[2Y^{-1} - \mathrm{diag}(Y^{-1})].$$

Hence the result.


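Theorem 3.5.6. Let $A$ and $Y = (y_{ij})$ be $p \times p$ matrices where $A$ is a constant matrix
and the $y_{ij}$'s are distinct real scalar variables. Then,

$$\frac{\partial}{\partial Y}[\mathrm{tr}(AY)] = \begin{cases} A' & \text{for a general } Y \\ A + A' - \mathrm{diag}(A) & \text{for } Y = Y'. \end{cases}$$


Proof: $\mathrm{tr}(AY) = \sum_{ij} a_{ji}y_{ij}$ for a general $Y$, so that $\frac{\partial}{\partial Y}[\mathrm{tr}(AY)] = A'$ for a general $Y$.
When $Y = Y'$, $\frac{\partial}{\partial y_{jj}}[\mathrm{tr}(AY)] = a_{jj}$ and $\frac{\partial}{\partial y_{ij}}[\mathrm{tr}(AY)] = a_{ij} + a_{ji}$ for $i \ne j$. Hence,
$\frac{\partial}{\partial Y}[\mathrm{tr}(AY)] = A + A' - \mathrm{diag}(A)$ for $Y = Y'$. Thus, the result is established.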
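Theorems 3.5.5 and 3.5.6 can both be checked by finite differences. In the sketch below, which assumes NumPy and arbitrary test matrices, the symmetric case is handled by moving $y_{ij}$ and $y_{ji}$ together, which is how the restriction $Y = Y'$ is interpreted here.

```python
import numpy as np

def numeric_grad(func, Y, symmetric=False, h=1e-6):
    """Central-difference derivative of a scalar func with respect to each y_ij.

    With symmetric=True, y_ij and y_ji are treated as one variable and moved together.
    """
    p, q = Y.shape
    G = np.zeros((p, q))
    for i in range(p):
        for j in range(q):
            E = np.zeros((p, q))
            E[i, j] = h
            if symmetric:
                E[j, i] = h
            G[i, j] = (func(Y + E) - func(Y - E)) / (2 * h)
    return G

rng = np.random.default_rng(4)
p = 3
A = rng.normal(size=(p, p))                   # constant matrix in tr(AY)
Y = rng.normal(size=(p, p)) + 3 * np.eye(p)   # a general nonsingular matrix
M = rng.normal(size=(p, p))
Ys = M @ M.T + np.eye(p)                      # a symmetric nonsingular matrix

det = np.linalg.det

def trace_AY(Z):
    return np.trace(A @ Z)

# Closed forms of Theorem 3.5.5
det_general = det(Y) * np.linalg.inv(Y).T
Ys_inv = np.linalg.inv(Ys)
det_symmetric = det(Ys) * (2 * Ys_inv - np.diag(np.diag(Ys_inv)))

# Closed forms of Theorem 3.5.6
trace_general = A.T
trace_symmetric = A + A.T - np.diag(np.diag(A))

print(np.allclose(numeric_grad(det, Y), det_general, atol=1e-5))
print(np.allclose(numeric_grad(det, Ys, symmetric=True), det_symmetric, atol=1e-5))
print(np.allclose(numeric_grad(trace_AY, Y), trace_general, atol=1e-6))
print(np.allclose(numeric_grad(trace_AY, Ys, symmetric=True), trace_symmetric, atol=1e-6))
```

The off-diagonal factor 2 in the symmetric case arises because each off-diagonal variable appears twice in the matrix.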


With the help of Theorems 3.5.5 and 3.5.6, we can optimize $L$ or $\ln L$ with $L$ as
specified in Eq. (3.5.4). For convenience, we take $\ln L$ which is given by

$$\ln L = -\frac{np}{2}\ln(2\pi) - \frac{n}{2}\ln|\Sigma| - \frac{1}{2}\mathrm{tr}(\Sigma^{-1}S) - \frac{n}{2}(\bar{X} - \mu)'\Sigma^{-1}(\bar{X} - \mu). \tag{3.5.5}$$

Then,

$$\frac{\partial}{\partial \mu}\ln L = O \ \Rightarrow \ 0 - 0 - 0 - \frac{n}{2}\frac{\partial}{\partial \mu}\left[(\bar{X} - \mu)'\Sigma^{-1}(\bar{X} - \mu)\right] = O
\ \Rightarrow \ n\Sigma^{-1}(\bar{X} - \mu) = O \ \Rightarrow \ \bar{X} - \mu = O \ \Rightarrow \ \mu = \bar{X},$$

referring to the vector derivatives defined in Chap. 1. The extremal value, denoted with
a hat, is $\hat{\mu} = \bar{X}$. When differentiating with respect to $\Sigma$, we may take $B = \Sigma^{-1}$ for
convenience and differentiate with respect to $B$. We may also substitute $\bar{X}$ for $\mu$ because
the critical point for $\Sigma$ must correspond to $\hat{\mu} = \bar{X}$. Accordingly, $\ln L$ at $\mu = \bar{X}$ is

$$\ln L(\hat{\mu}, B) = -\frac{np}{2}\ln(2\pi) + \frac{n}{2}\ln|B| - \frac{1}{2}\mathrm{tr}(BS).$$

Noting that $B = B'$,

$$\begin{aligned}
\frac{\partial}{\partial B}\ln L(\hat{\mu}, B) = O \ &\Rightarrow \ \frac{n}{2}[2B^{-1} - \mathrm{diag}(B^{-1})] - \frac{1}{2}[2S - \mathrm{diag}(S)] = O \\
&\Rightarrow \ n[2\Sigma - \mathrm{diag}(\Sigma)] = 2S - \mathrm{diag}(S) \\
&\Rightarrow \ \hat{\sigma}_{jj} = \frac{1}{n}s_{jj}, \ \hat{\sigma}_{ij} = \frac{1}{n}s_{ij}, \ i \ne j \\
&\Rightarrow \ \Big(\hat{\mu} = \bar{X}, \ \hat{\Sigma} = \frac{1}{n}S\Big).
\end{aligned}$$

Hence, the only critical point is $(\hat{\mu}, \hat{\Sigma}) = (\bar{X}, \frac{1}{n}S)$. Does this critical point correspond to a
local maximum or a local minimum or something else? For $\hat{\mu} = \bar{X}$, consider the behavior
of $\ln L$. For convenience, we may convert the problem in terms of the eigenvalues of $B$.
Letting $\lambda_1, \ldots, \lambda_p$ be the eigenvalues of $B$, observe that $\lambda_j > 0$, $j = 1, \ldots, p$, that the
determinant is the product of the eigenvalues and the trace is the sum of the eigenvalues.
Examining the behavior of $\ln L$ for all possible values of $\lambda_1$ when $\lambda_2, \ldots, \lambda_p$ are fixed, we
see that $\ln L$ at $\hat{\mu}$ goes from $-\infty$ to $-\infty$ through finite values. For each $\lambda_j$, the behavior
of $\ln L$ is the same. Hence the only critical point must correspond to a local maximum.
Therefore $\hat{\mu} = \bar{X}$ and $\hat{\Sigma} = \frac{1}{n}S$ are the maximum likelihood estimators (MLE's) of $\mu$
and $\Sigma$ respectively. The observed values of $\hat{\mu}$ and $\hat{\Sigma}$ are the maximum likelihood estimates
of $\mu$ and $\Sigma$, for which the same abbreviation MLE is utilized. While maximum
likelihood estimators are random variables, maximum likelihood estimates are numerical
values. Observe that, in order to have an estimate for $\Sigma$, we must have that the sample size
$n > p$.

In the derivation of the MLE of $\Sigma$, we have differentiated with respect to $B = \Sigma^{-1}$
instead of differentiating with respect to the parameter $\Sigma$. Could this affect the final result?
Given any $\theta$ and any non-trivial differentiable function of $\theta$, $\phi(\theta)$, whose derivative is
not identically zero, that is, $\frac{\mathrm{d}}{\mathrm{d}\theta}\phi(\theta) \ne 0$ for any $\theta$, it follows from basic calculus that
for any differentiable function $g(\theta)$, the equations $\frac{\mathrm{d}}{\mathrm{d}\theta}g(\theta) = 0$ and $\frac{\mathrm{d}}{\mathrm{d}\phi}g(\theta) = 0$ will lead
to the same solution for $\theta$. Hence, whether we differentiate with respect to $B = \Sigma^{-1}$ or
$\Sigma$, the procedures will lead to the same estimator of $\Sigma$. As well, if $\hat{\theta}$ is the MLE of $\theta$,
then $g(\hat{\theta})$ will also be the MLE of $g(\theta)$ whenever $g(\theta)$ is a one-to-one function of $\theta$. The
numerical evaluation of maximum likelihood estimates for $\mu$ and $\Sigma$ has been illustrated
in Example 3.5.1.
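The maximum likelihood estimates for the observations of Example 3.5.1 can be recomputed and the maximizing property probed numerically. The following sketch assumes NumPy; the random perturbations are an informal check that no nearby parameter value gives a larger loglikelihood, not a proof.

```python
import numpy as np

# Observations of Example 3.5.1, one column per sample vector
X = np.array([[ 2.0,  1.0, 1.0, 0.0],
              [ 0.0, -1.0, 0.0, 1.0],
              [-1.0,  2.0, 4.0, 3.0]])
p, n = X.shape

def log_likelihood(mu, Sigma):
    """ln L for iid N_p(mu, Sigma) observations, written directly from the data."""
    D = X - mu.reshape(-1, 1)
    _, logdet = np.linalg.slogdet(Sigma)
    quad = np.trace(np.linalg.solve(Sigma, D @ D.T))   # sum_j (X_j - mu)' Sigma^{-1} (X_j - mu)
    return -0.5 * n * p * np.log(2 * np.pi) - 0.5 * n * logdet - 0.5 * quad

mu_hat = X.mean(axis=1)                   # MLE of mu
D_hat = X - mu_hat.reshape(-1, 1)
S = D_hat @ D_hat.T                       # sample sum of products matrix
Sigma_hat = S / n                         # MLE of Sigma

best = log_likelihood(mu_hat, Sigma_hat)

# Loglikelihood at randomly perturbed parameter values (Sigma_alt stays positive definite)
rng = np.random.default_rng(8)
others = []
for _ in range(200):
    G = rng.normal(size=(p, p))
    mu_alt = mu_hat + 0.3 * rng.normal(size=p)
    Sigma_alt = Sigma_hat + 0.1 * G @ G.T
    others.append(log_likelihood(mu_alt, Sigma_alt))

print(mu_hat)
print(Sigma_hat)
print(best, max(others))
```

At the estimates, $\mathrm{tr}(\hat{\Sigma}^{-1}S) = np$, so the maximized loglikelihood equals $-\frac{np}{2}[\ln(2\pi) + 1] - \frac{n}{2}\ln|\hat{\Sigma}|$.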


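3.5a.3. MLE's in the complex p-variate Gaussian case


Let the $p \times 1$ vectors in the complex domain $\tilde{X}_1, \ldots, \tilde{X}_n$ be iid as $\tilde{N}_p(\tilde{\mu}, \tilde{\Sigma})$, $\tilde{\Sigma} > O$,
and let the joint density of the $\tilde{X}_j$'s, $j = 1, \ldots, n$, be denoted by $\tilde{L}$. Then

$$\tilde{L} = \prod_{j=1}^{n} \frac{e^{-(\tilde{X}_j - \tilde{\mu})^*\tilde{\Sigma}^{-1}(\tilde{X}_j - \tilde{\mu})}}{\pi^p\,|\det(\tilde{\Sigma})|}
= \frac{e^{-\sum_{j=1}^{n}(\tilde{X}_j - \tilde{\mu})^*\tilde{\Sigma}^{-1}(\tilde{X}_j - \tilde{\mu})}}{\pi^{np}\,|\det(\tilde{\Sigma})|^n}
= \frac{e^{-\mathrm{tr}(\tilde{\Sigma}^{-1}\tilde{S}) - n(\bar{\tilde{X}} - \tilde{\mu})^*\tilde{\Sigma}^{-1}(\bar{\tilde{X}} - \tilde{\mu})}}{\pi^{np}\,|\det(\tilde{\Sigma})|^n}$$

where $|\det(\tilde{\Sigma})|$ denotes the absolute value of the determinant of $\tilde{\Sigma}$,

$$\tilde{S} = (\tilde{\mathbf{X}} - \bar{\tilde{\mathbf{X}}})(\tilde{\mathbf{X}} - \bar{\tilde{\mathbf{X}}})^* = (\tilde{s}_{ij}), \quad \tilde{s}_{ij} = \sum_{k=1}^{n} (\tilde{x}_{ik} - \bar{\tilde{x}}_i)(\tilde{x}_{jk} - \bar{\tilde{x}}_j)^*,$$

$$\tilde{\mathbf{X}} = [\tilde{X}_1, \ldots, \tilde{X}_n], \quad \bar{\tilde{X}} = \frac{1}{n}(\tilde{X}_1 + \cdots + \tilde{X}_n), \quad \bar{\tilde{\mathbf{X}}} = [\bar{\tilde{X}}, \ldots, \bar{\tilde{X}}],$$

where $\tilde{\mathbf{X}}$ and $\bar{\tilde{\mathbf{X}}}$ are $p \times n$. Hence,

$$\ln \tilde{L} = -np\ln\pi - n\ln|\det(\tilde{\Sigma})| - \mathrm{tr}(\tilde{\Sigma}^{-1}\tilde{S}) - n(\bar{\tilde{X}} - \tilde{\mu})^*\tilde{\Sigma}^{-1}(\bar{\tilde{X}} - \tilde{\mu}). \tag{3.5a.4}$$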


3.5a.4. Matrix derivatives in the complex domain 


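Consider $\mathrm{tr}(\tilde{B}\tilde{S}^*)$, $\tilde{B} = \tilde{B}^* > O$, $\tilde{S} = \tilde{S}^* > O$. Let $\tilde{B} = B_1 + iB_2$, $\tilde{S} = S_1 + iS_2$, $i =
\sqrt{-1}$. Then $B_1$ and $S_1$ are real symmetric and $B_2$ and $S_2$ are real skew symmetric since
$\tilde{B}$ and $\tilde{S}$ are Hermitian. What is then $\frac{\partial}{\partial \tilde{B}}[\mathrm{tr}(\tilde{B}\tilde{S}^*)]$? Consider

$$\tilde{B}\tilde{S}^* = (B_1 + iB_2)(S_1' - iS_2') = B_1S_1' + B_2S_2' + i(B_2S_1' - B_1S_2'),$$

$$\mathrm{tr}(\tilde{B}\tilde{S}^*) = \mathrm{tr}(B_1S_1' + B_2S_2') + i[\mathrm{tr}(B_2S_1') - \mathrm{tr}(B_1S_2')].$$

It can be shown that when $B_2$ and $S_2$ are real skew symmetric and $B_1$ and $S_1$ are real
symmetric, then $\mathrm{tr}(B_2S_1') = 0$, $\mathrm{tr}(B_1S_2') = 0$. This will be stated as a lemma.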




Lemma 3.5a.1. Consider two $p \times p$ real matrices $A$ and $B$ where $A = A'$ (symmetric)
and $B = -B'$ (skew symmetric). Then, $\mathrm{tr}(AB) = 0$.


Proof: $\mathrm{tr}(AB) = \mathrm{tr}[(AB)'] = \mathrm{tr}(B'A') = -\mathrm{tr}(BA) = -\mathrm{tr}(AB)$, which implies that
$\mathrm{tr}(AB) = 0$.
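Lemma 3.5a.1 and its consequence for $\mathrm{tr}(\tilde{B}\tilde{S}^*)$ can be illustrated with randomly generated matrices. This is a small sketch assuming NumPy; the helper `random_hermitian` is a hypothetical name introduced only for this illustration.

```python
import numpy as np

rng = np.random.default_rng(5)
p = 4

# Real symmetric A and real skew symmetric B
G = rng.normal(size=(p, p))
H = rng.normal(size=(p, p))
A = G + G.T
B = H - H.T
trace_AB = np.trace(A @ B)

def random_hermitian(size):
    # Hypothetical helper: M + M* is Hermitian for any complex square matrix M
    M = rng.normal(size=(size, size)) + 1j * rng.normal(size=(size, size))
    return M + M.conj().T

B_t = random_hermitian(p)
S_t = random_hermitian(p)
B1, B2 = B_t.real, B_t.imag       # symmetric and skew symmetric parts of B_t
S1, S2 = S_t.real, S_t.imag       # symmetric and skew symmetric parts of S_t

lhs = np.trace(B_t @ S_t.conj().T)        # tr(B S*), real up to rounding
rhs = np.trace(B1 @ S1.T + B2 @ S2.T)     # tr(B1 S1' + B2 S2')

print(trace_AB)
print(lhs, rhs)
```

The imaginary part of `lhs` vanishes up to rounding because it consists of traces of products of a symmetric and a skew symmetric matrix.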


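Thus, $\mathrm{tr}(\tilde{B}\tilde{S}^*) = \mathrm{tr}(B_1S_1' + B_2S_2')$. The diagonal elements of $S_1$ in $\mathrm{tr}(B_1S_1')$ are
multiplied once by the diagonal elements of $B_1$ and the non-diagonal elements in $S_1$ are
multiplied twice each by the corresponding elements in $B_1$. Hence,

$$\frac{\partial}{\partial B_1}\mathrm{tr}(B_1S_1') = 2S_1 - \mathrm{diag}(S_1).$$

In $B_2$ and $S_2$, the diagonal elements are zeros and hence

$$\frac{\partial}{\partial B_2}\mathrm{tr}(B_2S_2') = 2S_2.$$

Therefore

$$\left(\frac{\partial}{\partial B_1} + i\frac{\partial}{\partial B_2}\right)\mathrm{tr}(B_1S_1' + B_2S_2') = 2(S_1 + iS_2) - \mathrm{diag}(S_1) = 2\tilde{S} - \mathrm{diag}(\tilde{S}).$$

Thus, the following result:


Theorem 3.5a.3. Let $\tilde{S} = \tilde{S}^* > O$ and $\tilde{B} = \tilde{B}^* > O$ be $p \times p$ Hermitian matrices. Let
$\tilde{B} = B_1 + iB_2$ and $\tilde{S} = S_1 + iS_2$ where the $p \times p$ matrices $B_1$ and $S_1$ are symmetric and
$B_2$ and $S_2$ are skew symmetric real matrices. Letting $\frac{\partial}{\partial \tilde{B}} = \frac{\partial}{\partial B_1} + i\frac{\partial}{\partial B_2}$, we have

$$\frac{\partial}{\partial \tilde{B}}\mathrm{tr}(\tilde{B}\tilde{S}^*) = 2\tilde{S} - \mathrm{diag}(\tilde{S}).$$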


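Theorem 3.5a.4. Let $\tilde{\Sigma} = (\tilde{\sigma}_{ij}) = \tilde{\Sigma}^* > O$ be a Hermitian positive definite $p \times p$ matrix.
Let $\det(\tilde{\Sigma})$ be the determinant and $|\det(\tilde{\Sigma})|$ be the absolute value of the determinant
respectively. Let $\frac{\partial}{\partial \tilde{\Sigma}} = \frac{\partial}{\partial \Sigma_1} + i\frac{\partial}{\partial \Sigma_2}$ be the differential operator, where $\tilde{\Sigma} = \Sigma_1 + i\Sigma_2$,
$i = \sqrt{-1}$, $\Sigma_1$ being real symmetric and $\Sigma_2$, real skew symmetric. Then,

$$\frac{\partial}{\partial \tilde{\Sigma}}\ln|\det(\tilde{\Sigma})| = 2\tilde{\Sigma}^{-1} - \mathrm{diag}(\tilde{\Sigma}^{-1}).$$


Proof: Note that for two scalar complex quantities, $\tilde{x} = x_1 + ix_2$ and $\tilde{y} = y_1 + iy_2$ where
$i = \sqrt{-1}$ and $x_1, x_2, y_1, y_2$ are real, and for the operator $\frac{\partial}{\partial \tilde{x}} = \frac{\partial}{\partial x_1} + i\frac{\partial}{\partial x_2}$, the following
results hold, which will be stated as a lemma.


Lemma 3.5a.2. Given $\tilde{x}$, $\tilde{y}$ and the operator $\frac{\partial}{\partial \tilde{x}} = \frac{\partial}{\partial x_1} + i\frac{\partial}{\partial x_2}$ defined above,

$$\frac{\partial}{\partial \tilde{x}}(\tilde{x}\tilde{y}) = 0, \quad \frac{\partial}{\partial \tilde{x}}(\tilde{x}\tilde{y}^*) = 0, \quad \frac{\partial}{\partial \tilde{x}}(\tilde{x}^*\tilde{y}) = 2\tilde{y}, \quad \frac{\partial}{\partial \tilde{x}}(\tilde{x}^*\tilde{y}^*) = 2\tilde{y}^*,$$

$$\frac{\partial}{\partial \tilde{x}}(\tilde{x}\tilde{x}^*) = \frac{\partial}{\partial \tilde{x}}(\tilde{x}^*\tilde{x}) = 2\tilde{x}, \quad \frac{\partial}{\partial \tilde{x}^*}(\tilde{x}^*\tilde{x}) = \frac{\partial}{\partial \tilde{x}^*}(\tilde{x}\tilde{x}^*) = 2\tilde{x}^*$$

where, for example, $\tilde{x}^*$ which, in general, is the conjugate transpose of $\tilde{x}$, is only the
conjugate in this case since $\tilde{x}$ is a scalar quantity.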
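The identities of Lemma 3.5a.2 can be confirmed by applying the operator $\frac{\partial}{\partial x_1} + i\frac{\partial}{\partial x_2}$ through central differences. The sketch below uses only built-in complex arithmetic and NumPy for comparisons; it assumes that $\frac{\partial}{\partial \tilde{x}^*}$ stands for $\frac{\partial}{\partial x_1} - i\frac{\partial}{\partial x_2}$, which is an interpretation made here and not a definition quoted from the text.

```python
import numpy as np

def d_dx(func, x, h=1e-6):
    """Apply d/dx1 + i d/dx2 to func at x = x1 + i x2 by central differences."""
    d1 = (func(x + h) - func(x - h)) / (2 * h)
    d2 = (func(x + 1j * h) - func(x - 1j * h)) / (2 * h)
    return d1 + 1j * d2

def d_dxconj(func, x, h=1e-6):
    """Apply d/dx1 - i d/dx2 (assumed meaning of the operator written with x*)."""
    d1 = (func(x + h) - func(x - h)) / (2 * h)
    d2 = (func(x + 1j * h) - func(x - 1j * h)) / (2 * h)
    return d1 - 1j * d2

x = 1.5 - 0.7j                      # arbitrary complex scalars
y = -0.4 + 2.0j

r1 = d_dx(lambda t: t * y, x)                             # expected 0
r2 = d_dx(lambda t: t * np.conj(y), x)                    # expected 0
r3 = d_dx(lambda t: np.conj(t) * y, x)                    # expected 2y
r4 = d_dx(lambda t: np.conj(t) * np.conj(y), x)           # expected 2y*
r5 = d_dx(lambda t: t * np.conj(t), x)                    # expected 2x
r6 = d_dxconj(lambda t: t * np.conj(t), x)                # expected 2x*

print(r1, r2, r3, r4, r5, r6)
```

Since the functions involved are at most quadratic in $x_1$ and $x_2$, the central differences are exact up to rounding.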


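Observe that for a $p \times p$ Hermitian positive definite matrix $\tilde{X}$, the absolute value of the
determinant, namely, $|\det(\tilde{X})| = \sqrt{\det(\tilde{X})\det(\tilde{X}^*)} = \det(\tilde{X}) = \det(\tilde{X}^*)$ since $\tilde{X} = \tilde{X}^*$.
Consider the following cofactor expansion of $\det(\tilde{X})$ (in general, a cofactor expansion
is valid whether the elements of the matrix are real or complex). Letting $C_{ij}$ denote the
cofactor of $x_{ij}$ in $\tilde{X} = (x_{ij})$ when $x_{ij}$ is real or complex,

$$\begin{aligned}
\det(\tilde{X}) &= x_{11}C_{11} + x_{12}C_{12} + \cdots + x_{1p}C_{1p} && (1) \\
&= x_{21}C_{21} + x_{22}C_{22} + \cdots + x_{2p}C_{2p} && (2) \\
&\ \ \vdots \\
&= x_{p1}C_{p1} + x_{p2}C_{p2} + \cdots + x_{pp}C_{pp}. && (p)
\end{aligned}$$
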

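When $\tilde{X} = \tilde{X}^*$, the diagonal elements $x_{jj}$'s are all real. From Lemma 3.5a.2 and equation
(1), we have

$$\frac{\partial}{\partial x_{11}}(x_{11}C_{11}) = C_{11}, \quad \frac{\partial}{\partial x_{1j}}(x_{1j}C_{1j}) = 0, \ j = 2, \ldots, p.$$

From Eq. (2), note that $x_{21} = x_{12}^*$, $C_{21} = C_{12}^*$ since $\tilde{X} = \tilde{X}^*$. Then from Lemma 3.5a.2
and (2), we have

$$\frac{\partial}{\partial x_{12}}(x_{12}^*C_{12}^*) = 2C_{12}^*, \quad \frac{\partial}{\partial x_{22}}(x_{22}C_{22}) = C_{22}, \quad \frac{\partial}{\partial x_{2j}}(x_{2j}C_{2j}) = 0, \ j = 3, \ldots, p,$$

observing that $x_{22}^* = x_{22}$ and $C_{22}^* = C_{22}$. Now, continuing the process with
Eqs. (3), (4), ..., (p), we have the following result:

$$\frac{\partial}{\partial x_{ij}}[\det(\tilde{X})] = \begin{cases} C_{jj}^*, & j = 1, \ldots, p \\ 2C_{ij}^* & \text{for all } i \ne j. \end{cases}$$

Observe that for $\tilde{\Sigma}^{-1} = \tilde{B} = \tilde{B}^*$,

$$\frac{\partial}{\partial \tilde{B}}[\ln(\det(\tilde{B}))] = \frac{1}{\det(\tilde{B})}\begin{bmatrix} B_{11}^* & 2B_{12}^* & \cdots & 2B_{1p}^* \\ 2B_{21}^* & B_{22}^* & \cdots & 2B_{2p}^* \\ \vdots & \vdots & \ddots & \vdots \\ 2B_{p1}^* & 2B_{p2}^* & \cdots & B_{pp}^* \end{bmatrix}
= 2\tilde{B}^{-1} - \mathrm{diag}(\tilde{B}^{-1}) = 2\tilde{\Sigma} - \mathrm{diag}(\tilde{\Sigma})$$

where $B_{rs}$ is the cofactor of $\tilde{b}_{rs}$, $\tilde{B} = (\tilde{b}_{rs})$. Therefore, at $\hat{\tilde{\mu}} = \bar{\tilde{X}}$, for $\tilde{\Sigma}^{-1} = \tilde{B}$, and from
Theorems 3.5a.3 and 3.5a.4, we have

$$\frac{\partial}{\partial \tilde{B}}[\ln \tilde{L}] = O \ \Rightarrow \ n[2\tilde{\Sigma} - \mathrm{diag}(\tilde{\Sigma})] - [2\tilde{S} - \mathrm{diag}(\tilde{S})] = O
\ \Rightarrow \ \tilde{\Sigma} = \frac{1}{n}\tilde{S} \ \Rightarrow \ \hat{\tilde{\Sigma}} = \frac{1}{n}\tilde{S} \ \text{ for } n > p,$$

where a hat denotes the estimate/estimator.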


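Again, from Lemma 3.5a.2 we have the following:

$$\frac{\partial}{\partial \tilde{\mu}}\left[(\bar{\tilde{X}} - \tilde{\mu})^*\tilde{\Sigma}^{-1}(\bar{\tilde{X}} - \tilde{\mu})\right]
= \frac{\partial}{\partial \tilde{\mu}}\left\{\bar{\tilde{X}}^*\tilde{\Sigma}^{-1}\bar{\tilde{X}} + \tilde{\mu}^*\tilde{\Sigma}^{-1}\tilde{\mu} - \bar{\tilde{X}}^*\tilde{\Sigma}^{-1}\tilde{\mu} - \tilde{\mu}^*\tilde{\Sigma}^{-1}\bar{\tilde{X}}\right\} = O$$

$$\Rightarrow \ O + 2\tilde{\Sigma}^{-1}\tilde{\mu} - O - 2\tilde{\Sigma}^{-1}\bar{\tilde{X}} = O \ \Rightarrow \ \hat{\tilde{\mu}} = \bar{\tilde{X}}.$$

Thus, the MLE of $\tilde{\mu}$ and $\tilde{\Sigma}$ are respectively $\hat{\tilde{\mu}} = \bar{\tilde{X}}$ and $\hat{\tilde{\Sigma}} = \frac{1}{n}\tilde{S}$ for $n > p$. It is not
difficult to show that the only critical point $(\hat{\tilde{\mu}}, \hat{\tilde{\Sigma}}) = (\bar{\tilde{X}}, \frac{1}{n}\tilde{S})$ corresponds to a local
maximum for $\tilde{L}$. Consider $\ln \tilde{L}$ at $\tilde{\mu} = \bar{\tilde{X}}$. Let $\lambda_1, \ldots, \lambda_p$ be the eigenvalues of $\tilde{B} = \tilde{\Sigma}^{-1}$
where the $\lambda_j$'s are real as $\tilde{B}$ is Hermitian. Examine the behavior of $\ln \tilde{L}$ when a $\lambda_j$ is
increasing from 0 to $\infty$. Then $\ln \tilde{L}$ goes from $-\infty$ back to $-\infty$ through finite values.
Hence, the only critical point corresponds to a local maximum. Thus, $\bar{\tilde{X}}$ and $\frac{1}{n}\tilde{S}$ are the
MLE's of $\tilde{\mu}$ and $\tilde{\Sigma}$, respectively.


Theorems 3.5.7, 3.5a.5. For the $p$-variate real Gaussian with the parameters $\mu$ and
$\Sigma > O$ and the $p$-variate complex Gaussian with the parameters $\tilde{\mu}$ and $\tilde{\Sigma} > O$, the
maximum likelihood estimators (MLE's) are $\hat{\mu} = \bar{X}$, $\hat{\Sigma} = \frac{1}{n}S$, $\hat{\tilde{\mu}} = \bar{\tilde{X}}$, $\hat{\tilde{\Sigma}} = \frac{1}{n}\tilde{S}$ where
$n$ is the sample size, $\bar{X}$ and $S$ are the sample mean and sample sum of products matrix in
the real case, and $\bar{\tilde{X}}$ and $\tilde{S}$ are the sample mean and the sample sum of products matrix in
the complex case, respectively.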

A numerical illustration of the maximum likelihood estimates of $\tilde{\mu}$ and $\tilde{\Sigma}$ in the
complex domain has already been given in Example 3.5a.1.

It can be shown that the MLE of $\mu$ and $\Sigma$ in the real and complex $p$-variate Gaussian
cases are such that $E[\bar{X}] = \mu$, $E[\bar{\tilde{X}}] = \tilde{\mu}$, $E[\hat{\Sigma}] = \frac{n-1}{n}\Sigma$, $E[\hat{\tilde{\Sigma}}] = \frac{n-1}{n}\tilde{\Sigma}$. For
these results to hold, the population need not be Gaussian. Any population for which the
covariance matrix exists will have these properties. This will be stated as a result.


Theorems 3.5.8, 3.5a.6. Let $X_1, \ldots, X_n$ be a simple random sample from any $p$-variate
population with mean value vector $\mu$ and covariance matrix $\Sigma = \Sigma' > O$ in the real
case and mean value vector $\tilde{\mu}$ and covariance matrix $\tilde{\Sigma} = \tilde{\Sigma}^* > O$ in the complex
case, respectively, and let $\Sigma$ and $\tilde{\Sigma}$ exist in the sense all the elements therein exist. Let
$\bar{X} = \frac{1}{n}(X_1 + \cdots + X_n)$, $\bar{\tilde{X}} = \frac{1}{n}(\tilde{X}_1 + \cdots + \tilde{X}_n)$ and let $S$ and $\tilde{S}$ be the sample sum of
products matrices in the real and complex cases, respectively. Then $E[\bar{X}] = \mu$,
$E[\hat{\Sigma}] = E[\frac{1}{n}S] = \frac{n-1}{n}\Sigma \to \Sigma$ as $n \to \infty$ and $E[\bar{\tilde{X}}] = \tilde{\mu}$, $E[\hat{\tilde{\Sigma}}] = E[\frac{1}{n}\tilde{S}] =
\frac{n-1}{n}\tilde{\Sigma} \to \tilde{\Sigma}$ as $n \to \infty$.


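Proof: $E[\bar{X}] = \frac{1}{n}\{E[X_1] + \cdots + E[X_n]\} = \frac{1}{n}\{\mu + \cdots + \mu\} = \mu$. Similarly, $E[\bar{\tilde{X}}] = \tilde{\mu}$.
Let $\mathbf{M} = (\mu, \mu, \ldots, \mu)$, that is, $\mathbf{M}$ is a $p \times n$ matrix wherein every column is the $p \times 1$
vector $\mu$. Let $\bar{\mathbf{X}} = (\bar{X}, \ldots, \bar{X})$, that is, $\bar{\mathbf{X}}$ is a $p \times n$ matrix wherein every column is $\bar{X}$.
Now, consider

$$E[(\mathbf{X} - \mathbf{M})(\mathbf{X} - \mathbf{M})'] = E\Big[\sum_{j=1}^{n} (X_j - \mu)(X_j - \mu)'\Big] = \Sigma + \cdots + \Sigma = n\Sigma.$$

As well,

$$\begin{aligned}
(\mathbf{X} - \mathbf{M})(\mathbf{X} - \mathbf{M})' &= (\mathbf{X} - \bar{\mathbf{X}} + \bar{\mathbf{X}} - \mathbf{M})(\mathbf{X} - \bar{\mathbf{X}} + \bar{\mathbf{X}} - \mathbf{M})' \\
&= (\mathbf{X} - \bar{\mathbf{X}})(\mathbf{X} - \bar{\mathbf{X}})' + (\mathbf{X} - \bar{\mathbf{X}})(\bar{\mathbf{X}} - \mathbf{M})' \\
&\quad + (\bar{\mathbf{X}} - \mathbf{M})(\mathbf{X} - \bar{\mathbf{X}})' + (\bar{\mathbf{X}} - \mathbf{M})(\bar{\mathbf{X}} - \mathbf{M})' \ \Rightarrow
\end{aligned}$$

$$\begin{aligned}
(\mathbf{X} - \mathbf{M})(\mathbf{X} - \mathbf{M})' &= S + \sum_{j=1}^{n} (X_j - \bar{X})(\bar{X} - \mu)' + \sum_{j=1}^{n} (\bar{X} - \mu)(X_j - \bar{X})' + \sum_{j=1}^{n} (\bar{X} - \mu)(\bar{X} - \mu)' \\
&= S + O + O + n(\bar{X} - \mu)(\bar{X} - \mu)' \ \Rightarrow
\end{aligned}$$

$$n\Sigma = E[S] + O + O + n\,\mathrm{Cov}(\bar{X}) = E[S] + n\Big(\frac{1}{n}\Sigma\Big) = E[S] + \Sigma \ \Rightarrow$$

$$E[S] = (n - 1)\Sigma \ \Rightarrow \ E[\hat{\Sigma}] = E\Big[\frac{1}{n}S\Big] = \frac{n-1}{n}\Sigma \to \Sigma \ \text{ as } n \to \infty.$$

Observe that $\sum_{j=1}^{n} (X_j - \bar{X}) = O$, this result having been utilized twice in the above
derivations. The complex case can be established in a similar manner. This completes the
proof.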
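Since the result does not require a Gaussian population, it can be illustrated by simulation with a skewed distribution. In the sketch below, which assumes NumPy, the population is a hypothetical one obtained by applying a fixed matrix to independent centered exponential variables; the Monte Carlo average of $\frac{1}{n}S$ is compared with $\frac{n-1}{n}\Sigma$.

```python
import numpy as np

rng = np.random.default_rng(6)
p, n, reps = 2, 4, 200_000
Lmat = np.array([[1.0, 0.0],
                 [0.5, 1.0]])
Sigma = Lmat @ Lmat.T                      # covariance matrix of the hypothetical population

# Non-Gaussian population: X_j = Lmat @ E_j, with E_j having iid centered exponential entries
E = rng.exponential(size=(reps, p, n)) - 1.0
X = Lmat @ E                               # shape (reps, p, n); columns are the sample vectors

xbar = X.mean(axis=2, keepdims=True)       # sample average in each replication
D = X - xbar
S = D @ D.transpose(0, 2, 1)               # sample sum of products matrix in each replication

mean_Sigma_hat = (S / n).mean(axis=0)      # Monte Carlo estimate of E[S/n]

print(mean_Sigma_hat)
print((n - 1) / n * Sigma)
```

With $n = 4$ the factor $\frac{n-1}{n}$ equals $0.75$, so the downward bias of $\frac{1}{n}S$ is clearly visible in the simulated averages.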


3.5.4. Properties of maximum likelihood estimators 


Definition 3.5.2. Unbiasedness. Let $g(\theta)$ be a function of the parameter $\theta$ which stands
for all the parameters associated with a population's distribution. Let the independently
distributed random variables $x_1, \ldots, x_n$ constitute a simple random sample of size $n$ from
a univariate population. Let $T(x_1, \ldots, x_n)$ be an observable function of the sample values
$x_1, \ldots, x_n$. Then $T$ is called a statistic (the plural form, statistics, is not to be confused
with the subject of Statistics). This definition for a statistic holds when the iid variables
are scalar, vector or matrix variables, whether in the real or complex domains. If $E[T] = g(\theta)$
for all $\theta$ in the parameter space, then $T$ is said to be unbiased for $g(\theta)$ or an unbiased
estimator of $g(\theta)$.


We will look at some properties of the MLE of the parameter or parameters represented
by $\theta$ in a given population specified by its density/probability function $f(x,\theta)$. Consider
a simple random sample of size $n$ from this population. The sample will be of the form
$x_1,\ldots,x_n$ if the population is univariate or of the form $X_1,\ldots,X_n$ if the population is
multivariate or matrix-variate. Some properties of estimators in the scalar variable case
will be illustrated first. Then the properties will be extended to the vector/matrix-variate
cases. The joint density of the sample values will be denoted by $L$. Thus, in the univariate
case,

$$
L = \prod_{j=1}^{n} f(x_j,\theta) \ \Rightarrow\ \ln L = \sum_{j=1}^{n}\ln f(x_j,\theta).
$$

Since the total probability is 1, we have the following, taking for example the variable to
be continuous and a scalar parameter $\theta$:

$$
\int_X L\,\mathrm{d}X = 1 \ \Rightarrow\ \frac{\partial}{\partial\theta}\int_X L\,\mathrm{d}X = 0, \quad X' = (x_1,\ldots,x_n).
$$
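We are going to assume that the support of $x$ is free of $\theta$ and that the differentiation can be
done inside the integral sign. Then,

$$
0 = \int_X \frac{\partial L}{\partial\theta}\,\mathrm{d}X = \int_X \frac{1}{L}\Big(\frac{\partial L}{\partial\theta}\Big) L\,\mathrm{d}X = \int_X \Big[\frac{\partial}{\partial\theta}\ln L\Big] L\,\mathrm{d}X.
$$

Noting that $\int_X (\cdot)\,L\,\mathrm{d}X = E[(\cdot)]$, we have

$$
E\Big[\frac{\partial}{\partial\theta}\ln L\Big] = 0 \ \Rightarrow\ E\Big[\sum_{j=1}^{n}\frac{\partial}{\partial\theta}\ln f(x_j,\theta)\Big] = 0. \tag{3.5.6}
$$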


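Let $\hat{\theta}$ be the MLE of $\theta$. Then

$$
\frac{\partial}{\partial\theta}L\Big|_{\theta=\hat{\theta}} = 0 \ \Rightarrow\ \frac{\partial}{\partial\theta}\ln L\Big|_{\theta=\hat{\theta}} = 0 \ \Rightarrow\ \Big[\sum_{j=1}^{n}\frac{\partial}{\partial\theta}\ln f(x_j,\theta)\Big]_{\theta=\hat{\theta}} = 0. \tag{3.5.7}
$$

If $\theta$ is scalar, then the above are single equations; otherwise they represent a system of
equations, as the derivatives are then vector or matrix derivatives. Here (3.5.7) is the
likelihood equation giving rise to the maximum likelihood estimators (MLE) of $\theta$. However,
by the weak law of large numbers (see Sect. 2.6),

$$
\frac{1}{n}\sum_{j=1}^{n}\frac{\partial}{\partial\theta}\ln f(x_j,\theta)\Big|_{\theta=\hat{\theta}} \to E\Big[\frac{\partial}{\partial\theta}\ln f(x_j,\theta_o)\Big] \ \text{ as } n\to\infty \tag{3.5.8}
$$

where $\theta_o$ is the true value of $\theta$. Noting that $E\big[\frac{\partial}{\partial\theta}\ln f(x_j,\theta_o)\big] = 0$ owing to the fact that
$\int_x f(x)\,\mathrm{d}x = 1$, we have the following results: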


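$$
\sum_{j=1}^{n}\frac{\partial}{\partial\theta}\ln f(x_j,\theta)\Big|_{\theta=\hat{\theta}} = 0, \qquad E\Big[\frac{\partial}{\partial\theta}\ln f(x_j,\theta)\Big|_{\theta=\theta_o}\Big] = 0.
$$

This means that $E[\hat{\theta}] = \theta_o$ or $E[\hat{\theta}] \to \theta_o$ as $n\to\infty$, that is, $\hat{\theta}$ is asymptotically unbiased
for the true value $\theta_o$ of $\theta$. As well, $\hat{\theta}\to\theta_o$ as $n\to\infty$ almost surely or with probability
1, that is, except on a set having probability measure zero. Thus, the MLE of $\theta$ is asymptotically
unbiased and consistent for the true value $\theta_o$, which is stated next as a theorem: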


Theorem 3.5.9. In a given population's distribution whose parameter or set of parameters
is denoted by $\theta$, the MLE of $\theta$, denoted by $\hat{\theta}$, is asymptotically unbiased and consistent
for the true value $\theta_o$.

Definition 3.5.3. Consistency of an estimator If $Pr\{\hat{\theta}\to\theta_o\}\to 1$ as $n\to\infty$, then
we say that $\hat{\theta}$ is consistent for $\theta_o$, where $\hat{\theta}$ is an estimator for $\theta$.


Example 3.5.2. Consider a real $p$-variate Gaussian population $N_p(\mu,\Sigma)$, $\Sigma > O$.
Show that the MLE of $\mu$ is unbiased and consistent for $\mu$ and that the MLE of $\Sigma$ is
asymptotically unbiased for $\Sigma$.

Solution 3.5.2. We have $\hat{\mu} = \bar{X}$ = the sample mean or sample average and $\hat{\Sigma} = \frac{1}{n}S$
where $S$ is the sample sum of products matrix. From Theorem 3.5.4, $E[\bar{X}] = \mu$ and
$\mathrm{Cov}(\bar{X}) = \frac{1}{n}\Sigma \to O$ as $n\to\infty$. Therefore, $\hat{\mu} = \bar{X}$ is unbiased and consistent for $\mu$.
From Theorem 3.5.8, $E[\hat{\Sigma}] = \frac{n-1}{n}\Sigma \to \Sigma$ as $n\to\infty$ and hence $\hat{\Sigma}$ is asymptotically
unbiased for $\Sigma$.
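The bias factor $\frac{n-1}{n}$ can be observed in a small Monte Carlo experiment. The sketch below treats only the scalar case $p = 1$ with a normal population; the sample size, the number of replications and the function names are illustrative assumptions.

```python
import random

def mle_variance(xs):
    """MLE of the variance, (1/n) * sum (x_j - xbar)^2, for a scalar sample."""
    n = len(xs)
    xbar = sum(xs) / n
    return sum((x - xbar) ** 2 for x in xs) / n

def average_mle_variance(n, sigma2, reps, seed=0):
    """Monte Carlo average of the variance MLE over `reps` normal samples of size n."""
    rng = random.Random(seed)
    sd = sigma2 ** 0.5
    total = 0.0
    for _ in range(reps):
        total += mle_variance([rng.gauss(0.0, sd) for _ in range(n)])
    return total / reps

n, sigma2 = 5, 4.0
estimate = average_mle_variance(n, sigma2, reps=20000)
expected = (n - 1) / n * sigma2   # E[S/n] = ((n-1)/n) sigma^2
print(estimate, expected)
```

For small $n$ the simulated average sits visibly below $\sigma^2$ and close to $\frac{n-1}{n}\sigma^2$; the gap shrinks as $n$ grows, which is the asymptotic unbiasedness asserted in the solution.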


Another desirable property for point estimators is referred to as sufficiency. If $T$ is a
statistic used to estimate a real scalar, vector or matrix parameter $\theta$ and if the conditional
distribution of the sample values, given this statistic $T$, is free of $\theta$, then no more information
about $\theta$ can be secured from that sample once the statistic $T$ is known. Accordingly,
all the information that can be obtained from the sample is contained in $T$ or, in this sense,
$T$ is sufficient or a sufficient estimator for $\theta$.


Definition 3.5.4. Sufficiency of estimators Let $\theta$ be a scalar, vector or matrix parameter
associated with a given population's distribution. Let $T = T(X_1,\ldots,X_n)$ be an estimator
of $\theta$, where $X_1,\ldots,X_n$ are iid as the given population. If the conditional distribution of the
sample values $X_1,\ldots,X_n$, given $T$, is free of $\theta$, then we say that this $T$ is a sufficient
estimator for $\theta$. If there are several scalar, vector or matrix parameters $\theta_1,\ldots,\theta_k$ associated
with a given population and if $T_1(X_1,\ldots,X_n),\ldots,T_r(X_1,\ldots,X_n)$ are $r$ statistics, where
$r$ may be greater than, smaller than or equal to $k$, then if the conditional distribution of $X_1,\ldots,X_n$,
given $T_1,\ldots,T_r$, is free of $\theta_1,\ldots,\theta_k$, we say that $T_1,\ldots,T_r$ are jointly sufficient for
$\theta_1,\ldots,\theta_k$. If there are several sets of statistics, where each set is sufficient for $\theta_1,\ldots,\theta_k$,
then that set of statistics which allows for the maximal reduction of the data is called the
minimal sufficient set of statistics for $\theta_1,\ldots,\theta_k$.


Example 3.5.3. Show that the MLE of $\mu$ in a $N_p(\mu,\Sigma)$, $\Sigma > O$, is sufficient for $\mu$.

Solution 3.5.3. Let $X_1,\ldots,X_n$ be a simple random sample from a $N_p(\mu,\Sigma)$. Then the
joint density of $X_1,\ldots,X_n$ can be written as

$$
L = \frac{1}{(2\pi)^{\frac{np}{2}}|\Sigma|^{\frac{n}{2}}}\, e^{-\frac{1}{2}\mathrm{tr}(\Sigma^{-1}S) - \frac{n}{2}(\bar{X}-\mu)'\Sigma^{-1}(\bar{X}-\mu)},
$$

referring to (3.5.2). Since $\bar{X}$ is a function of $X_1,\ldots,X_n$, the joint density of $X_1,\ldots,X_n$
and $\bar{X}$ is $L$ itself. Hence, the conditional density of $X_1,\ldots,X_n$, given $\bar{X}$, is $L/f_1(\bar{X})$
where $f_1(\bar{X})$ is the marginal density of $\bar{X}$. However, appealing to Corollary 3.5.1, $f_1(\bar{X})$
is $N_p(\mu,\frac{1}{n}\Sigma)$. Hence

$$
\frac{L}{f_1(\bar{X})} = \frac{1}{(2\pi)^{\frac{(n-1)p}{2}}\, n^{\frac{p}{2}}\, |\Sigma|^{\frac{n-1}{2}}}\, e^{-\frac{1}{2}\mathrm{tr}(\Sigma^{-1}S)},
$$

which is free of $\mu$ so that $\hat{\mu}$ is sufficient for $\mu$.


Note 3.5.1. We can also show that $\hat{\mu} = \bar{X}$ and $\hat{\Sigma} = \frac{1}{n}S$ are jointly sufficient for $\mu$ and
$\Sigma$ in a $N_p(\mu,\Sigma)$, $\Sigma > O$, population. This result requires the density of $S$, which will
be discussed in Chap. 5.


An additional property of interest for a point estimator is that of relative efficiency. If
$g(\theta)$ is a function of $\theta$ and if $T = T(x_1,\ldots,x_n)$ is an estimator of $g(\theta)$, then $E|T - g(\theta)|^2$
is a squared mathematical distance between $T$ and $g(\theta)$. We can consider the following
criterion: the smaller the distance, the more efficient the estimator is, as we would like this
distance to be as small as possible when we are estimating $g(\theta)$ by making use of $T$. If
$E[T] = g(\theta)$, then $T$ is unbiased for $g(\theta)$ and, in this case, $E|T - g(\theta)|^2 = \mathrm{Var}(T)$, the
variance of $T$. In the class of unbiased estimators, we seek that particular estimator which
has the smallest variance.


Definition 3.5.5. Relative efficiency of estimators If $T_1$ and $T_2$ are two estimators of
the same function $g(\theta)$ of $\theta$ and if $E[|T_1 - g(\theta)|^2] < E[|T_2 - g(\theta)|^2]$, then $T_1$ is said
to be relatively more efficient for estimating $g(\theta)$. If $T_1$ and $T_2$ are unbiased for $g(\theta)$, the
criterion becomes $\mathrm{Var}(T_1) < \mathrm{Var}(T_2)$.


Let $u$ be an unbiased estimator of $g(\theta)$, a function of the parameter $\theta$ associated with
any population, and let $T$ be a sufficient statistic for $\theta$. Let the conditional expectation
of $u$, given $T$, be denoted by $h(T)$, that is, $E[u|T] = h(T)$. We have the following two
general properties of conditional expectations; refer to Mathai and Haubold (2017), for
example. For any two real scalar random variables $x$ and $y$ having a joint density/probability
function,

$$
E[y] = E[E(y|x)] \tag{3.5.9}
$$

and

$$
\mathrm{Var}(y) = \mathrm{Var}(E[y|x]) + E[\mathrm{Var}(y|x)] \tag{3.5.10}
$$

whenever the expected values exist. From (3.5.9),


$$
g(\theta) = E[u] = E[E(u|T)] = E[h(T)] \ \Rightarrow\ E[h(T)] = g(\theta). \tag{3.5.11}
$$

Then, applying (3.5.10),

$$
\begin{aligned}
\mathrm{Var}(u) = E[u - g(\theta)]^2 &= \mathrm{Var}(E[u|T]) + E[\mathrm{Var}(u|T)] = \mathrm{Var}(h(T)) + \delta, \ \ \delta \ge 0 \\
&\Rightarrow\ \mathrm{Var}(u) \ge \mathrm{Var}(h(T)),
\end{aligned} \tag{3.5.12}
$$

which means that if we have a sufficient statistic $T$ for $\theta$, then the variance of $h(T)$, with
$h(T) = E[u|T]$ where $u$ is any unbiased estimator of $g(\theta)$, is smaller than or equal to
the variance of that unbiased estimator $u$. Accordingly, we should restrict ourselves
to the class of $h(T)$ when seeking minimum variance estimators. Observe that since $\delta$
in (3.5.12) is the expected value of the variance of a real variable, it is nonnegative. The
inequality in (3.5.12) is known in the literature as the Rao-Blackwell Theorem.
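The inequality (3.5.12) can be illustrated exactly on a small discrete model. The sketch below assumes a Bernoulli($p$) sample, takes $u = x_1$ as the unbiased estimator of $p$ and $T = x_1+\cdots+x_n$ as the sufficient statistic, and enumerates all $2^n$ samples; the model and the function name are illustrative choices, not taken from the text.

```python
from itertools import product

def rao_blackwell_bernoulli(n, p):
    """Exact moments of u = x_1 and h(T) = E[u | T], T = x_1 + ... + x_n, Bernoulli(p)."""
    # Joint probability of every one of the 2^n possible samples.
    prob = {xs: p ** sum(xs) * (1 - p) ** (n - sum(xs))
            for xs in product((0, 1), repeat=n)}
    # h(t) = E[x_1 | T = t], obtained directly from the joint distribution.
    h = {}
    for t in range(n + 1):
        pt = sum(pr for xs, pr in prob.items() if sum(xs) == t)
        h[t] = sum(xs[0] * pr for xs, pr in prob.items() if sum(xs) == t) / pt
    mean_u = sum(xs[0] * pr for xs, pr in prob.items())
    var_u = sum((xs[0] - mean_u) ** 2 * pr for xs, pr in prob.items())
    mean_h = sum(h[sum(xs)] * pr for xs, pr in prob.items())
    var_h = sum((h[sum(xs)] - mean_h) ** 2 * pr for xs, pr in prob.items())
    return h, mean_u, var_u, mean_h, var_h

h, mean_u, var_u, mean_h, var_h = rao_blackwell_bernoulli(n=6, p=0.3)
print(h)             # h(t) = t/n, which does not involve p
print(var_u, var_h)  # p(1-p) versus p(1-p)/n
```

In this model $h(T) = T/n$ is the sample mean: it keeps the expected value $p$ as in (3.5.11) while its variance drops from $p(1-p)$ to $p(1-p)/n$.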


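It follows from (3.5.6) that $E\big[\frac{\partial}{\partial\theta}\ln L\big] = \int_X \big(\frac{\partial}{\partial\theta}\ln L\big) L\,\mathrm{d}X = 0$. Differentiating once
again with respect to $\theta$, we have

$$
\begin{aligned}
0 &= \int_X \frac{\partial}{\partial\theta}\Big[\Big(\frac{\partial}{\partial\theta}\ln L\Big)L\Big]\mathrm{d}X \\
&\Rightarrow\ \int_X \Big[\Big(\frac{\partial^2}{\partial\theta^2}\ln L\Big)L + \Big(\frac{\partial}{\partial\theta}\ln L\Big)\frac{\partial L}{\partial\theta}\Big]\mathrm{d}X = 0 \\
&\Rightarrow\ \int_X \Big(\frac{\partial}{\partial\theta}\ln L\Big)^2 L\,\mathrm{d}X = -\int_X \Big(\frac{\partial^2}{\partial\theta^2}\ln L\Big) L\,\mathrm{d}X,
\end{aligned}
$$

so that

$$
\begin{aligned}
\mathrm{Var}\Big(\frac{\partial}{\partial\theta}\ln L\Big) &= E\Big[\Big(\frac{\partial}{\partial\theta}\ln L\Big)^2\Big] = -E\Big[\frac{\partial^2}{\partial\theta^2}\ln L\Big] \\
&= nE\Big[\Big(\frac{\partial}{\partial\theta}\ln f(x_j,\theta)\Big)^2\Big] = -nE\Big[\frac{\partial^2}{\partial\theta^2}\ln f(x_j,\theta)\Big].
\end{aligned} \tag{3.5.13}
$$

Let $T$ be any estimator for $\theta$, where $\theta$ is a real scalar parameter. If $T$ is unbiased for $\theta$,
then $E[T] = \theta$; otherwise, let $E[T] = \theta + b(\theta)$ where $b(\theta)$ is some function of $\theta$, which
is called the bias. Then, differentiating both sides with respect to $\theta$,

$$
\begin{aligned}
\int_X T L\,\mathrm{d}X = \theta + b(\theta) \ &\Rightarrow\ \int_X T\,\frac{\partial L}{\partial\theta}\,\mathrm{d}X = 1 + b'(\theta), \quad b'(\theta) = \frac{\mathrm{d}}{\mathrm{d}\theta}b(\theta) \\
&\Rightarrow\ E\Big[T\Big(\frac{\partial}{\partial\theta}\ln L\Big)\Big] = 1 + b'(\theta) = \mathrm{Cov}\Big(T, \frac{\partial}{\partial\theta}\ln L\Big)
\end{aligned}
$$

because $E\big[\frac{\partial}{\partial\theta}\ln L\big] = 0$. Hence,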


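$$
\Big[\mathrm{Cov}\Big(T, \frac{\partial}{\partial\theta}\ln L\Big)\Big]^2 = [1 + b'(\theta)]^2 \le \mathrm{Var}(T)\,\mathrm{Var}\Big(\frac{\partial}{\partial\theta}\ln L\Big) \ \Rightarrow
$$

$$
\begin{aligned}
\mathrm{Var}(T) &\ge \frac{[1+b'(\theta)]^2}{\mathrm{Var}\big(\frac{\partial}{\partial\theta}\ln L\big)} = \frac{[1+b'(\theta)]^2}{n\,\mathrm{Var}\big(\frac{\partial}{\partial\theta}\ln f(x_j,\theta)\big)} \\
&= \frac{[1+b'(\theta)]^2}{E\big[\big(\frac{\partial}{\partial\theta}\ln L\big)^2\big]} = \frac{[1+b'(\theta)]^2}{nE\big[\big(\frac{\partial}{\partial\theta}\ln f(x_j,\theta)\big)^2\big]},
\end{aligned} \tag{3.5.14}
$$

which is a lower bound for the variance of any estimator for $\theta$. This inequality is known
as the Cramér-Rao inequality in the literature. When $T$ is unbiased for $\theta$, then $b'(\theta) = 0$
and then

$$
\mathrm{Var}(T) \ge \frac{1}{I_n(\theta)} = \frac{1}{nI_1(\theta)} \tag{3.5.15}
$$

where

$$
\begin{aligned}
I_n(\theta) &= \mathrm{Var}\Big(\frac{\partial}{\partial\theta}\ln L\Big) = E\Big[\Big(\frac{\partial}{\partial\theta}\ln L\Big)^2\Big] = nE\Big[\Big(\frac{\partial}{\partial\theta}\ln f(x_j,\theta)\Big)^2\Big] \\
&= -E\Big[\frac{\partial^2}{\partial\theta^2}\ln L\Big] = -nE\Big[\frac{\partial^2}{\partial\theta^2}\ln f(x_j,\theta)\Big] = nI_1(\theta)
\end{aligned} \tag{3.5.16}
$$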


is known as Fisher's information about $\theta$ which can be obtained from a sample of size $n$,
$I_1(\theta)$ being Fisher's information in one observation or a sample of size 1. Observe that
Fisher's information is different from the information in Information Theory. For instance,
some aspects of Information Theory are discussed in Mathai and Rathie (1975).
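The two expressions for $I_1(\theta)$ in (3.5.16) and the bound (3.5.15) can be verified exactly in a simple case. The sketch below assumes a Bernoulli probability function $f(x,p) = p^x(1-p)^{1-x}$, $x = 0, 1$, which is a discrete stand-in chosen for illustration; the function names are likewise hypothetical.

```python
def bernoulli_information(p):
    """Return (E[score^2], -E[second derivative of ln f]) for one Bernoulli(p) observation."""
    score_sq = 0.0
    neg_hess = 0.0
    for x, prob in ((1, p), (0, 1 - p)):
        score = x / p - (1 - x) / (1 - p)                # d/dp ln f(x, p)
        hess = -x / p ** 2 - (1 - x) / (1 - p) ** 2      # d^2/dp^2 ln f(x, p)
        score_sq += prob * score ** 2
        neg_hess += prob * (-hess)
    return score_sq, neg_hess

def cramer_rao_bound(p, n):
    """Lower bound 1 / (n I_1(p)) for the variance of an unbiased estimator of p."""
    return 1.0 / (n * bernoulli_information(p)[0])

p, n = 0.3, 20
info_a, info_b = bernoulli_information(p)
bound = cramer_rao_bound(p, n)
var_mean = p * (1 - p) / n   # variance of the sample mean, the MLE of p
print(info_a, info_b, bound, var_mean)
```

Both forms give $I_1(p) = 1/[p(1-p)]$, and the variance of the sample mean coincides with $1/(nI_1(p))$, so the bound is attained in this model.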


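Asymptotic efficiency and normality of MLE's

We have already established that

$$
0 = \frac{\partial}{\partial\theta}\ln L(X,\theta)\Big|_{\theta=\hat{\theta}}, \tag{i}
$$

which is the likelihood equation giving rise to the MLE. Let us expand (i) in a neighborhood
of the true parameter value $\theta_o$:

$$
\begin{aligned}
0 &= \frac{\partial}{\partial\theta}\ln L(X,\theta)\Big|_{\theta=\theta_o} + (\hat{\theta}-\theta_o)\frac{\partial^2}{\partial\theta^2}\ln L(X,\theta)\Big|_{\theta=\theta_o} \\
&\quad + \frac{(\hat{\theta}-\theta_o)^2}{2}\frac{\partial^3}{\partial\theta^3}\ln L(X,\theta)\Big|_{\theta=\theta_1}
\end{aligned} \tag{ii}
$$

where $|\hat{\theta}-\theta_1| < |\hat{\theta}-\theta_o|$. Multiplying both sides by $\sqrt{n}$ and rearranging terms, we have
the following:

$$
\sqrt{n}(\hat{\theta}-\theta_o) = \frac{-\frac{1}{\sqrt{n}}\frac{\partial}{\partial\theta}\ln L(X,\theta)\big|_{\theta=\theta_o}}{\frac{1}{n}\frac{\partial^2}{\partial\theta^2}\ln L(X,\theta)\big|_{\theta=\theta_o} + \frac{1}{n}\frac{(\hat{\theta}-\theta_o)}{2}\frac{\partial^3}{\partial\theta^3}\ln L(X,\theta)\big|_{\theta=\theta_1}}. \tag{iii}
$$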

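The second term in the denominator of (iii) goes to zero because $\hat{\theta}\to\theta_o$ as $n\to\infty$, and
the third derivative is assumed to be bounded. Then the first term in the denominator is
such that

$$
\begin{aligned}
\frac{1}{n}\frac{\partial^2}{\partial\theta^2}\ln L(X,\theta)\Big|_{\theta=\theta_o} &= \frac{1}{n}\sum_{j=1}^{n}\frac{\partial^2}{\partial\theta^2}\ln f(x_j,\theta)\Big|_{\theta=\theta_o} \\
&\to E\Big[\frac{\partial^2}{\partial\theta^2}\ln f(x_j,\theta)\Big]_{\theta=\theta_o} = -I_1(\theta)\big|_{\theta=\theta_o} \\
&= -\mathrm{Var}\Big[\frac{\partial}{\partial\theta}\ln f(x_j,\theta)\Big]_{\theta=\theta_o},
\end{aligned}
$$

$$
I_1(\theta)\big|_{\theta=\theta_o} = \mathrm{Var}\Big[\frac{\partial}{\partial\theta}\ln f(x_j,\theta)\Big]_{\theta=\theta_o},
$$

which is the information bound $I_1(\theta_o)$. Thus,

$$
\frac{1}{n}\frac{\partial^2}{\partial\theta^2}\ln L(X,\theta)\Big|_{\theta=\theta_o} \to -I_1(\theta_o), \tag{iv}
$$

and we may write (iii) as follows:

$$
\sqrt{I_1(\theta_o)}\,\sqrt{n}\,(\hat{\theta}-\theta_o) \approx \frac{\sqrt{n}}{\sqrt{I_1(\theta_o)}}\,\frac{1}{n}\sum_{j=1}^{n}\frac{\partial}{\partial\theta}\ln f(x_j,\theta)\Big|_{\theta=\theta_o}, \tag{v}
$$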
where $\frac{\partial}{\partial\theta}\ln f(x_j,\theta)$ has zero as its expected value and $I_1(\theta_o)$ as its variance. Further,
$f(x_j,\theta)$, $j = 1,\ldots,n$, are iid variables. Hence, by the central limit theorem which is
stated in Sect. 2.6,

$$
\frac{\sqrt{n}}{\sqrt{I_1(\theta_o)}}\,\frac{1}{n}\sum_{j=1}^{n}\frac{\partial}{\partial\theta}\ln f(x_j,\theta) \to N_1(0,1) \ \text{ as } n\to\infty, \tag{3.5.17}
$$

where $N_1(0,1)$ is a univariate standard normal random variable. This may also be
re-expressed as follows since $I_1(\theta_o)$ is free of $n$:

$$
\frac{1}{\sqrt{n}}\sum_{j=1}^{n}\frac{\partial}{\partial\theta}\ln f(x_j,\theta)\Big|_{\theta=\theta_o} \to N_1(0, I_1(\theta_o))
$$

or

$$
\sqrt{I_1(\theta_o)}\,\sqrt{n}\,(\hat{\theta}-\theta_o) \to N_1(0,1) \ \text{ as } n\to\infty. \tag{3.5.18}
$$

Since $I_1(\theta_o)$ is free of $n$, this result can also be written as

$$
\sqrt{n}\,(\hat{\theta}-\theta_o) \to N_1\Big(0, \frac{1}{I_1(\theta_o)}\Big). \tag{3.5.19}
$$

Thus, the MLE $\hat{\theta}$ is asymptotically unbiased, consistent and asymptotically normal,
referring to (3.5.18) or (3.5.19).
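The limit in (3.5.18) can be examined by simulation. The sketch below assumes an exponential population with density $\frac{1}{\theta}e^{-x/\theta}$, for which the MLE is the sample mean and $I_1(\theta) = 1/\theta^2$; the true value $\theta_o = 2$, the sample size and the number of replications are arbitrary illustrative choices.

```python
import math
import random

def standardized_mle_draws(theta_o, n, reps, seed=0):
    """Simulate sqrt(n * I_1(theta_o)) * (theta_hat - theta_o) for exponential samples."""
    rng = random.Random(seed)
    info = 1.0 / theta_o ** 2   # I_1(theta_o) for the density (1/theta) exp(-x/theta)
    draws = []
    for _ in range(reps):
        # random.expovariate takes the rate 1/theta; the sample mean is the MLE.
        xbar = sum(rng.expovariate(1.0 / theta_o) for _ in range(n)) / n
        draws.append(math.sqrt(n * info) * (xbar - theta_o))
    return draws

z = standardized_mle_draws(theta_o=2.0, n=200, reps=4000)
mean_z = sum(z) / len(z)
var_z = sum((v - mean_z) ** 2 for v in z) / len(z)
coverage = sum(abs(v) < 1.96 for v in z) / len(z)
print(mean_z, var_z, coverage)
```

If the normal approximation is adequate at this sample size, the simulated mean should be near 0, the variance near 1, and roughly 95% of the standardized values should fall within $\pm 1.96$.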


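Example 3.5.4. Show that the MLE of the parameter $\theta$ in a real scalar exponential
population is unbiased, consistent and efficient, and that asymptotic normality holds as in (3.5.18).

Solution 3.5.4. As per the notations introduced in this section,

$$
f(x_j,\theta) = \frac{1}{\theta}e^{-\frac{x_j}{\theta}}, \ 0\le x_j<\infty, \ \theta>0, \qquad L = \frac{1}{\theta^n}e^{-\frac{1}{\theta}\sum_{j=1}^{n}x_j}.
$$

In the exponential population, $E[x_j] = \theta$, $\mathrm{Var}(x_j) = \theta^2$, $j = 1,\ldots,n$, the MLE of $\theta$ is
$\hat{\theta} = \bar{x}$, $\bar{x} = \frac{1}{n}(x_1+\cdots+x_n)$, and $\mathrm{Var}(\hat{\theta}) = \frac{\theta^2}{n}\to 0$ as $n\to\infty$. Thus, $E[\hat{\theta}] = \theta$ and
$\mathrm{Var}(\hat{\theta})\to 0$ as $n\to\infty$. Hence, $\hat{\theta}$ is unbiased and consistent for $\theta$. Note that

$$
\ln f(x_j,\theta) = -\ln\theta - \frac{x_j}{\theta} \ \Rightarrow\ -E\Big[\frac{\partial^2}{\partial\theta^2}\ln f(x_j,\theta)\Big] = -\frac{1}{\theta^2} + \frac{2E[x_j]}{\theta^3} = \frac{1}{\theta^2} = \frac{1}{\mathrm{Var}(x_j)},
$$

so that $1/(nI_1(\theta)) = \theta^2/n = \mathrm{Var}(\hat{\theta})$. Accordingly, the information bound is attained,
that is, $\hat{\theta}$ is minimum variance unbiased or most efficient. Letting the true value of $\theta$ be
$\theta_o$, by the central limit theorem, we have

$$
\frac{\bar{x}-\theta_o}{\sqrt{\mathrm{Var}(\bar{x})}} = \frac{\sqrt{n}(\bar{x}-\theta_o)}{\theta_o} \to N_1(0,1) \ \text{ as } n\to\infty,
$$

and hence the asymptotic normality is also verified. Is $\hat{\theta}$ sufficient for $\theta$? Let us consider
the statistic $u = x_1+\cdots+x_n$, the sample sum. If $u$ is sufficient, then $\bar{x} = \hat{\theta}$ is also
sufficient. The mgf of $u$ is given by

$$
M_u(t) = \prod_{j=1}^{n}(1-\theta t)^{-1} = (1-\theta t)^{-n}, \ 1-\theta t>0 \ \Rightarrow\ u\sim\text{gamma}(\alpha = n, \beta = \theta)
$$

whose density is $f_1(u) = \frac{u^{n-1}}{\theta^n\Gamma(n)}e^{-\frac{u}{\theta}}$, $u = x_1+\cdots+x_n$. However, the joint density
of $x_1,\ldots,x_n$ is $L = \frac{1}{\theta^n}e^{-\frac{1}{\theta}(x_1+\cdots+x_n)}$. Accordingly, the conditional density of $x_1,\ldots,x_n$
given $\hat{\theta} = \bar{x}$ is

$$
\frac{L}{f_1(u)} = \frac{\Gamma(n)}{u^{n-1}},
$$

which is free of $\theta$, and hence $\hat{\theta}$ is also sufficient.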
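The sufficiency argument for the exponential population can be checked numerically: the ratio of the joint density to the gamma density of the sample sum should not change with $\theta$. In the sketch below, the observed values in `xs` and the grid of $\theta$ values are arbitrary illustrative choices.

```python
import math

def joint_density(xs, theta):
    """Joint density L of an exponential sample with mean theta."""
    n = len(xs)
    return math.exp(-sum(xs) / theta) / theta ** n

def sum_density(u, n, theta):
    """Gamma(alpha = n, beta = theta) density of u = x_1 + ... + x_n."""
    return u ** (n - 1) * math.exp(-u / theta) / (theta ** n * math.gamma(n))

def conditional_density(xs, theta):
    """L / f_1(u): conditional density of the sample given its sum."""
    return joint_density(xs, theta) / sum_density(sum(xs), len(xs), theta)

xs = [0.4, 1.7, 0.9, 2.3]
values = [conditional_density(xs, theta) for theta in (0.5, 1.0, 3.0, 10.0)]
closed_form = math.gamma(len(xs)) / sum(xs) ** (len(xs) - 1)   # Gamma(n) / u^(n-1)
print(values, closed_form)
```

Every entry of `values` agrees with $\Gamma(n)/u^{n-1}$ up to rounding, whatever value of $\theta$ is used.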


3.5.5. Some limiting properties in the p-variate case 


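The $p$-variate extension of the central limit theorem is now considered. Let the
$p\times 1$ real vectors $X_1,\ldots,X_n$ be iid with common mean value vector $\mu$ and common
covariance matrix $\Sigma > O$, that is, $E(X_j) = \mu$ and $\mathrm{Cov}(X_j) = \Sigma > O$, $j = 1,\ldots,n$.
Assume that $\|\Sigma\| < \infty$ where $\|(\cdot)\|$ denotes a norm of $(\cdot)$. Letting $Y_j = \Sigma^{-\frac{1}{2}}X_j$, we have
$E[Y_j] = \Sigma^{-\frac{1}{2}}\mu$ and $\mathrm{Cov}(Y_j) = I$, $j = 1,\ldots,n$, and letting $\bar{X} = \frac{1}{n}(X_1+\cdots+X_n)$ and
$\bar{Y} = \frac{1}{n}(Y_1+\cdots+Y_n)$, we have $E(\bar{X}) = \mu$ and $E(\bar{Y}) = \Sigma^{-\frac{1}{2}}\mu$. If we let

$$
U = \sqrt{n}\,\Sigma^{-\frac{1}{2}}(\bar{X}-\mu), \tag{3.5.20}
$$

the following result holds: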


Theorem 3.5.10. Let the $p\times 1$ vector $U$ be as defined in (3.5.20). Then, as $n\to\infty$,
$U \to N_p(O, I)$.

Proof: Let $L' = (a_1,\ldots,a_p)$ be an arbitrary constant vector such that $L'L = 1$. Then,
$L'X_j$, $j = 1,\ldots,n$, are iid with common mean $L'\mu$ and common variance $\mathrm{Var}(L'X_j) = L'\Sigma L$.
Let $Y_j = \Sigma^{-\frac{1}{2}}X_j$ and $u_j = L'Y_j = L'\Sigma^{-\frac{1}{2}}X_j$. Then, the common mean of the
$u_j$'s is $L'\Sigma^{-\frac{1}{2}}\mu$ and their common variance is $\mathrm{Var}(u_j) = L'\Sigma^{-\frac{1}{2}}\Sigma\Sigma^{-\frac{1}{2}}L = L'L = 1$,
$j = 1,\ldots,n$. Note that $\bar{u} = \frac{1}{n}(u_1+\cdots+u_n) = L'\bar{Y} = L'\Sigma^{-\frac{1}{2}}\bar{X}$ and that $\mathrm{Var}(\bar{u}) = \frac{1}{n}L'L = \frac{1}{n}$.
Then, in light of the univariate central limit theorem as stated in Sect. 2.6, we
have $\sqrt{n}\,L'\Sigma^{-\frac{1}{2}}(\bar{X}-\mu) \to N_1(0,1)$ as $n\to\infty$. If, for some $p$-variate vector $W$, $L'W$
is univariate normal for arbitrary $L$, it follows from a characterization of the multivariate
normal distribution that $W$ is a $p$-variate normal vector. Thus,

$$
U = \sqrt{n}\,\Sigma^{-\frac{1}{2}}(\bar{X}-\mu) \to N_p(O, I) \ \text{ as } n\to\infty, \tag{3.5.21}
$$

which completes the proof.
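Theorem 3.5.10 can be illustrated by simulation with a non-Gaussian population. The sketch below assumes $p = 2$ with independent components, $X_j = (\text{exponential with mean }1,\ \text{uniform on }(0,1))'$, so that $\mu = (1, \tfrac{1}{2})'$ and $\Sigma = \mathrm{diag}(1, \tfrac{1}{12})$ and $\Sigma^{-\frac{1}{2}}$ is available in closed form; this population, the sample size and the function names are illustrative assumptions.

```python
import math
import random

def draw_u(n, rng):
    """One draw of U = sqrt(n) Sigma^{-1/2} (Xbar - mu) for X_j = (Exp(1), Uniform(0,1))."""
    s1 = s2 = 0.0
    for _ in range(n):
        s1 += rng.expovariate(1.0)   # mean 1, variance 1
        s2 += rng.random()           # mean 1/2, variance 1/12
    return (math.sqrt(n) * (s1 / n - 1.0),
            math.sqrt(n) * (s2 / n - 0.5) * math.sqrt(12.0))

def moments(draws):
    """Sample mean vector and covariance matrix of a list of 2-vectors."""
    m = len(draws)
    mean = [sum(d[i] for d in draws) / m for i in range(2)]
    cov = [[sum((d[i] - mean[i]) * (d[j] - mean[j]) for d in draws) / m
            for j in range(2)] for i in range(2)]
    return mean, cov

rng = random.Random(1)
draws = [draw_u(100, rng) for _ in range(4000)]
mean_u, cov_u = moments(draws)
print(mean_u)
print(cov_u)
```

The simulated mean vector should be close to the null vector and the covariance matrix close to the identity, in agreement with the limiting $N_p(O, I)$ distribution.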
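A parallel result also holds in the complex domain. Let $\tilde{X}_j$, $j = 1,\ldots,n$, be iid from
some complex population with mean $\tilde{\mu}$ and Hermitian positive definite covariance matrix
$\tilde{\Sigma} = \tilde{\Sigma}^{*} > O$ where $\|\tilde{\Sigma}\| < \infty$. Letting $\bar{\tilde{X}} = \frac{1}{n}(\tilde{X}_1+\cdots+\tilde{X}_n)$, we have

$$
\sqrt{n}\,\tilde{\Sigma}^{-\frac{1}{2}}(\bar{\tilde{X}}-\tilde{\mu}) \to \tilde{N}_p(O, I) \ \text{ as } n\to\infty. \tag{3.5a.5}
$$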


Exercises 3.5 


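3.5.1. By making use of the mgf or otherwise, show that the sample mean $\bar{X} = \frac{1}{n}(X_1+\cdots+X_n)$
in the real $p$-variate Gaussian case, $X_j\sim N_p(\mu,\Sigma)$, $\Sigma > O$, is again Gaussian
distributed with the parameters $\mu$ and $\frac{1}{n}\Sigma$.

3.5.2. Let $X_j\sim N_p(\mu,\Sigma)$, $\Sigma > O$, $j = 1,\ldots,n$ and iid. Let $\mathbf{X} = (X_1,\ldots,X_n)$
be the $p\times n$ sample matrix. Derive the density of (1) $\mathrm{tr}(\Sigma^{-1}(\mathbf{X}-\mathbf{M})(\mathbf{X}-\mathbf{M})')$ where
$\mathbf{M} = (\mu,\ldots,\mu)$ is the $p\times n$ matrix all of whose columns are $\mu$; (2) $\mathrm{tr}(\mathbf{X}\mathbf{X}')$. Derive the
densities in both cases, including the noncentrality parameter.

3.5.3. Let the $p\times 1$ real vector $X_j\sim N_p(\mu,\Sigma)$, $\Sigma > O$ for $j = 1,\ldots,n$ and iid. Let
$\mathbf{X} = (X_1,\ldots,X_n)$ be the $p\times n$ sample matrix. Derive the density of $\mathrm{tr}((\mathbf{X}-\bar{\mathbf{X}})(\mathbf{X}-\bar{\mathbf{X}})')$
where $\bar{\mathbf{X}} = (\bar{X},\ldots,\bar{X})$ is the $p\times n$ matrix where every column is $\bar{X}$.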


3.5.4. Repeat Exercise 3.5.1 for the p-variate complex Gaussian case. 


3.5.5. Repeat Exercise 3.5.2 for the complex Gaussian case and write down the density 
explicitly. 

3.5.6. Consider a real bivariate normal density with the parameters $\mu_1, \mu_2, \sigma_1^2, \sigma_2^2, \rho$.
Write down the density explicitly. Consider a simple random sample of size $n$, $X_1,\ldots,X_n$,
from this population where $X_j$ is $2\times 1$, $j = 1,\ldots,n$. Then evaluate the MLE of these
five parameters (1) by direct evaluation, (2) by using the general formula.


3.5.7. In Exercise 3.5.6 evaluate the maximum likelihood estimates of the five parameters 
if the following is an observed sample from this bivariate normal population: 


ү Ped Es) UNE 


3.5.8. Repeat Exercise 3.5.6 if the population is a bivariate normal in the complex domain. 


3.5.9. Repeat Exercise 3.5.7 if the following is an observed sample from the complex 
bivariate normal population referred to in Exercise 3.5.8: 


Lb bae PB] 
3.5.10. Let the $p\times 1$ real vector $X_1$ be Gaussian distributed, $X_1\sim N_p(O, I)$. Consider
the quadratic forms $u_1 = X_1'A_1X_1$, $u_2 = X_1'A_2X_1$. Let $A_j = A_j^2$, $j = 1, 2$ and $A_1 + A_2 = I$.
What can you say about the chisquaredness and independence of $u_1$ and $u_2$? Prove your
assertions.

3.5.11. Let $X_1\sim N_p(O, I)$. Let $u_j = X_1'A_jX_1$, $A_j = A_j^2$, $j = 1,\ldots,k$, $A_1+\cdots+A_k = I$.
What can you say about the chisquaredness and independence of the $u_j$'s? Prove
your assertions.

3.5.12. Repeat Exercise 3.5.11 for the complex case.

3.5.13. Let $X_j\sim N_p(\mu,\Sigma)$, $\Sigma > O$, $j = 1,\ldots,n$ and iid. Let $\bar{X} = \frac{1}{n}(X_1+\cdots+X_n)$.
Show that the exponent in the density of $\bar{X}$, excluding $-\frac{1}{2}$, namely, $n(\bar{X}-\mu)'\Sigma^{-1}(\bar{X}-\mu)$,
is distributed as $\chi_p^2$. Derive the density of $\mathrm{tr}(\bar{X}'\Sigma^{-1}\bar{X})$.

3.5.14. Let $Q = n(\bar{X}-\mu)'\Sigma^{-1}(\bar{X}-\mu)$ as in Exercise 3.5.13. For a given $\alpha$ consider the
probability statement $Pr\{Q\ge b\} = \alpha$. Show that $b = \chi_{p,\alpha}^2$ where $Pr\{\chi_p^2\ge\chi_{p,\alpha}^2\} = \alpha$.

3.5.15. Let $Q_1 = n(\bar{X}-\mu_o)'\Sigma^{-1}(\bar{X}-\mu_o)$ where $\bar{X}$, $\Sigma$ and $\mu$ are all as defined in
Exercise 3.5.14. If $\mu_o\ne\mu$, show that $Q_1\sim\chi_p^2(\lambda)$ where the noncentrality parameter
$\lambda = \frac{n}{2}(\mu-\mu_o)'\Sigma^{-1}(\mu-\mu_o)$.
3.6. Elliptically Contoured Distribution, Real Case 

Let $X$ be a real $p\times 1$ vector of distinct real scalar variables with $x_1,\ldots,x_p$ as its
components. For some $p\times 1$ parameter vector $B$ and $p\times p$ positive definite constant
matrix $A > O$, consider the positive definite quadratic form $(X-B)'A(X-B)$. We have
encountered such a quadratic form in the exponent of a real $p$-variate Gaussian density,
in which case $B = \mu$ is the mean value vector and $A = \Sigma^{-1}$, $\Sigma$ being the positive
definite covariance matrix. Let $g(\cdot)\ge 0$ be a non-negative function such that
$|A|^{\frac{1}{2}}g((X-B)'A(X-B))\ge 0$ and

$$
\int_X |A|^{\frac{1}{2}}g((X-B)'A(X-B))\,\mathrm{d}X = 1, \tag{3.6.1}
$$

so that $|A|^{\frac{1}{2}}g((X-B)'A(X-B))$ is a statistical density. Such a density is referred to as
an elliptically contoured density.


3.6.1. Some properties of elliptically contoured distributions 


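Let $Y = A^{\frac{1}{2}}(X-B)$. Then, from Theorem 1.6.1, $\mathrm{d}X = |A|^{-\frac{1}{2}}\mathrm{d}Y$ and from (3.6.1),

$$
\int_Y g(Y'Y)\,\mathrm{d}Y = 1 \tag{3.6.2}
$$

where $Y'Y = y_1^2+y_2^2+\cdots+y_p^2$, $Y' = (y_1,\ldots,y_p)$.

We can further simplify (3.6.2) via a general polar coordinate transformation:

$$
\begin{aligned}
y_1 &= r\sin\theta_1 \\
y_2 &= r\cos\theta_1\sin\theta_2 \\
y_3 &= r\cos\theta_1\cos\theta_2\sin\theta_3 \\
&\ \ \vdots \\
y_{p-2} &= r\cos\theta_1\cdots\cos\theta_{p-3}\sin\theta_{p-2} \\
y_{p-1} &= r\cos\theta_1\cdots\cos\theta_{p-2}\sin\theta_{p-1} \\
y_p &= r\cos\theta_1\cdots\cos\theta_{p-1}
\end{aligned} \tag{3.6.3}
$$

for $-\frac{\pi}{2}<\theta_j\le\frac{\pi}{2}$, $j = 1,\ldots,p-2$, $-\pi<\theta_{p-1}\le\pi$, $0\le r<\infty$. It then follows that

$$
\mathrm{d}y_1\wedge\ldots\wedge\mathrm{d}y_p = r^{p-1}(\cos\theta_1)^{p-2}\cdots(\cos\theta_{p-2})\,\mathrm{d}r\wedge\mathrm{d}\theta_1\wedge\ldots\wedge\mathrm{d}\theta_{p-1}. \tag{3.6.4}
$$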
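Given (3.6.3) and (3.6.4), observe that $r, \theta_1,\ldots,\theta_{p-1}$ are mutually independently
distributed. Separating the factors containing $\theta_j$ from (3.6.4) and then normalizing, we
need the integral

$$
\int_{-\frac{\pi}{2}}^{\frac{\pi}{2}}(\cos\theta_j)^{p-j-1}\,\mathrm{d}\theta_j = 2\int_{0}^{\frac{\pi}{2}}(\cos\theta_j)^{p-j-1}\,\mathrm{d}\theta_j. \tag{i}
$$

Let $u = \sin\theta_j \Rightarrow \mathrm{d}u = \cos\theta_j\,\mathrm{d}\theta_j$. Then (i) becomes $2\int_0^1(1-u^2)^{\frac{p-j}{2}-1}\mathrm{d}u$, and letting
$v = u^2$ gives (i) as

$$
\int_0^1 v^{\frac{1}{2}-1}(1-v)^{\frac{p-j}{2}-1}\,\mathrm{d}v = \frac{\Gamma(\frac{1}{2})\Gamma(\frac{p-j}{2})}{\Gamma(\frac{p-j+1}{2})}. \tag{ii}
$$

Thus, the density of $\theta_j$, denoted by $f_j(\theta_j)$, is

$$
f_j(\theta_j) = \frac{\Gamma(\frac{p-j+1}{2})}{\Gamma(\frac{1}{2})\Gamma(\frac{p-j}{2})}(\cos\theta_j)^{p-j-1}, \ -\frac{\pi}{2}<\theta_j\le\frac{\pi}{2}, \tag{3.6.5}
$$

and zero, elsewhere, for $j = 1,\ldots,p-2$, and

$$
f_{p-1}(\theta_{p-1}) = \frac{1}{2\pi}, \ -\pi<\theta_{p-1}\le\pi, \tag{iii}
$$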
and zero elsewhere. Taking the product of the $p-2$ terms in (ii) and (iii), the total integral
over the $\theta_j$'s is available as

$$
\Big[\prod_{j=1}^{p-2}\int_{-\frac{\pi}{2}}^{\frac{\pi}{2}}(\cos\theta_j)^{p-j-1}\,\mathrm{d}\theta_j\Big]\int_{-\pi}^{\pi}\mathrm{d}\theta_{p-1} = \frac{2\pi^{\frac{p}{2}}}{\Gamma(\frac{p}{2})}. \tag{3.6.6}
$$

The expression in (3.6.6), excluding 2, can also be obtained by making the transformation
$s = Y'Y$ and then writing $\mathrm{d}Y$ in terms of $\mathrm{d}s$ by appealing to Theorem 4.2.3.
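The constant in (3.6.6) can be confirmed by numerical integration of the angular factors. The following sketch uses a plain midpoint rule; the number of subintervals and the function names are illustrative choices.

```python
import math

def cos_power_integral(k, steps=5000):
    """Midpoint rule for the integral of cos(t)^k over (-pi/2, pi/2)."""
    h = math.pi / steps
    return h * sum(math.cos(-math.pi / 2 + (i + 0.5) * h) ** k for i in range(steps))

def total_angle_integral(p):
    """Left side of (3.6.6): product over j = 1, ..., p-2, times 2*pi from theta_{p-1}."""
    prod = 1.0
    for j in range(1, p - 1):
        prod *= cos_power_integral(p - j - 1)
    return prod * 2.0 * math.pi

def sphere_constant(p):
    """Right side of (3.6.6): 2 pi^(p/2) / Gamma(p/2)."""
    return 2.0 * math.pi ** (p / 2) / math.gamma(p / 2)

for p in (2, 3, 4, 6):
    print(p, total_angle_integral(p), sphere_constant(p))
```

For $p = 3$ both sides equal $4\pi$, the surface area of the unit sphere, which is a convenient sanity check on the ranges of the angles in (3.6.3).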


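3.6.2. The density of $u = r^2$

From (3.6.2) and (3.6.3),

$$
\frac{2\pi^{\frac{p}{2}}}{\Gamma(\frac{p}{2})}\int_0^{\infty} r^{p-1}g(r^2)\,\mathrm{d}r = 1, \tag{3.6.7}
$$

that is,

$$
2\int_{r=0}^{\infty} r^{p-1}g(r^2)\,\mathrm{d}r = \frac{\Gamma(\frac{p}{2})}{\pi^{\frac{p}{2}}}. \tag{iv}
$$

Letting $u = r^2$, we have

$$
\int_0^{\infty} u^{\frac{p}{2}-1}g(u)\,\mathrm{d}u = \frac{\Gamma(\frac{p}{2})}{\pi^{\frac{p}{2}}}, \tag{v}
$$

and the density of $r$, denoted by $f_r(r)$, is available from (3.6.7) as

$$
f_r(r) = \frac{2\pi^{\frac{p}{2}}}{\Gamma(\frac{p}{2})}\,r^{p-1}g(r^2), \ 0\le r<\infty, \tag{3.6.8}
$$

and zero, elsewhere. The density of $u = r^2$ is then

$$
f_u(u) = \frac{\pi^{\frac{p}{2}}}{\Gamma(\frac{p}{2})}\,u^{\frac{p}{2}-1}g(u), \ 0\le u<\infty, \tag{3.6.9}
$$

and zero, elsewhere. Considering the density of $Y$ given in (3.6.2), we may observe that
$y_1,\ldots,y_p$ are identically distributed.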


Theorem 3.6.1. If $y_j = ru_j$, $j = 1,\ldots,p$, in the transformation in (3.6.3), then
$E[u_j^2] = \frac{1}{p}$, $j = 1,\ldots,p$, and $u_1,\ldots,u_p$ are uniformly distributed.

Proof: From (3.6.3), $y_j = ru_j$, $j = 1,\ldots,p$. We may observe that $u_1^2+\cdots+u_p^2 = 1$
and that $u_1,\ldots,u_p$ are identically distributed. Hence $E[u_1^2]+E[u_2^2]+\cdots+E[u_p^2] = 1 \Rightarrow E[u_j^2] = \frac{1}{p}$.
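The value $E[u_j^2] = \frac{1}{p}$ can be checked by simulation. The sketch below generates the direction vector by normalizing a standard Gaussian vector, which is one spherically symmetric choice of $Y$ made here for convenience; the dimension $p = 4$ and the function name are illustrative.

```python
import math
import random

def mean_squared_direction(p, reps, seed=0):
    """Monte Carlo estimate of E[u_j^2], j = 1..p, where u = Y / ||Y|| and Y ~ N_p(O, I)."""
    rng = random.Random(seed)
    totals = [0.0] * p
    for _ in range(reps):
        y = [rng.gauss(0.0, 1.0) for _ in range(p)]
        r = math.sqrt(sum(v * v for v in y))   # radial part r
        for j in range(p):
            totals[j] += (y[j] / r) ** 2       # u_j^2 with u_j = y_j / r
    return [t / reps for t in totals]

means = mean_squared_direction(p=4, reps=20000)
print(means)   # each entry should be close to 1/4
```

The estimates sum to one by construction, since $u_1^2+\cdots+u_p^2 = 1$ in every replication, and each is close to $\frac{1}{p}$ because the $u_j$'s are identically distributed.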

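Theorem 3.6.2. Consider the $y_j$'s in Eq. (3.6.2). If $g(u)$ is free of $p$ and if $E[u] < \infty$,
then $E[y_j^2] = \frac{1}{2\pi}$; otherwise, $E[y_j^2] = \frac{1}{p}E[u]$ provided $E[u]$ exists.

Proof: Since $r$ and $u_j$ are independently distributed and since $E[u_j^2] = \frac{1}{p}$ in light of
Theorem 3.6.1, $E[y_j^2] = E[r^2]E[u_j^2] = \frac{1}{p}E[r^2] = \frac{1}{p}E[u]$. From (3.6.9),

$$
\int_0^{\infty} u^{\frac{p}{2}-1}g(u)\,\mathrm{d}u = \frac{\Gamma(\frac{p}{2})}{\pi^{\frac{p}{2}}}. \tag{vi}
$$

However,

$$
E[u] = \frac{\pi^{\frac{p}{2}}}{\Gamma(\frac{p}{2})}\int_{u=0}^{\infty} u^{\frac{p}{2}+1-1}g(u)\,\mathrm{d}u. \tag{vii}
$$

Thus, assuming that $g(u)$ is free of $p$, that $\frac{p}{2}$ can be taken as a parameter and that (vii) is
convergent,

$$
E[u] = \frac{\pi^{\frac{p}{2}}}{\Gamma(\frac{p}{2})}\,\frac{\Gamma(\frac{p}{2}+1)}{\pi^{\frac{p}{2}+1}} = \frac{p}{2\pi} \ \Rightarrow\ E[y_j^2] = \frac{1}{p}E[u] = \frac{1}{p}\,\frac{p}{2\pi} = \frac{1}{2\pi}; \tag{3.6.10}
$$

otherwise, $E[y_j^2] = \frac{1}{p}E[u]$ as long as $E[u] < \infty$.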
3.6.3. Mean value vector and covariance matrix 
From (3.6.1),

$$
E[X] = |A|^{\frac{1}{2}}\int_X X\,g((X-B)'A(X-B))\,\mathrm{d}X.
$$

Noting that

$$
E[X] = E[X-B+B] = B + E[X-B] = B + |A|^{\frac{1}{2}}\int_X (X-B)\,g((X-B)'A(X-B))\,\mathrm{d}X
$$

and letting $Y = A^{\frac{1}{2}}(X-B)$, we have

$$
E[X] = B + A^{-\frac{1}{2}}\int_Y Y g(Y'Y)\,\mathrm{d}Y, \ -\infty<y_j<\infty, \ j = 1,\ldots,p.
$$

But $Y'Y$ is even whereas each element in $Y$ is linear and odd. Hence, if the integral exists,
$\int_Y Y g(Y'Y)\,\mathrm{d}Y = O$ and so, $E[X] = B = \mu$. Let $V = \mathrm{Cov}(X)$, the covariance matrix
associated with $X$. Then

$$
V = E[(X-\mu)(X-\mu)'] = A^{-\frac{1}{2}}\Big[\int_Y (YY')\,g(Y'Y)\,\mathrm{d}Y\Big]A^{-\frac{1}{2}}, \tag{3.6.11}
$$

where $Y = A^{\frac{1}{2}}(X-\mu)$ and

$$
YY' = \begin{bmatrix} y_1^2 & y_1y_2 & \cdots & y_1y_p \\ y_2y_1 & y_2^2 & \cdots & y_2y_p \\ \vdots & \vdots & \ddots & \vdots \\ y_py_1 & y_py_2 & \cdots & y_p^2 \end{bmatrix}.
$$

Since the non-diagonal elements, $y_iy_j$, $i\ne j$, are odd and $g(Y'Y)$ is even, the integrals
over the non-diagonal elements are equal to zero whenever the second moments exist.
Since $E[Y] = O$, $\mathrm{Cov}(Y) = E(YY')$. It has already been determined in (3.6.10) that $E[y_j^2] = \frac{1}{2\pi}$
for $j = 1,\ldots,p$, whenever $g(u)$ is free of $p$ and $E[u]$ exists, the density of $u$ being as
specified in (3.6.9). If $g(u)$ is not free of $p$, the diagonal elements will each integrate out
to $\frac{1}{p}E[r^2]$. Accordingly,

$$
\mathrm{Cov}(X) = V = \frac{1}{2\pi}A^{-1} \ \text{ or } \ V = \frac{1}{p}E[r^2]A^{-1}. \tag{3.6.12}
$$
2л р 


Theorem 3.6.3. When $X$ has the $p$-variate elliptically contoured distribution defined
in (3.6.1), the mean value vector of $X$ is $E[X] = B$ and the covariance of $X$, denoted by $\Sigma$,
is such that $\Sigma = \frac{1}{p}E[r^2]A^{-1}$ where $A$ is the parameter matrix in (3.6.1), $u = r^2$ and $r$ is
defined in the transformation (3.6.3).
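As a check of Theorem 3.6.3 in a familiar case, take the Gaussian density, for which $A = \Sigma^{-1}$ and $g(u) = (2\pi)^{-\frac{p}{2}}e^{-\frac{u}{2}}$. The theorem then requires $\frac{1}{p}E[r^2] = 1$. The sketch below integrates the density (3.6.9) numerically with a midpoint rule; the choice of $g$, the truncation point and the function names are assumptions made for illustration.

```python
import math

def gaussian_g(u, p):
    """g(u) for the p-variate Gaussian: (2 pi)^(-p/2) exp(-u/2)."""
    return (2.0 * math.pi) ** (-p / 2) * math.exp(-u / 2.0)

def density_u(u, p, g):
    """f_u(u) from (3.6.9): pi^(p/2) / Gamma(p/2) * u^(p/2 - 1) * g(u)."""
    return math.pi ** (p / 2) / math.gamma(p / 2) * u ** (p / 2 - 1) * g(u, p)

def integrate(func, upper, steps=100000):
    """Midpoint rule on (0, upper); the tail beyond `upper` is neglected."""
    h = upper / steps
    return h * sum(func((i + 0.5) * h) for i in range(steps))

def radial_moments(p, upper=80.0):
    """Return (total mass of f_u, E[u]) for the Gaussian choice of g."""
    total = integrate(lambda u: density_u(u, p, gaussian_g), upper)
    mean_u = integrate(lambda u: u * density_u(u, p, gaussian_g), upper)
    return total, mean_u

total, mean_u = radial_moments(p=3)
print(total, mean_u, mean_u / 3)   # the last value is the factor multiplying A^{-1}
```

Here $f_u$ integrates to one and $E[u] = E[r^2] = p$, so that $\frac{1}{p}E[r^2]A^{-1} = A^{-1} = \Sigma$, as expected for the Gaussian case.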


3.6.4. Marginal and conditional distributions 
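Consider the density

$$
f(X) = |A|^{\frac{1}{2}}g((X-\mu)'A(X-\mu)), \ A > O, \ -\infty<x_j<\infty, \ -\infty<\mu_j<\infty \tag{3.6.13}
$$

where $X' = (x_1,\ldots,x_p)$, $\mu' = (\mu_1,\ldots,\mu_p)$, $A = (a_{ij}) > O$. Consider the following
partitioning of $X$, $\mu$ and $A$:

$$
X = \begin{bmatrix} X_1 \\ X_2 \end{bmatrix}, \quad \mu = \begin{bmatrix} \mu_{(1)} \\ \mu_{(2)} \end{bmatrix}, \quad A = \begin{bmatrix} A_{11} & A_{12} \\ A_{21} & A_{22} \end{bmatrix}
$$

where $X_1$, $\mu_{(1)}$ are $p_1\times 1$, $X_2$, $\mu_{(2)}$ are $p_2\times 1$, $A_{11}$ is $p_1\times p_1$ and $A_{22}$ is $p_2\times p_2$,
$p_1+p_2 = p$. Then, as was established in Sect. 3.3,

$$
\begin{aligned}
(X-\mu)'A(X-\mu) &= (X_1-\mu_{(1)})'A_{11}(X_1-\mu_{(1)}) + 2(X_2-\mu_{(2)})'A_{21}(X_1-\mu_{(1)}) \\
&\quad + (X_2-\mu_{(2)})'A_{22}(X_2-\mu_{(2)}) \\
&= (X_1-\mu_{(1)})'[A_{11}-A_{12}A_{22}^{-1}A_{21}](X_1-\mu_{(1)}) \\
&\quad + (X_2-\mu_{(2)}+C)'A_{22}(X_2-\mu_{(2)}+C), \quad C = A_{22}^{-1}A_{21}(X_1-\mu_{(1)}).
\end{aligned}
$$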


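In order to obtain the marginal density of $X_1$, we integrate out $X_2$ from $f(X)$. Let the
marginal densities of $X_1$ and $X_2$ be respectively denoted by $g_1(X_1)$ and $g_2(X_2)$. Then

$$
\begin{aligned}
g_1(X_1) = |A|^{\frac{1}{2}}\int_{X_2} g\big(&(X_1-\mu_{(1)})'[A_{11}-A_{12}A_{22}^{-1}A_{21}](X_1-\mu_{(1)}) \\
&+ (X_2-\mu_{(2)}+C)'A_{22}(X_2-\mu_{(2)}+C)\big)\,\mathrm{d}X_2.
\end{aligned}
$$

Letting $A_{22}^{\frac{1}{2}}(X_2-\mu_{(2)}+C) = Y_2$, $\mathrm{d}Y_2 = |A_{22}|^{\frac{1}{2}}\mathrm{d}X_2$ and

$$
g_1(X_1) = |A|^{\frac{1}{2}}|A_{22}|^{-\frac{1}{2}}\int_{Y_2} g\big((X_1-\mu_{(1)})'[A_{11}-A_{12}A_{22}^{-1}A_{21}](X_1-\mu_{(1)}) + Y_2'Y_2\big)\,\mathrm{d}Y_2.
$$

Note that $|A| = |A_{22}|\,|A_{11}-A_{12}A_{22}^{-1}A_{21}|$ from the results on partitioned matrices
presented in Sect. 1.3 and thus, $|A|^{\frac{1}{2}}|A_{22}|^{-\frac{1}{2}} = |A_{11}-A_{12}A_{22}^{-1}A_{21}|^{\frac{1}{2}}$. We have seen that $\Sigma^{-1}$
is a constant multiple of $A$ where $\Sigma$ is the covariance matrix of the $p\times 1$ vector $X$. Then,
denoting the blocks of $\Sigma^{-1}$ by $\Sigma^{ij}$,

$$
(\Sigma_{11})^{-1} = \Sigma^{11}-\Sigma^{12}[\Sigma^{22}]^{-1}\Sigma^{21}
$$

which is a constant multiple of $A_{11}-A_{12}A_{22}^{-1}A_{21}$. If $Y_2'Y_2 = s_2$, then from Theorem 4.2.3,

$$
g_1(X_1) = |A_{11}-A_{12}A_{22}^{-1}A_{21}|^{\frac{1}{2}}\,\frac{\pi^{\frac{p_2}{2}}}{\Gamma(\frac{p_2}{2})}\int_{s_2>0} s_2^{\frac{p_2}{2}-1}g(s_2+u_1)\,\mathrm{d}s_2 \tag{3.6.14}
$$

where $u_1 = (X_1-\mu_{(1)})'[A_{11}-A_{12}A_{22}^{-1}A_{21}](X_1-\mu_{(1)})$. Note that (3.6.14) is elliptically
contoured or $X_1$ has an elliptically contoured distribution. Similarly, $X_2$ has an elliptically
contoured distribution. Letting $Y_{11} = (A_{11}-A_{12}A_{22}^{-1}A_{21})^{\frac{1}{2}}(X_1-\mu_{(1)})$, then $Y_{11}$ has a
spherically symmetric distribution. Denoting the density of $Y_{11}$ by $g_{11}(Y_{11})$, we have

$$
g_{11}(Y_{11}) = \frac{\pi^{\frac{p_2}{2}}}{\Gamma(\frac{p_2}{2})}\int_{s_2>0} s_2^{\frac{p_2}{2}-1}g(s_2+Y_{11}'Y_{11})\,\mathrm{d}s_2. \tag{3.6.15}
$$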
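By a similar argument, the marginal density of $X_2$, namely $g_2(X_2)$, and the density of $Y_{22}$,
namely $g_{22}(Y_{22})$, are as follows:

$$
\begin{aligned}
g_2(X_2) &= |A_{22}-A_{21}A_{11}^{-1}A_{12}|^{\frac{1}{2}}\,\frac{\pi^{\frac{p_1}{2}}}{\Gamma(\frac{p_1}{2})} \\
&\quad\times\int_{s_1>0} s_1^{\frac{p_1}{2}-1}g\big(s_1+(X_2-\mu_{(2)})'[A_{22}-A_{21}A_{11}^{-1}A_{12}](X_2-\mu_{(2)})\big)\,\mathrm{d}s_1, \\
g_{22}(Y_{22}) &= \frac{\pi^{\frac{p_1}{2}}}{\Gamma(\frac{p_1}{2})}\int_{s_1>0} s_1^{\frac{p_1}{2}-1}g(s_1+Y_{22}'Y_{22})\,\mathrm{d}s_1.
\end{aligned} \tag{3.6.16}
$$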


3.6.5. The characteristic function of an elliptically contoured distribution 


Let $T$ be a $p\times 1$ parameter vector, $T' = (t_1,\ldots,t_p)$, so that $T'X = t_1x_1+\cdots+t_px_p$.
Then, the characteristic function of $X$, denoted by $\phi_X(T)$, is $E[e^{iT'X}]$ where $E$ denotes
the expected value and $i = \sqrt{(-1)}$, that is,

$$
\phi_X(T) = E[e^{iT'X}] = \int_X e^{iT'X}|A|^{\frac{1}{2}}g((X-\mu)'A(X-\mu))\,\mathrm{d}X. \tag{3.6.17}
$$

Writing $X$ as $X-\mu+\mu$ and then making the transformation $Y = A^{\frac{1}{2}}(X-\mu)$, we have

$$
\phi_X(T) = e^{iT'\mu}\int_Y e^{iT'A^{-\frac{1}{2}}Y}g(Y'Y)\,\mathrm{d}Y. \tag{3.6.18}
$$

However, $g(Y'Y)$ is invariant under orthonormal transformations of the type $Z = PY$,
$PP' = I$, $P'P = I$, as $Z'Z = Y'Y$ so that $g(Y'Y) = g(Z'Z)$ for all orthonormal
matrices. Thus,

$$
\phi_X(T) = e^{iT'\mu}\int_Z e^{iT'A^{-\frac{1}{2}}P'Z}g(Z'Z)\,\mathrm{d}Z \tag{3.6.19}
$$

for all $P$. This means that the integral in (3.6.19) is a function of $(T'A^{-\frac{1}{2}})(T'A^{-\frac{1}{2}})' = T'A^{-1}T$,
say $\psi(T'A^{-1}T)$. Then,

$$
\phi_X(T) = e^{iT'\mu}\psi(T'A^{-1}T) \tag{3.6.20}
$$

where $A^{-1}$ is proportional to $\Sigma$, the covariance matrix of $X$, and

$$
\frac{\partial}{\partial T}\phi_X(T)\Big|_{T=O} = i\mu \ \Rightarrow\ E(X) = \mu;
$$
the reader may refer to Chap. 1 for vector/matrix derivatives. Now, considering $\phi_{X-\mu}(T)$,
we have

$$
\begin{aligned}
\frac{\partial}{\partial T}\phi_{X-\mu}(T) &= \frac{\partial}{\partial T}\psi(T'A^{-1}T) = \psi'(T'A^{-1}T)\,2A^{-1}T \\
\Rightarrow\ \frac{\partial}{\partial T'}\psi(T'A^{-1}T) &= \psi'(T'A^{-1}T)\,2T'A^{-1} \\
\Rightarrow\ \frac{\partial}{\partial T}\frac{\partial}{\partial T'}\psi(T'A^{-1}T) &= \psi''(T'A^{-1}T)(2A^{-1}T)(2T'A^{-1}) + \psi'(T'A^{-1}T)\,2A^{-1} \\
\Rightarrow\ \frac{\partial}{\partial T}\frac{\partial}{\partial T'}\psi(T'A^{-1}T)\Big|_{T=O} &= 2A^{-1},
\end{aligned}
$$

assuming that $\psi'(T'A^{-1}T)|_{T=O} = 1$ and $\psi''(T'A^{-1}T)|_{T=O} = 1$, where $\psi'(u) = \frac{\mathrm{d}}{\mathrm{d}u}\psi(u)$
for a real scalar variable $u$ and $\psi''(u)$ denotes the second derivative of $\psi$ with
respect to $u$. The same procedure can be utilized to obtain higher order moments of the
type $E[\cdots XX'XX']$ by repeatedly applying vector derivatives to $\phi_X(T)$ as $\cdots\frac{\partial}{\partial T}\frac{\partial}{\partial T'}$
operating on $\phi_X(T)$ and then evaluating the result at $T = O$. Similarly, higher order central
moments of the type $E[\cdots(X-\mu)(X-\mu)'(X-\mu)(X-\mu)']$ are available by applying
the vector differential operator $\cdots\frac{\partial}{\partial T}\frac{\partial}{\partial T'}$ on $\psi(T'A^{-1}T)$ and then evaluating the result at
$T = O$. However, higher moments with respect to individual variables, such as $E[x_j^k]$, are
available by differentiating $\phi_X(T)$ partially $k$ times with respect to $t_j$, and then evaluating
the resulting expression at $T = O$. If central moments are needed then the differentiation
is done on $\psi(T'A^{-1}T)$.


Thus, we can obtain results parallel to those derived for the p-variate Gaussian distribu- 
tion by applying the same procedures on elliptically contoured distributions. Accordingly, 
further discussion of elliptically contoured distributions will not be taken up in the coming 
chapters. 


Exercises 3.6 


3.6.1. Let $x_1,\ldots,x_k$ be independently distributed real scalar random variables with density
functions $f_j(x_j)$, $j = 1,\ldots,k$. If the joint density of $x_1,\ldots,x_k$ is of the form
$f_1(x_1)\cdots f_k(x_k) = g(x_1^2+\cdots+x_k^2)$ for some differentiable function $g$, show that
$x_1,\ldots,x_k$ are identically distributed as Gaussian random variables.

3.6.2. Letting the real scalar random variables $x_1,\ldots,x_k$ have a joint density such that
$f(x_1,\ldots,x_k) = c$ for $x_1^2+\cdots+x_k^2\le r^2$, $r > 0$, show that (1) $(x_1,\ldots,x_k)$ is uniformly
distributed over the volume of the $k$-dimensional sphere; (2) $E[x_j] = 0$, $\mathrm{Cov}(x_i,x_j) = 0$,
$i\ne j = 1,\ldots,k$; (3) $x_1,\ldots,x_k$ are not independently distributed.
3.6.3. Let $u = (X-B)'A(X-B)$ in Eq. (3.6.1), where $A > O$ and $A$ is a $p\times p$ matrix.
Let $g(u) = c_1(1-au)^{\rho}$, $a > 0$, $1-au > 0$ and $c_1$ is an appropriate constant. If $|A|^{\frac{1}{2}}g(u)$
is a density, (1) show that this density is elliptically contoured; (2) evaluate its normalizing
constant and specify the conditions on the parameters.

3.6.4. Solve Exercise 3.6.3 for $g(u) = c_2(1+au)^{-\rho}$, where $c_2$ is an appropriate constant.

3.6.5. Solve Exercise 3.6.3 for $g(u) = c_3u^{\gamma-1}(1-au)^{\beta-1}$, $a > 0$, $1-au > 0$ and $c_3$
is an appropriate constant.

3.6.6. Solve Exercise 3.6.3 for $g(u) = c_4u^{\gamma-1}e^{-au}$, $a > 0$ where $c_4$ is an appropriate
constant.

3.6.7. Solve Exercise 3.6.3 for $g(u) = c_5u^{\gamma-1}(1+au)^{-(\rho+\gamma)}$, $a > 0$ where $c_5$ is an
appropriate constant.


3.6.8. Solve Exercises 3.6.3 to 3.6.7 by making use of the general polar coordinate trans- 
formation. 


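3.6.9. Let $s = y_1^2+\cdots+y_p^2$ where $y_j$, $j = 1,\ldots,p$, are real scalar random variables.
Let $\mathrm{d}Y = \mathrm{d}y_1\wedge\ldots\wedge\mathrm{d}y_p$ and let $\mathrm{d}s$ be the differential in $s$. Then, it can be shown that
$\mathrm{d}Y = \frac{\pi^{\frac{p}{2}}}{\Gamma(\frac{p}{2})}s^{\frac{p}{2}-1}\mathrm{d}s$. By using this fact, solve Exercises 3.6.3-3.6.7.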

3222 : - "M 2: 
3.6.10. If A= "E write down the elliptically contoured density in (3.6.1) explicitly 
by taking an arbitrary $b = E[X] = \mu$, if (1) $g(u) = (a-cu)^{\alpha}$, $a > 0$, $c > 0$, $a-cu > 0$;
(2) $g(u) = (a+cu)^{-\beta}$, $a > 0$, $c > 0$, and specify the conditions on $\alpha$ and $\beta$.
Chapter 4
The Matrix-Variate Gaussian Distribution


4.1. Introduction 


This chapter relies on various results presented in Chap. 1. We will introduce a class 
of integrals called the real matrix-variate Gaussian integrals and complex matrix-variate 
Gaussian integrals wherefrom a statistical density referred to as the matrix-variate Gaus- 
sian density and, as a special case, the multivariate Gaussian or normal density will be 
obtained, both in the real and complex domains. 


The notations introduced in Chap. 1 will also be utilized in this chapter. Scalar variables,
mathematical and random, will be denoted by lower case letters, vector/matrix
variables will be denoted by capital letters, and complex variables will be indicated by
a tilde. Additionally, the following notations will be used. All the matrices appearing in
this chapter are $p\times p$ real positive definite or Hermitian positive definite unless stated
otherwise. $X > O$ will mean that the $p\times p$ real symmetric matrix $X$ is positive
definite and $\tilde{X} > O$, that the $p\times p$ matrix $\tilde{X}$ in the complex domain is Hermitian, that
is, $\tilde{X} = \tilde{X}^{*}$ where $\tilde{X}^{*}$ denotes the conjugate transpose of $\tilde{X}$, and $\tilde{X}$ is positive definite.
$O < A < X < B$ will indicate that the $p\times p$ real positive definite matrices are such that
$A > O$, $B > O$, $X > O$, $X-A > O$, $B-X > O$. $\int_X f(X)\,\mathrm{d}X$ represents a real-valued
scalar function $f(X)$ being integrated out over all $X$ in the domain of $X$ where $\mathrm{d}X$ stands
for the wedge product of differentials of all distinct elements in $X$. If $X = (x_{ij})$ is a real
$p\times q$ matrix, the $x_{ij}$'s being distinct real scalar variables, then $\mathrm{d}X = \mathrm{d}x_{11}\wedge\mathrm{d}x_{12}\wedge\ldots\wedge\mathrm{d}x_{pq}$
or $\mathrm{d}X = \wedge_{i=1}^{p}\wedge_{j=1}^{q}\mathrm{d}x_{ij}$. If $X = X'$, that is, $X$ is a real symmetric matrix of dimension
$p\times p$, then $\mathrm{d}X = \wedge_{i\ge j=1}^{p}\mathrm{d}x_{ij} = \wedge_{i\le j=1}^{p}\mathrm{d}x_{ij}$, which involves only $p(p+1)/2$ differential
elements $\mathrm{d}x_{ij}$. When taking the wedge product, the elements $x_{ij}$'s may be taken in any
convenient order to start with. However, that order has to be maintained until the
computations are completed. If $\tilde{X} = X_1+iX_2$, where $X_1$ and $X_2$ are real $p\times q$ matrices,
$i = \sqrt{(-1)}$, then $\mathrm{d}\tilde{X}$ will be defined as $\mathrm{d}\tilde{X} = \mathrm{d}X_1\wedge\mathrm{d}X_2$. $\int_{\tilde{A}<\tilde{X}<\tilde{B}} f(\tilde{X})\,\mathrm{d}\tilde{X}$ represents
the real-valued scalar function $f$ of complex matrix argument $\tilde{X}$ being integrated out over
all $p\times p$ matrices $\tilde{X}$ such that $\tilde{A} > O$, $\tilde{X} > O$, $\tilde{B} > O$, $\tilde{X}-\tilde{A} > O$, $\tilde{B}-\tilde{X} > O$ (all
Hermitian positive definite), where $\tilde{A}$ and $\tilde{B}$ are constant matrices in the sense that they are
free of the elements of $\tilde{X}$. The corresponding integral in the real case will be denoted by
$\int_{A<X<B} f(X)\,\mathrm{d}X = \int_A^B f(X)\,\mathrm{d}X$, $A > O$, $X > O$, $X-A > O$, $B > O$, $B-X > O$,
where $A$ and $B$ are constant matrices, all the matrices being of dimension $p\times p$.


4.2. Real Matrix-variate and Multivariate Gaussian Distributions 


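Let $X=(x_{ij})$ be a $p\times q$ matrix whose elements $x_{ij}$ are distinct real variables. For any real matrix $X$, be it square or rectangular, $\mathrm{tr}(XX')=\mathrm{tr}(X'X)=$ sum of the squares of all the elements of $X$. Note that $XX'$ need not be equal to $X'X$. Thus, $\mathrm{tr}(XX')=\sum_{i=1}^{p}\sum_{j=1}^{q}x_{ij}^{2}$ and, in the complex case, $\mathrm{tr}(\tilde{X}\tilde{X}^{*})=\sum_{i=1}^{p}\sum_{j=1}^{q}|\tilde{x}_{ij}|^{2}$ where, if $\tilde{x}_{rs}=x_{rs1}+ix_{rs2}$ with $x_{rs1}$ and $x_{rs2}$ real and $i=\sqrt{(-1)}$, then $|\tilde{x}_{rs}|=+[x_{rs1}^{2}+x_{rs2}^{2}]^{\frac{1}{2}}$. Consider the following integrals over the real rectangular $p\times q$ matrix $X$:

$$I_1=\int_X e^{-\mathrm{tr}(XX')}\,dX=\int_X e^{-\sum_{i,j}x_{ij}^{2}}\,dX=\prod_{i,j}\int_{-\infty}^{\infty}e^{-x_{ij}^{2}}\,dx_{ij}=\prod_{i,j}\sqrt{\pi}=\pi^{\frac{pq}{2}}, \tag{i}$$

$$I_2=\int_X e^{-\frac{1}{2}\mathrm{tr}(XX')}\,dX=(2\pi)^{\frac{pq}{2}}. \tag{ii}$$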


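Let $A>O$ be $p\times p$ and $B>O$ be $q\times q$ constant positive definite matrices. Then we can define the unique positive definite square roots $A^{\frac{1}{2}}$ and $B^{\frac{1}{2}}$. For the discussions to follow, we need only the representations $A=A_1A_1'$, $B=B_1B_1'$ with $A_1$ and $B_1$ nonsingular, a prime denoting the transpose. For a $p\times q$ real matrix $X$, consider

$$\mathrm{tr}(AXBX')=\mathrm{tr}(A^{\frac{1}{2}}A^{\frac{1}{2}}XB^{\frac{1}{2}}B^{\frac{1}{2}}X')=\mathrm{tr}(A^{\frac{1}{2}}XB^{\frac{1}{2}}B^{\frac{1}{2}}X'A^{\frac{1}{2}})=\mathrm{tr}(YY'),\quad Y=A^{\frac{1}{2}}XB^{\frac{1}{2}}. \tag{iii}$$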


In order to obtain the above results, we made use of the property that for any two matrices $P$ and $Q$ such that $PQ$ and $QP$ are defined, $\mathrm{tr}(PQ)=\mathrm{tr}(QP)$ where $PQ$ need not be equal to $QP$. As well, letting $Y=(y_{ij})$, $\mathrm{tr}(YY')=\sum_{i=1}^{p}\sum_{j=1}^{q}y_{ij}^{2}$. $YY'$ is real positive definite when the $p\times q$ matrix $Y$, $p\le q$, is of full rank $p$. Observe that any real square matrix $U$ that can be written as $U=VV'$ for some matrix $V$, where $V$ may be square or rectangular, is either positive definite or at least positive semi-definite. When $V$ is a $p\times q$ matrix, $q\ge p$, whose rank is $p$, $VV'$ is positive definite; if the rank of $V$ is less than $p$, then $VV'$ is positive semi-definite. From Result 1.6.4,

$$Y=A^{\frac{1}{2}}XB^{\frac{1}{2}}\ \Rightarrow\ dY=|A|^{\frac{q}{2}}|B|^{\frac{p}{2}}\,dX\ \Rightarrow\ dX=|A|^{-\frac{q}{2}}|B|^{-\frac{p}{2}}\,dY \tag{iv}$$


where we use the standard notation $|(\cdot)|=\det(\cdot)$ to denote the determinant of $(\cdot)$ in general and $|\det(\cdot)|$ to denote the absolute value or modulus of the determinant of $(\cdot)$ in the complex domain. Let

$$f_{p,q}(X)=\frac{|A|^{\frac{q}{2}}|B|^{\frac{p}{2}}}{(2\pi)^{\frac{pq}{2}}}\,e^{-\frac{1}{2}\mathrm{tr}(AXBX')},\quad A>O,\ B>O, \tag{4.2.1}$$

for $X=(x_{ij})$, $-\infty<x_{ij}<\infty$ for all $i$ and $j$. From the steps (i) to (iv), we see that $f_{p,q}(X)$ in (4.2.1) is a statistical density over the real rectangular $p\times q$ matrix $X$. This function $f_{p,q}(X)$ is known as the real matrix-variate Gaussian density. We introduced a $\frac{1}{2}$ in the exponent so that particular cases usually found in the literature agree with the real $p$-variate Gaussian distribution. Actually, this $\frac{1}{2}$ factor is quite unnecessary from a mathematical point of view as it complicates computations rather than simplifying them. In the complex case, the factor $\frac{1}{2}$ does not appear in the exponent of the density, which is consistent with the current particular cases encountered in the literature.
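As a quick numerical illustration of (4.2.1), the following minimal sketch evaluates the density at a given point. The helper names (`det`, `matmul`, `trace`, `matrix_gaussian_density`) and the test point are illustrative choices, not part of the text; the determinant routine uses a simple cofactor expansion that is only sensible for small matrices.

```python
import math


def det(m):
    """Determinant by Laplace expansion along the first row (small matrices only)."""
    if len(m) == 1:
        return m[0][0]
    return sum(
        (-1) ** j * m[0][j] * det([row[:j] + row[j + 1:] for row in m[1:]])
        for j in range(len(m))
    )


def matmul(a, b):
    """Plain nested-list matrix product."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def trace(m):
    return sum(m[i][i] for i in range(len(m)))


def matrix_gaussian_density(X, A, B):
    """Evaluate (4.2.1) at a p x q point X for positive definite A (p x p) and B (q x q)."""
    p, q = len(X), len(X[0])
    Xt = [list(col) for col in zip(*X)]
    exponent = -0.5 * trace(matmul(matmul(matmul(A, X), B), Xt))
    constant = det(A) ** (q / 2) * det(B) ** (p / 2) / (2 * math.pi) ** (p * q / 2)
    return constant * math.exp(exponent)


# With A = I_2 and B = I_3 the density factors into six standard normal densities.
X = [[0.3, -1.2, 0.5], [0.0, 0.7, -0.4]]
I2 = [[1.0, 0.0], [0.0, 1.0]]
I3 = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
density_identity = matrix_gaussian_density(X, I2, I3)
product_of_normals = math.prod(
    math.exp(-0.5 * x * x) / math.sqrt(2 * math.pi) for row in X for x in row
)
```

Taking $A=I$ and $B=I$ is the simplest sanity check: $\mathrm{tr}(XX')$ is then the sum of squares of the entries, so the two computed values should agree up to rounding.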


Note 4.2.1. If the factor $\frac{1}{2}$ is omitted in the exponent, then $2\pi$ is to be replaced by $\pi$ in the denominator of (4.2.1), namely,

$$f_{p,q}(X)=\frac{|A|^{\frac{q}{2}}|B|^{\frac{p}{2}}}{\pi^{\frac{pq}{2}}}\,e^{-\mathrm{tr}(AXBX')},\quad A>O,\ B>O. \tag{4.2.2}$$

When $p=1$, the matrix $X$ is $1\times q$ and we let $X=(x_1,\ldots,x_q)$ where $X$ is a row vector whose components are $x_1,\ldots,x_q$. When $p=1$, $A$ is $1\times 1$ or a scalar quantity. Letting $A=1$ and $B=V^{-1}$, $V>O$, be of dimension $q\times q$, then in the real case,

$$f_{1,q}(X)=\frac{|B|^{\frac{1}{2}}}{(2\pi)^{\frac{q}{2}}}\,e^{-\frac{1}{2}XBX'}=\frac{1}{(2\pi)^{\frac{q}{2}}|V|^{\frac{1}{2}}}\,e^{-\frac{1}{2}XV^{-1}X'}, \tag{4.2.3}$$

which is the usual real nonsingular Gaussian density with parameter matrix $V$, that is, $X'\sim N_q(O,V)$. If a location parameter vector $\mu=(\mu_1,\ldots,\mu_q)$ is introduced or, equivalently, if $X$ is replaced by $X-\mu$, then we have

$$f_{1,q}(X)=[(2\pi)^{\frac{q}{2}}|V|^{\frac{1}{2}}]^{-1}\,e^{-\frac{1}{2}(X-\mu)V^{-1}(X-\mu)'},\quad V>O. \tag{4.2.4}$$

On the other hand, when $q=1$, a real $p$-variate Gaussian or normal density is available from (4.2.1) wherein $B=1$; in this case, $X\sim N_p(\mu,A^{-1})$ where $X$ and the location parameter vector $\mu$ are now $p\times 1$ column vectors. This density is given by

$$f_{p,1}(X)=\frac{|A|^{\frac{1}{2}}}{(2\pi)^{\frac{p}{2}}}\,e^{-\frac{1}{2}(X-\mu)'A(X-\mu)},\quad A>O. \tag{4.2.5}$$

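Example 4.2.1. Write down the exponent and the normalizing constant explicitly in a real matrix-variate Gaussian density where

$$X=\begin{bmatrix}x_{11}&x_{12}&x_{13}\\ x_{21}&x_{22}&x_{23}\end{bmatrix},\quad E[X]=M=\begin{bmatrix}1&0&-1\\ -1&-2&0\end{bmatrix},$$

$$A=\begin{bmatrix}1&1\\ 1&2\end{bmatrix},\quad B=\begin{bmatrix}1&1&1\\ 1&2&1\\ 1&1&3\end{bmatrix},$$

where the $x_{ij}$'s are real scalar random variables.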


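Solution 4.2.1. Note that $A=A'$ and $B=B'$, the leading minors in $A$ being $|(1)|=1>0$ and $|A|=1>0$ so that $A>O$. The leading minors in $B$ are $|(1)|=1>0$, $\begin{vmatrix}1&1\\ 1&2\end{vmatrix}=1>0$ and

$$|B|=(1)\begin{vmatrix}2&1\\ 1&3\end{vmatrix}-(1)\begin{vmatrix}1&1\\ 1&3\end{vmatrix}+(1)\begin{vmatrix}1&2\\ 1&1\end{vmatrix}=5-2-1=2>0,$$

and hence $B>O$. The density is of the form

$$f_{p,q}(X)=\frac{|A|^{\frac{q}{2}}|B|^{\frac{p}{2}}}{(2\pi)^{\frac{pq}{2}}}\,e^{-\frac{1}{2}\mathrm{tr}[A(X-M)B(X-M)']}$$

where the normalizing constant is $\frac{(1)^{\frac{3}{2}}(2)^{\frac{2}{2}}}{(2\pi)^{\frac{(2)(3)}{2}}}=\frac{2}{2^{3}\pi^{3}}=\frac{1}{4\pi^{3}}$. Let $X_1$ and $X_2$ be the two rows of $X$ and let $Y=X-M=\begin{bmatrix}Y_1\\ Y_2\end{bmatrix}$. Then $Y_1=(y_{11},y_{12},y_{13})=(x_{11}-1,\ x_{12},\ x_{13}+1)$, $Y_2=(y_{21},y_{22},y_{23})=(x_{21}+1,\ x_{22}+2,\ x_{23})$. Now

$$(X-M)B(X-M)'=\begin{bmatrix}Y_1\\ Y_2\end{bmatrix}B\,[Y_1',\ Y_2']=\begin{bmatrix}Y_1BY_1'&Y_1BY_2'\\ Y_2BY_1'&Y_2BY_2'\end{bmatrix},$$

$$A(X-M)B(X-M)'=\begin{bmatrix}1&1\\ 1&2\end{bmatrix}\begin{bmatrix}Y_1BY_1'&Y_1BY_2'\\ Y_2BY_1'&Y_2BY_2'\end{bmatrix}=\begin{bmatrix}Y_1BY_1'+Y_2BY_1'&Y_1BY_2'+Y_2BY_2'\\ Y_1BY_1'+2Y_2BY_1'&Y_1BY_2'+2Y_2BY_2'\end{bmatrix}.$$

Thus,

$$\mathrm{tr}[A(X-M)B(X-M)']=Y_1BY_1'+Y_2BY_1'+Y_1BY_2'+2Y_2BY_2'=Y_1BY_1'+2Y_1BY_2'+2Y_2BY_2'\equiv Q, \tag{i}$$

noting that $Y_1BY_2'$ and $Y_2BY_1'$ are equal since both are real scalar quantities and one is the transpose of the other. Here are now the detailed computations of the various items:

$$Y_1BY_1'=y_{11}^{2}+2y_{11}y_{12}+2y_{11}y_{13}+2y_{12}^{2}+2y_{12}y_{13}+3y_{13}^{2} \tag{ii}$$

$$Y_2BY_2'=y_{21}^{2}+2y_{21}y_{22}+2y_{21}y_{23}+2y_{22}^{2}+2y_{22}y_{23}+3y_{23}^{2} \tag{iii}$$

$$Y_1BY_2'=y_{11}y_{21}+y_{11}y_{22}+y_{11}y_{23}+y_{12}y_{21}+2y_{12}y_{22}+y_{12}y_{23}+y_{13}y_{21}+y_{13}y_{22}+3y_{13}y_{23} \tag{iv}$$

where the $y_{1j}$'s and $y_{2j}$'s and the various quadratic and bilinear forms are as specified above. The density is then

$$f_{2,3}(X)=\frac{1}{4\pi^{3}}\,e^{-\frac{1}{2}(Y_1BY_1'+2Y_1BY_2'+2Y_2BY_2')}$$

where the terms in the exponent are given in (ii)-(iv). This completes the computations.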
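The quantities in this example can be checked numerically. The sketch below recomputes $|A|$, $|B|$ and the normalizing constant, and compares the expanded exponent $Q=Y_1BY_1'+2Y_1BY_2'+2Y_2BY_2'$ with $\mathrm{tr}[A(X-M)B(X-M)']$ at one arbitrarily chosen point $X$; the point and the helper names are assumptions made for illustration only.

```python
import math

A = [[1, 1], [1, 2]]
B = [[1, 1, 1], [1, 2, 1], [1, 1, 3]]
M = [[1, 0, -1], [-1, -2, 0]]


def det(m):
    """Determinant by Laplace expansion along the first row (small matrices only)."""
    if len(m) == 1:
        return m[0][0]
    return sum(
        (-1) ** j * m[0][j] * det([row[:j] + row[j + 1:] for row in m[1:]])
        for j in range(len(m))
    )


def bilinear(u, C, v):
    """Row vector u times C times the transpose of row vector v."""
    return sum(u[i] * C[i][j] * v[j] for i in range(len(u)) for j in range(len(v)))


det_A, det_B = det(A), det(B)
norm_const = det_A ** (3 / 2) * det_B ** (2 / 2) / (2 * math.pi) ** (2 * 3 / 2)

X = [[0.5, -1.0, 2.0], [1.5, 0.25, -0.75]]  # arbitrary illustrative point
Y1, Y2 = ([x - m for x, m in zip(xr, mr)] for xr, mr in zip(X, M))

# Q as expanded in step (i) of the solution.
Q_formula = bilinear(Y1, B, Y1) + 2 * bilinear(Y1, B, Y2) + 2 * bilinear(Y2, B, Y2)

# Q computed directly as tr[A (X - M) B (X - M)'].
G = [[bilinear(u, B, v) for v in (Y1, Y2)] for u in (Y1, Y2)]
Q_trace = sum(A[i][k] * G[k][i] for i in range(2) for k in range(2))

density_at_X = norm_const * math.exp(-0.5 * Q_formula)
```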


4.2a. The Matrix-variate Gaussian Density, Complex Case 


In the following discussion, the absolute value of a determinant will be denoted by $|\det(A)|$ where $A$ is a square matrix. For example, if $\det(A)=a+ib$ with $a$ and $b$ real scalars and $i=\sqrt{(-1)}$, the determinant of the conjugate transpose of $A$ is $\det(A^{*})=a-ib$. Then the absolute value of the determinant is

$$|\det(A)|=+\sqrt{(a^{2}+b^{2})}=+[(a+ib)(a-ib)]^{\frac{1}{2}}=+[\det(A)\det(A^{*})]^{\frac{1}{2}}=+[\det(AA^{*})]^{\frac{1}{2}}. \tag{4.2a.1}$$

The matrix-variate Gaussian density in the complex case, which is the counterpart to that given in (4.2.1) for the real case, is

$$\tilde{f}_{p,q}(\tilde{X})=\frac{|\det(A)|^{q}|\det(B)|^{p}}{\pi^{pq}}\,e^{-\mathrm{tr}(A\tilde{X}B\tilde{X}^{*})} \tag{4.2a.2}$$

for $A>O$, $B>O$, $\tilde{X}=(\tilde{x}_{ij})$, $|(\cdot)|$ denoting the absolute value of $(\cdot)$. When $p=1$ and $A=1$, the usual multivariate Gaussian density in the complex domain is obtained:

$$\tilde{f}_{1,q}(\tilde{X})=\frac{|\det(B)|}{\pi^{q}}\,e^{-(\tilde{X}-\tilde{\mu})B(\tilde{X}-\tilde{\mu})^{*}},\quad \tilde{X}'\sim\tilde{N}_q(\tilde{\mu},B^{-1}), \tag{4.2a.3}$$

where $B>O$ and $\tilde{X}$ and $\tilde{\mu}$ are $1\times q$ row vectors, $\tilde{\mu}$ being a location parameter vector. When $q=1$ in (4.2a.2), we have the $p$-variate Gaussian or normal density in the complex case which is given by

$$\tilde{f}_{p,1}(\tilde{X})=\frac{|\det(A)|}{\pi^{p}}\,e^{-(\tilde{X}-\tilde{\mu})^{*}A(\tilde{X}-\tilde{\mu})},\quad \tilde{X}\sim\tilde{N}_p(\tilde{\mu},A^{-1}), \tag{4.2a.4}$$

where $\tilde{X}$ and the location parameter, also denoted by $\tilde{\mu}$, are now $p\times 1$ vectors.


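Example 4.2a.1. Consider a $2\times 3$ complex matrix-variate Gaussian density. Write down the normalizing constant and the exponent explicitly if

$$\tilde{X}=\begin{bmatrix}\tilde{x}_{11}&\tilde{x}_{12}&\tilde{x}_{13}\\ \tilde{x}_{21}&\tilde{x}_{22}&\tilde{x}_{23}\end{bmatrix},\quad E[\tilde{X}]=\tilde{M}=\begin{bmatrix}i&-i&1+i\\ 0&1-i&1\end{bmatrix},$$

$$A=\begin{bmatrix}3&1+i\\ 1-i&2\end{bmatrix},\quad B=\begin{bmatrix}4&1+i&i\\ 1-i&2&1-i\\ -i&1+i&3\end{bmatrix},$$

where the $\tilde{x}_{ij}$'s are scalar complex random variables.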


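Solution 4.2a.1. Let us verify the definiteness of $A$ and $B$. It is obvious that $A=A^{*}$, $B=B^{*}$ and hence they are Hermitian. The leading minors of $A$ are $|(3)|=3>0$, $|A|=4>0$ and hence $A>O$. The leading minors of $B$ are $|(4)|=4>0$, $\begin{vmatrix}4&1+i\\ 1-i&2\end{vmatrix}=6>0$,

$$|B|=4\begin{vmatrix}2&1-i\\ 1+i&3\end{vmatrix}-(1+i)\begin{vmatrix}1-i&1-i\\ -i&3\end{vmatrix}+i\begin{vmatrix}1-i&2\\ -i&1+i\end{vmatrix}=8>0,$$

and hence $B>O$. The normalizing constant is then

$$\frac{|\det(A)|^{q}|\det(B)|^{p}}{\pi^{pq}}=\frac{(4)^{3}(8)^{2}}{\pi^{6}}.$$

Let the two rows of $\tilde{X}$ be $\tilde{X}_1$ and $\tilde{X}_2$. Let $(\tilde{X}-\tilde{M})=\tilde{Y}=\begin{bmatrix}\tilde{Y}_1\\ \tilde{Y}_2\end{bmatrix}$,

$$\tilde{Y}_1=(\tilde{y}_{11},\tilde{y}_{12},\tilde{y}_{13})=(\tilde{x}_{11}-i,\ \tilde{x}_{12}+i,\ \tilde{x}_{13}-(1+i)),$$

$$\tilde{Y}_2=(\tilde{y}_{21},\tilde{y}_{22},\tilde{y}_{23})=(\tilde{x}_{21},\ \tilde{x}_{22}-(1-i),\ \tilde{x}_{23}-1).$$

$$(\tilde{X}-\tilde{M})B(\tilde{X}-\tilde{M})^{*}=\tilde{Y}B\tilde{Y}^{*}=\begin{bmatrix}\tilde{Y}_1\\ \tilde{Y}_2\end{bmatrix}B\,[\tilde{Y}_1^{*},\ \tilde{Y}_2^{*}]=\begin{bmatrix}\tilde{Y}_1B\tilde{Y}_1^{*}&\tilde{Y}_1B\tilde{Y}_2^{*}\\ \tilde{Y}_2B\tilde{Y}_1^{*}&\tilde{Y}_2B\tilde{Y}_2^{*}\end{bmatrix}.$$

Then,

$$\mathrm{tr}[A(\tilde{X}-\tilde{M})B(\tilde{X}-\tilde{M})^{*}]=\mathrm{tr}\left\{\begin{bmatrix}3&1+i\\ 1-i&2\end{bmatrix}\begin{bmatrix}\tilde{Y}_1B\tilde{Y}_1^{*}&\tilde{Y}_1B\tilde{Y}_2^{*}\\ \tilde{Y}_2B\tilde{Y}_1^{*}&\tilde{Y}_2B\tilde{Y}_2^{*}\end{bmatrix}\right\}$$

$$=3\tilde{Y}_1B\tilde{Y}_1^{*}+(1+i)(\tilde{Y}_2B\tilde{Y}_1^{*})+(1-i)(\tilde{Y}_1B\tilde{Y}_2^{*})+2\tilde{Y}_2B\tilde{Y}_2^{*}\equiv Q \tag{i}$$

where

$$\tilde{Y}_1B\tilde{Y}_1^{*}=4\tilde{y}_{11}\tilde{y}_{11}^{*}+2\tilde{y}_{12}\tilde{y}_{12}^{*}+3\tilde{y}_{13}\tilde{y}_{13}^{*}+(1+i)\tilde{y}_{11}\tilde{y}_{12}^{*}+i\tilde{y}_{11}\tilde{y}_{13}^{*}+(1-i)\tilde{y}_{12}\tilde{y}_{11}^{*}+(1-i)\tilde{y}_{12}\tilde{y}_{13}^{*}-i\tilde{y}_{13}\tilde{y}_{11}^{*}+(1+i)\tilde{y}_{13}\tilde{y}_{12}^{*} \tag{ii}$$

$$\tilde{Y}_2B\tilde{Y}_2^{*}=4\tilde{y}_{21}\tilde{y}_{21}^{*}+2\tilde{y}_{22}\tilde{y}_{22}^{*}+3\tilde{y}_{23}\tilde{y}_{23}^{*}+(1+i)\tilde{y}_{21}\tilde{y}_{22}^{*}+i\tilde{y}_{21}\tilde{y}_{23}^{*}+(1-i)\tilde{y}_{22}\tilde{y}_{21}^{*}+(1-i)\tilde{y}_{22}\tilde{y}_{23}^{*}-i\tilde{y}_{23}\tilde{y}_{21}^{*}+(1+i)\tilde{y}_{23}\tilde{y}_{22}^{*} \tag{iii}$$

$$\tilde{Y}_1B\tilde{Y}_2^{*}=4\tilde{y}_{11}\tilde{y}_{21}^{*}+2\tilde{y}_{12}\tilde{y}_{22}^{*}+3\tilde{y}_{13}\tilde{y}_{23}^{*}+(1+i)\tilde{y}_{11}\tilde{y}_{22}^{*}+i\tilde{y}_{11}\tilde{y}_{23}^{*}+(1-i)\tilde{y}_{12}\tilde{y}_{21}^{*}+(1-i)\tilde{y}_{12}\tilde{y}_{23}^{*}-i\tilde{y}_{13}\tilde{y}_{21}^{*}+(1+i)\tilde{y}_{13}\tilde{y}_{22}^{*} \tag{iv}$$

$$\tilde{Y}_2B\tilde{Y}_1^{*}=\text{(iv) with } \tilde{y}_{1j} \text{ and } \tilde{y}_{2j} \text{ interchanged.} \tag{v}$$

Hence, the density of $\tilde{X}$ is given by

$$\tilde{f}_{2,3}(\tilde{X})=\frac{4^{3}\,8^{2}}{\pi^{6}}\,e^{-Q}$$

where $Q$ is given explicitly in (i)-(v) above. This completes the computations.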
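A short numerical check of this example may be helpful. The sketch below, which relies only on Python's built-in complex type, recomputes $|A|$ and $|B|$ and confirms at one arbitrarily chosen point that the expanded $Q$ of step (i) agrees with $\mathrm{tr}[A(\tilde{X}-\tilde{M})B(\tilde{X}-\tilde{M})^{*}]$ and is real and positive. The test point and the helper names are illustrative assumptions.

```python
import math

A = [[3, 1 + 1j], [1 - 1j, 2]]
B = [[4, 1 + 1j, 1j], [1 - 1j, 2, 1 - 1j], [-1j, 1 + 1j, 3]]
M = [[1j, -1j, 1 + 1j], [0, 1 - 1j, 1]]


def det(m):
    """Determinant by Laplace expansion along the first row (small matrices only)."""
    if len(m) == 1:
        return m[0][0]
    return sum(
        (-1) ** j * m[0][j] * det([row[:j] + row[j + 1:] for row in m[1:]])
        for j in range(len(m))
    )


def hermitian_form(u, C, v):
    """u C v* for row vectors u and v, v* being the conjugate transpose of v."""
    return sum(
        u[j] * C[j][k] * complex(v[k]).conjugate()
        for j in range(len(u))
        for k in range(len(v))
    )


det_A, det_B = det(A), det(B)
norm_const = abs(det_A) ** 3 * abs(det_B) ** 2 / math.pi ** 6

X = [[1 + 2j, -1j, 0.5], [2 - 1j, 1 + 1j, -3j]]  # arbitrary illustrative point
Y1, Y2 = ([x - m for x, m in zip(xr, mr)] for xr, mr in zip(X, M))

# Q as written in step (i) of the solution.
Q = (3 * hermitian_form(Y1, B, Y1) + (1 + 1j) * hermitian_form(Y2, B, Y1)
     + (1 - 1j) * hermitian_form(Y1, B, Y2) + 2 * hermitian_form(Y2, B, Y2))

# Q computed directly as the trace of A times the 2 x 2 block matrix.
G = [[hermitian_form(u, B, v) for v in (Y1, Y2)] for u in (Y1, Y2)]
Q_trace = sum(A[i][k] * G[k][i] for i in range(2) for k in range(2))
```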




4.2.1. Some properties of a real matrix-variate Gaussian density 


In order to derive certain properties, we will need some more Jacobians of matrix 
transformations, in addition to those provided in Chap. 1. These will be listed in this sec- 
tion as basic results without proofs. The derivations as well as other related Jacobians are 
available from Mathai (1997). 


Theorem 4.2.1. Let $X$ be a $p\times q$, $q\ge p$, real matrix of rank $p$, that is, $X$ has full rank, where the $pq$ elements of $X$ are distinct real scalar variables. Let $X=TU_1$ where $T$ is a $p\times p$ real lower triangular matrix whose diagonal elements are positive and $U_1$ is a semi-orthonormal matrix such that $U_1U_1'=I_p$. Then

$$dX=\Big\{\prod_{j=1}^{p}t_{jj}^{\,q-j}\Big\}\,dT\,h(U_1) \tag{4.2.6}$$

where $h(U_1)$ is the differential element corresponding to $U_1$.

Theorem 4.2.2. For the differential elements $h(U_1)$ in (4.2.6), the integral over the Stiefel manifold $V_{p,q}$, or over the space of $p\times q$, $q\ge p$, semi-orthonormal matrices, and the integral over the full orthogonal group $O_p$ when $q=p$ are respectively

$$\int_{V_{p,q}}h(U_1)=\frac{2^{p}\pi^{\frac{pq}{2}}}{\Gamma_p(\frac{q}{2})}\quad\text{and}\quad\int_{O_p}h(U_1)=\frac{2^{p}\pi^{\frac{p^{2}}{2}}}{\Gamma_p(\frac{p}{2})} \tag{4.2.7}$$

where $\Gamma_p(\alpha)$ is the real matrix-variate gamma function given by

$$\Gamma_p(\alpha)=\pi^{\frac{p(p-1)}{4}}\,\Gamma(\alpha)\Gamma(\alpha-1/2)\cdots\Gamma(\alpha-(p-1)/2),\quad \Re(\alpha)>\tfrac{p-1}{2}, \tag{4.2.8}$$

$\Re(\cdot)$ denoting the real part of $(\cdot)$.

For example,

$$\Gamma_3(\alpha)=\pi^{\frac{3(2)}{4}}\,\Gamma(\alpha)\Gamma(\alpha-1/2)\Gamma(\alpha-1)=\pi^{\frac{3}{2}}\,\Gamma(\alpha)\Gamma(\alpha-1/2)\Gamma(\alpha-1),\quad \Re(\alpha)>1.$$
With the help of Theorems 4.2.1, 4.2.2 and 1.6.7 of Chap. 1, we can derive the following result:

Theorem 4.2.3. Let $X$ be a real $p\times q$, $q\ge p$, matrix of rank $p$ and $S=XX'$. Then, $S>O$ (real positive definite) and

$$dX=\frac{\pi^{\frac{pq}{2}}}{\Gamma_p(\frac{q}{2})}\,|S|^{\frac{q}{2}-\frac{p+1}{2}}\,dS, \tag{4.2.9}$$

after integrating out over the Stiefel manifold.
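The constants in (4.2.7)-(4.2.9) are easy to evaluate for real arguments. The following is a minimal sketch, restricted to real $\alpha$ and using only `math.gamma`; the function names are illustrative assumptions. For $p=1$ the Stiefel-manifold integral reduces to the surface area of the unit sphere in $q$ dimensions, which provides a familiar check ($2\pi$ for $q=2$, $4\pi$ for $q=3$).

```python
import math


def real_matrix_gamma(p, alpha):
    """Real matrix-variate gamma of (4.2.8) for real alpha > (p - 1)/2."""
    if alpha <= (p - 1) / 2:
        raise ValueError("alpha must exceed (p - 1)/2")
    value = math.pi ** (p * (p - 1) / 4)
    for j in range(p):
        value *= math.gamma(alpha - j / 2)
    return value


def stiefel_integral(p, q):
    """Right-hand side of (4.2.7) over V_{p,q}: 2^p pi^(pq/2) / Gamma_p(q/2)."""
    return 2 ** p * math.pi ** (p * q / 2) / real_matrix_gamma(p, q / 2)


def jacobian_constant(p, q):
    """Constant pi^(pq/2) / Gamma_p(q/2) multiplying |S|^(q/2 - (p+1)/2) dS in (4.2.9)."""
    return math.pi ** (p * q / 2) / real_matrix_gamma(p, q / 2)


gamma_3_at_2 = real_matrix_gamma(3, 2.0)   # pi^(3/2) Gamma(2) Gamma(3/2) Gamma(1)
sphere_area_q3 = stiefel_integral(1, 3)    # surface area of the unit sphere in 3 dimensions
```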
4.2a.1. Some properties of a complex matrix-variate Gaussian density

The corresponding results in the complex domain follow.

Theorem 4.2a.1. Let $\tilde{X}$ be a $p\times q$, $q\ge p$, matrix of rank $p$ in the complex domain and $\tilde{T}$ be a $p\times p$ lower triangular matrix in the complex domain whose diagonal elements $t_{jj}>0$, $j=1,\ldots,p$, are real and positive. Then, letting $\tilde{U}_1$ be a semi-unitary matrix such that $\tilde{U}_1\tilde{U}_1^{*}=I_p$,

$$\tilde{X}=\tilde{T}\tilde{U}_1\ \Rightarrow\ d\tilde{X}=\Big\{\prod_{j=1}^{p}t_{jj}^{\,2(q-j)+1}\Big\}\,d\tilde{T}\,h(\tilde{U}_1) \tag{4.2a.5}$$

where $h(\tilde{U}_1)$ is the differential element corresponding to $\tilde{U}_1$.

When integrating out $h(\tilde{U}_1)$, there are three situations to be considered. One of the cases is $q>p$. When $q=p$, the integration is done over the full unitary group $\tilde{O}_p$; however, there are two cases to be considered in this instance. One case occurs where all the elements of the unitary matrix $\tilde{U}_1$, including the diagonal ones, are complex, in which case $\tilde{O}_p$ will be denoted by $\tilde{O}_p^{(1)}$, and the other one, wherein the diagonal elements of $\tilde{U}_1$ are real, in which instance the unitary group will be denoted by $\tilde{O}_p^{(2)}$. When unitary transformations are applied to Hermitian matrices, which is our usual situation when Hermitian matrices are involved, the diagonal elements of the unique $\tilde{U}_1$ are real and hence the unitary group is $\tilde{O}_p^{(2)}$. The integrals of $h(\tilde{U}_1)$ under these three cases are given in the next theorem.

Theorem 4.2a.2. Let $h(\tilde{U}_1)$ be as defined in equation (4.2a.5). Then, the integral of $h(\tilde{U}_1)$ over the Stiefel manifold $\tilde{V}_{p,q}$ of semi-unitary matrices for $q>p$, and when $q=p$, the integrals over the unitary groups $\tilde{O}_p^{(1)}$ and $\tilde{O}_p^{(2)}$ are the following:

$$\int_{\tilde{V}_{p,q}}h(\tilde{U}_1)=\frac{2^{p}\pi^{pq}}{\tilde{\Gamma}_p(q)},\quad q>p;$$

$$\int_{\tilde{O}_p^{(1)}}h(\tilde{U}_1)=\frac{2^{p}\pi^{p^{2}}}{\tilde{\Gamma}_p(p)},\qquad \int_{\tilde{O}_p^{(2)}}h(\tilde{U}_1)=\frac{\pi^{p(p-1)}}{\tilde{\Gamma}_p(p)}, \tag{4.2a.6}$$

the factor $2^{p}$ being omitted when $\tilde{U}_1$ is uniquely specified; $\tilde{O}_p^{(1)}$ is the case of a general $\tilde{X}$, $\tilde{O}_p^{(2)}$ is the case corresponding to $\tilde{X}$ Hermitian, and $\tilde{\Gamma}_p(\alpha)$ is the complex matrix-variate gamma, given by

$$\tilde{\Gamma}_p(\alpha)=\pi^{\frac{p(p-1)}{2}}\,\Gamma(\alpha)\Gamma(\alpha-1)\cdots\Gamma(\alpha-p+1),\quad \Re(\alpha)>p-1. \tag{4.2a.7}$$

For example,

$$\tilde{\Gamma}_3(\alpha)=\pi^{\frac{3(2)}{2}}\,\Gamma(\alpha)\Gamma(\alpha-1)\Gamma(\alpha-2)=\pi^{3}\,\Gamma(\alpha)\Gamma(\alpha-1)\Gamma(\alpha-2),\quad \Re(\alpha)>2.$$

Theorem 4.2a.3. Let $\tilde{X}$ be a $p\times q$, $q\ge p$, matrix of rank $p$ in the complex domain and $\tilde{S}=\tilde{X}\tilde{X}^{*}>O$. Then after integrating out over the Stiefel manifold,

$$d\tilde{X}=\frac{\pi^{pq}}{\tilde{\Gamma}_p(q)}\,|\det(\tilde{S})|^{q-p}\,d\tilde{S}. \tag{4.2a.8}$$


4.2.2. Additional properties in the real and complex cases 


On making use of the above results, we will establish a few results in this section as well as additional ones later on. Let us consider the matrix-variate Gaussian densities corresponding to (4.2.1) and (4.2a.2) with location matrices $M$ and $\tilde{M}$, respectively, and let the densities be again denoted by $f_{p,q}(X)$ and $\tilde{f}_{p,q}(\tilde{X})$ respectively, where

$$f_{p,q}(X)=\frac{|A|^{\frac{q}{2}}|B|^{\frac{p}{2}}}{(2\pi)^{\frac{pq}{2}}}\,e^{-\frac{1}{2}\mathrm{tr}[A(X-M)B(X-M)']} \tag{4.2.10}$$

and

$$\tilde{f}_{p,q}(\tilde{X})=\frac{|\det(A)|^{q}|\det(B)|^{p}}{\pi^{pq}}\,e^{-\mathrm{tr}[A(\tilde{X}-\tilde{M})B(\tilde{X}-\tilde{M})^{*}]}. \tag{4.2a.9}$$

Then, in the real case the expected value of $X$ or the mean value of $X$, denoted by $E(X)$, is given by

$$E(X)=\int_X Xf_{p,q}(X)\,dX=\int_X (X-M)f_{p,q}(X)\,dX+M\int_X f_{p,q}(X)\,dX. \tag{i}$$

The second integral in (i) is the total integral in a density, which is 1, and hence the second term gives $M$. On making the transformation $Y=A^{\frac{1}{2}}(X-M)B^{\frac{1}{2}}$, we have

$$E[X]=M+A^{-\frac{1}{2}}\Big\{\frac{1}{(2\pi)^{\frac{pq}{2}}}\int_Y Y\,e^{-\frac{1}{2}\mathrm{tr}(YY')}\,dY\Big\}B^{-\frac{1}{2}}. \tag{ii}$$

But $\mathrm{tr}(YY')$ is the sum of squares of all elements in $Y$. Hence $Y\,e^{-\frac{1}{2}\mathrm{tr}(YY')}$ is an odd function and the integral over each element in $Y$ is convergent, so that each integral is zero. Thus, the integral over $Y$ gives a null matrix. Therefore $E(X)=M$. It can be shown in a similar manner that $E(\tilde{X})=\tilde{M}$.

Theorem 4.2.4, 4.2a.4. For the densities specified in (4.2.10) and (4.2a.9),

$$E(X)=M\ \text{ and }\ E(\tilde{X})=\tilde{M}. \tag{4.2.11}$$

Theorem 4.2.5, 4.2a.5. For the densities given in (4.2.10), (4.2a.9)

$$E[(X-M)B(X-M)']=qA^{-1},\quad E[(X-M)'A(X-M)]=pB^{-1} \tag{4.2.12}$$

and

$$E[(\tilde{X}-\tilde{M})B(\tilde{X}-\tilde{M})^{*}]=qA^{-1},\quad E[(\tilde{X}-\tilde{M})^{*}A(\tilde{X}-\tilde{M})]=pB^{-1}. \tag{4.2a.10}$$

Proof: Consider the real case first. Let $Y=A^{\frac{1}{2}}(X-M)B^{\frac{1}{2}}\ \Rightarrow\ A^{-\frac{1}{2}}Y=(X-M)B^{\frac{1}{2}}$. Then

$$E[(X-M)B(X-M)']=A^{-\frac{1}{2}}\Big\{\frac{1}{(2\pi)^{\frac{pq}{2}}}\int_Y YY'\,e^{-\frac{1}{2}\mathrm{tr}(YY')}\,dY\Big\}A^{-\frac{1}{2}}. \tag{i}$$

Note that $Y$ is $p\times q$ and $YY'$ is $p\times p$. The non-diagonal elements in $YY'$ are dot products of the distinct row vectors in $Y$ and hence linear functions of the elements of $Y$. The diagonal elements in $YY'$ are sums of squares of elements in the rows of $Y$. The exponent consists entirely of sums of squares and hence the convergent integrals corresponding to all the non-diagonal elements in $YY'$ are zeros. Hence, only the diagonal elements need be considered. Each diagonal element is a sum of squares of $q$ elements of $Y$. For example, the first diagonal element in $YY'$ is $y_{11}^{2}+y_{12}^{2}+\cdots+y_{1q}^{2}$ where $Y=(y_{ij})$. Let $Y_1=(y_{11},\ldots,y_{1q})$ be the first row of $Y$ and let $s=Y_1Y_1'=y_{11}^{2}+\cdots+y_{1q}^{2}$. It follows from Theorem 4.2.3 that when $p=1$,

$$dY_1=\frac{\pi^{\frac{q}{2}}}{\Gamma(\frac{q}{2})}\,s^{\frac{q}{2}-1}\,ds. \tag{ii}$$

Then

$$\int_{Y_1}Y_1Y_1'\,e^{-\frac{1}{2}Y_1Y_1'}\,dY_1=\int_{s=0}^{\infty}s\,\frac{\pi^{\frac{q}{2}}}{\Gamma(\frac{q}{2})}\,s^{\frac{q}{2}-1}e^{-\frac{1}{2}s}\,ds. \tag{iii}$$

The integral part over $s$ is $2^{\frac{q}{2}+1}\Gamma(\frac{q}{2}+1)=2^{\frac{q}{2}+1}\frac{q}{2}\Gamma(\frac{q}{2})=2^{\frac{q}{2}}q\,\Gamma(\frac{q}{2})$. Thus $\Gamma(\frac{q}{2})$ is canceled and $(2\pi)^{\frac{q}{2}}$ cancels with $(2\pi)^{\frac{pq}{2}}$ leaving $(2\pi)^{\frac{(p-1)q}{2}}$ in the denominator and $q$ in the numerator. We still have $p-1$ such sets of $q$ $y_{ij}^{2}$'s in the exponent in (i) and each such integral is of the form $\int_{-\infty}^{\infty}e^{-\frac{1}{2}z^{2}}\,dz=\sqrt{(2\pi)}$, which gives $(2\pi)^{\frac{(p-1)q}{2}}$, and thus the factor containing $\pi$ is also canceled leaving only $q$ at each diagonal position in $YY'$. Hence the integral $\frac{1}{(2\pi)^{\frac{pq}{2}}}\int_Y YY'\,e^{-\frac{1}{2}\mathrm{tr}(YY')}\,dY=qI$ where $I$ is the identity matrix, which establishes one of the results in (4.2.12). Now, write

$$\mathrm{tr}[A(X-M)B(X-M)']=\mathrm{tr}[(X-M)'A(X-M)B]=\mathrm{tr}[B(X-M)'A(X-M)].$$

This is the same structure as in the previous case where $B$ occupies the place of $A$ and the order is now $q$ in place of $p$ in the previous case. Then, proceeding as in the derivations from (i) to (iii), the second result in (4.2.12) follows. The results in (4.2a.10) are established in a similar manner.
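The cancellation argument in the proof can be traced numerically. The sketch below multiplies together the factors named in steps (i) to (iii) and confirms that what remains is $q$; the function name and the chosen $(p,q)$ pairs are illustrative assumptions.

```python
import math


def first_diagonal_expectation(p, q):
    """Assemble the first diagonal entry of E[YY'] from the pieces used in the proof."""
    jacobian = math.pi ** (q / 2) / math.gamma(q / 2)       # factor in step (ii)
    s_integral = 2 ** (q / 2 + 1) * math.gamma(q / 2 + 1)   # integral of s^(q/2) e^(-s/2) over (0, inf)
    other_rows = (2 * math.pi) ** ((p - 1) * q / 2)         # (p - 1)q integrals, each sqrt(2 pi)
    normalizer = (2 * math.pi) ** (p * q / 2)               # constant of the density of Y
    return jacobian * s_integral * other_rows / normalizer


results = {(p, q): first_diagonal_expectation(p, q) for p, q in [(1, 1), (2, 3), (4, 7)]}
```

Each value in `results` should equal the corresponding $q$ up to rounding, in agreement with $E[YY']=qI$.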


From (4.2.10), it is clear that the density of $Y$, denoted by $g(Y)$, is of the form

$$g(Y)=\frac{1}{(2\pi)^{\frac{pq}{2}}}\,e^{-\frac{1}{2}\mathrm{tr}(YY')},\quad Y=(y_{ij}),\ -\infty<y_{ij}<\infty, \tag{4.2.13}$$

for all $i$ and $j$. The individual $y_{ij}$'s are independently distributed and each $y_{ij}$ has the density

$$g_{ij}(y_{ij})=\frac{1}{\sqrt{(2\pi)}}\,e^{-\frac{1}{2}y_{ij}^{2}},\quad -\infty<y_{ij}<\infty. \tag{iv}$$

Thus, we have a real standard normal density for $y_{ij}$. The complex case corresponding to (4.2.13), denoted by $\tilde{g}(\tilde{Y})$, is given by

$$\tilde{g}(\tilde{Y})=\frac{1}{\pi^{pq}}\,e^{-\mathrm{tr}(\tilde{Y}\tilde{Y}^{*})}. \tag{4.2a.11}$$

In this case, the exponent is $\mathrm{tr}(\tilde{Y}\tilde{Y}^{*})=\sum_{i=1}^{p}\sum_{j=1}^{q}|\tilde{y}_{ij}|^{2}$ where $\tilde{y}_{rs}=y_{rs1}+iy_{rs2}$, $y_{rs1}$, $y_{rs2}$ real, $i=\sqrt{(-1)}$ and $|\tilde{y}_{rs}|^{2}=y_{rs1}^{2}+y_{rs2}^{2}$.

For the real case, consider the probability that $y_{ij}\le t_{ij}$ for some given $t_{ij}$; this is the distribution function of $y_{ij}$, which is denoted by $F_{y_{ij}}(t_{ij})$. Then, let us compute the density of $y_{ij}^{2}$. Consider the probability that $y_{ij}^{2}\le u$, $u>0$, for some $u$. Let $u_{ij}=y_{ij}^{2}$. Then, $Pr\{u_{ij}\le v_{ij}\}$ for some $v_{ij}$ is the distribution function of $u_{ij}$ evaluated at $v_{ij}$, denoted by $F_{u_{ij}}(v_{ij})$. Consider

$$Pr\{y_{ij}^{2}\le t,\ t>0\}=Pr\{|y_{ij}|\le\sqrt{t}\}=Pr\{-\sqrt{t}\le y_{ij}\le\sqrt{t}\}=F_{y_{ij}}(\sqrt{t})-F_{y_{ij}}(-\sqrt{t}). \tag{v}$$

Differentiate throughout with respect to $t$. When $Pr\{y_{ij}^{2}\le t\}$ is differentiated with respect to $t$, we obtain the density of $u_{ij}=y_{ij}^{2}$, evaluated at $t$. This density, denoted by $h_{ij}(u_{ij})$, is given by

$$h_{ij}(u_{ij})\big|_{u_{ij}=t}=\frac{d}{dt}F_{y_{ij}}(\sqrt{t})-\frac{d}{dt}F_{y_{ij}}(-\sqrt{t})$$

$$=g_{ij}(y_{ij}=\sqrt{t})\,\tfrac{1}{2}t^{\frac{1}{2}-1}-g_{ij}(y_{ij}=-\sqrt{t})\,(-\tfrac{1}{2}t^{\frac{1}{2}-1})$$

$$=\frac{1}{\sqrt{(2\pi)}}\,t^{\frac{1}{2}-1}e^{-\frac{1}{2}t}=\frac{1}{\sqrt{(2\pi)}}\,u_{ij}^{\frac{1}{2}-1}e^{-\frac{1}{2}u_{ij}} \tag{vi}$$

evaluated at $u_{ij}=t$ for $0\le t<\infty$. Hence we have the following result:

Theorem 4.2.6. Consider the density $f_{p,q}(X)$ in (4.2.1) and the transformation $Y=A^{\frac{1}{2}}XB^{\frac{1}{2}}$. Letting $Y=(y_{ij})$, the $y_{ij}$'s are mutually independently distributed as in (iv) above and each $y_{ij}^{2}$ is distributed as a real chi-square random variable having one degree of freedom or equivalently a real gamma with parameters $\alpha=\frac{1}{2}$ and $\beta=2$ where the usual real scalar gamma density is given by

$$f(z)=\frac{1}{\beta^{\alpha}\Gamma(\alpha)}\,z^{\alpha-1}e^{-\frac{z}{\beta}}, \tag{vii}$$

for $0\le z<\infty$, $\Re(\alpha)>0$, $\Re(\beta)>0$ and $f(z)=0$ elsewhere.

As a consequence of the $y_{ij}^{2}$'s being independently gamma distributed, $\sum_{j=1}^{q}y_{ij}^{2}$ is real gamma distributed with the parameters $\alpha=\frac{q}{2}$ and $\beta=2$. Then $\mathrm{tr}(YY')$ is real gamma distributed with the parameters $\alpha=\frac{pq}{2}$ and $\beta=2$ and each diagonal element in $YY'$ is real gamma distributed with parameters $\frac{q}{2}$ and $\beta=2$ or a real chi-square variable with $q$ degrees of freedom and an expected value $2\frac{q}{2}=q$. This is an alternative way of proving (4.2.12). Proofs for the other results in (4.2.12) and (4.2a.10) are parallel and hence are omitted.
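The identification of the density in (vi) with a gamma density having $\alpha=\frac{1}{2}$, $\beta=2$ can be confirmed pointwise. The following sketch evaluates both expressions at a few arbitrary points; the function names are illustrative assumptions.

```python
import math


def gamma_density(z, alpha, beta):
    """Real scalar gamma density (vii) for z > 0."""
    return z ** (alpha - 1) * math.exp(-z / beta) / (beta ** alpha * math.gamma(alpha))


def squared_normal_density(t):
    """Density (vi) of u = y^2 when y is a real standard normal variable."""
    return t ** (-0.5) * math.exp(-t / 2) / math.sqrt(2 * math.pi)


def gamma_mean(alpha, beta):
    """Mean alpha * beta of the gamma density (vii)."""
    return alpha * beta


points = (0.1, 1.0, 3.7)
differences = [abs(squared_normal_density(t) - gamma_density(t, 0.5, 2.0)) for t in points]
row_sum_mean = gamma_mean(3 / 2, 2.0)  # expected row sum of squares for q = 3
```

The two densities agree because $\Gamma(\frac{1}{2})=\sqrt{\pi}$, so that $\beta^{\alpha}\Gamma(\alpha)=\sqrt{2}\sqrt{\pi}=\sqrt{(2\pi)}$.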
4.2.3. Some special cases 


Consider the real $p\times q$ matrix-variate Gaussian case where the exponent in the density is $-\frac{1}{2}\mathrm{tr}(AXBX')$. On making the transformation $A^{\frac{1}{2}}X=Z\ \Rightarrow\ dZ=|A|^{\frac{q}{2}}dX$, $Z$ has a $p\times q$ matrix-variate Gaussian density of the form

$$f_{p,q}(Z)=\frac{|B|^{\frac{p}{2}}}{(2\pi)^{\frac{pq}{2}}}\,e^{-\frac{1}{2}\mathrm{tr}(ZBZ')}. \tag{4.2.14}$$

If the distribution has a $p\times q$ constant matrix $M$ as location parameter, then replace $Z$ by $Z-M$ in (4.2.14), which does not affect the normalizing constant. Letting $Z_1,Z_2,\ldots,Z_p$ denote the rows of $Z$, we observe that $Z_j$ has a $q$-variate multinormal distribution with the null vector as its mean value and $B^{-1}$ as its covariance matrix for each $j=1,\ldots,p$. This can be seen from the considerations that follow. Let us consider the transformation $Y=ZB^{\frac{1}{2}}\ \Rightarrow\ dZ=|B|^{-\frac{p}{2}}dY$. The density in (4.2.14) then reduces to the following, denoted by $f_{p,q}(Y)$:

$$f_{p,q}(Y)=\frac{1}{(2\pi)^{\frac{pq}{2}}}\,e^{-\frac{1}{2}\mathrm{tr}(YY')}. \tag{4.2.15}$$

This means that each element $y_{ij}$ in $Y=(y_{ij})$ is a real univariate standard normal variable, $y_{ij}\sim N_1(0,1)$ as per the usual notation, and all the $y_{ij}$'s are mutually independently distributed. Letting the $p$ rows of $Y$ be $Y_1,\ldots,Y_p$, then each $Y_j$ is a $q$-variate standard normal vector for $j=1,\ldots,p$. Letting the density of $Y_j$ be denoted by $f_{Y_j}(Y_j)$, we have

$$f_{Y_j}(Y_j)=\frac{1}{(2\pi)^{\frac{q}{2}}}\,e^{-\frac{1}{2}Y_jY_j'}.$$

Now, consider the transformation $Z_j=Y_jB^{-\frac{1}{2}}\ \Rightarrow\ dY_j=|B|^{\frac{1}{2}}dZ_j$ and $Y_j=Z_jB^{\frac{1}{2}}$. That is, $Y_jY_j'=Z_jBZ_j'$ and the density of $Z_j$, denoted by $f_{Z_j}(Z_j)$, is as follows:

$$f_{Z_j}(Z_j)=\frac{|B|^{\frac{1}{2}}}{(2\pi)^{\frac{q}{2}}}\,e^{-\frac{1}{2}Z_jBZ_j'},\quad B>O, \tag{4.2.16}$$

which is a $q$-variate real multinormal density with the covariance matrix of $Z_j$ given by $B^{-1}$, for each $j=1,\ldots,p$, and the $Z_j$'s, $j=1,\ldots,p$, are mutually independently distributed. Thus, the following result:

Theorem 4.2.7. Let $Z_1,\ldots,Z_p$ be the $p$ rows of the $p\times q$ matrix $Z$ in (4.2.14). Then each $Z_j$ has a $q$-variate real multinormal distribution with the covariance matrix $B^{-1}$ for $j=1,\ldots,p$, and $Z_1,\ldots,Z_p$ are mutually independently distributed.

Observe that the exponent in the original real $p\times q$ matrix-variate Gaussian density can also be rewritten in the following format:

$$-\tfrac{1}{2}\mathrm{tr}(AXBX')=-\tfrac{1}{2}\mathrm{tr}(X'AXB)=-\tfrac{1}{2}\mathrm{tr}(BX'AX)=-\tfrac{1}{2}\mathrm{tr}(U'AU)=-\tfrac{1}{2}\mathrm{tr}(ZBZ'),\quad A^{\frac{1}{2}}X=Z,\ XB^{\frac{1}{2}}=U.$$

Now, on making the transformation $U=XB^{\frac{1}{2}}\ \Rightarrow\ dX=|B|^{-\frac{p}{2}}dU$, the density of $U$, denoted by $f_{p,q}(U)$, is given by

$$f_{p,q}(U)=\frac{|A|^{\frac{q}{2}}}{(2\pi)^{\frac{pq}{2}}}\,e^{-\frac{1}{2}\mathrm{tr}(U'AU)}. \tag{4.2.17}$$

Proceeding as in the derivation of Theorem 4.2.7, we have the following result:

Theorem 4.2.8. Consider the $p\times q$ real matrix $U$ in (4.2.17). Let $U_1,\ldots,U_q$ be the columns of $U$. Then, $U_1,\ldots,U_q$ are mutually independently distributed with $U_j$ having a $p$-variate multinormal density, denoted by $f_{U_j}(U_j)$, given as

$$f_{U_j}(U_j)=\frac{|A|^{\frac{1}{2}}}{(2\pi)^{\frac{p}{2}}}\,e^{-\frac{1}{2}U_j'AU_j}. \tag{4.2.18}$$

The corresponding results in the $p\times q$ complex Gaussian case are the following:

Theorem 4.2a.6. Consider the $p\times q$ complex Gaussian matrix $\tilde{X}$. Let $A^{\frac{1}{2}}\tilde{X}=\tilde{Z}$ and $\tilde{Z}_1,\ldots,\tilde{Z}_p$ be the rows of $\tilde{Z}$. Then, $\tilde{Z}_1,\ldots,\tilde{Z}_p$ are mutually independently distributed with $\tilde{Z}_j$ having a $q$-variate complex multinormal density, denoted by $\tilde{f}_{\tilde{Z}_j}(\tilde{Z}_j)$, given by

$$\tilde{f}_{\tilde{Z}_j}(\tilde{Z}_j)=\frac{|\det(B)|}{\pi^{q}}\,e^{-(\tilde{Z}_jB\tilde{Z}_j^{*})}. \tag{4.2a.12}$$

Theorem 4.2a.7. Let the $p\times q$ matrix $\tilde{X}$ have a complex matrix-variate distribution. Let $\tilde{U}=\tilde{X}B^{\frac{1}{2}}$ and $\tilde{U}_1,\ldots,\tilde{U}_q$ be the columns of $\tilde{U}$. Then $\tilde{U}_1,\ldots,\tilde{U}_q$ are mutually independently distributed as $p$-variate complex multinormal with covariance matrix $A^{-1}$ each, the density of $\tilde{U}_j$, denoted by $\tilde{f}_{\tilde{U}_j}(\tilde{U}_j)$, being given as

$$\tilde{f}_{\tilde{U}_j}(\tilde{U}_j)=\frac{|\det(A)|}{\pi^{p}}\,e^{-(\tilde{U}_j^{*}A\tilde{U}_j)}. \tag{4.2a.13}$$

Exercises 4.2 


4.2.1. Prove the second result in equation (4.2.12) and prove both results in (4.2a.10).

4.2.2. Obtain (4.2.12) by establishing first the distribution of the row sum of squares and column sum of squares in $Y$, and then taking the expected values in those variables.

4.2.3. Prove (4.2a.10) by establishing first the distributions of row and column sum of squares of the absolute values in $\tilde{Y}$ and then taking the expected values.

4.2.4. Establish (4.2.12) and (4.2a.10) by using the general polar coordinate transformations.

4.2.5. First prove that $\sum_{j=1}^{q}|\tilde{y}_{ij}|^{2}$ is a $2q$-variate real gamma random variable. Then establish the results in (4.2a.10) by using those on real gamma variables, where $\tilde{Y}=(\tilde{y}_{ij})$, the $\tilde{y}_{ij}$'s in (4.2a.11) being in the complex domain and $|\tilde{y}_{ij}|$ denoting the absolute value or modulus of $\tilde{y}_{ij}$.

4.2.6. Let the real matrix $A>O$ be $2\times 2$ with its first row being $(1,1)$ and let $B>O$ be $3\times 3$ with its first row being $(1,1,-1)$. Then complete the other rows in $A$ and $B$ so that $A>O$, $B>O$. Obtain the corresponding $2\times 3$ real matrix-variate Gaussian density when (1): $M=O$, (2): $M\ne O$ with a matrix $M$ of your own choice.

4.2.7. Let the complex matrix $A>O$ be $2\times 2$ with its first row being $(1,1+i)$ and let $B>O$ be $3\times 3$ with its first row being $(1,1+i,-i)$. Complete the other rows with numbers in the complex domain of your own choice so that $A=A^{*}>O$, $B=B^{*}>O$. Obtain the corresponding $2\times 3$ complex matrix-variate Gaussian density with (1): $\tilde{M}=O$, (2): $\tilde{M}\ne O$ with a matrix $\tilde{M}$ of your own choice.

4.2.8. Evaluate the covariance matrix in (4.2.16), which is $E(Z_j'Z_j)$, and show that it is $B^{-1}$.

4.2.9. Evaluate the covariance matrix in (4.2.18), which is $E(U_jU_j')$, and show that it is $A^{-1}$.

4.2.10. Repeat Exercises 4.2.8 and 4.2.9 for the complex case in (4.2a.12) and (4.2a.13).

4.3. Moment Generating Function and Characteristic Function, Real Case 


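Let $T=(t_{ij})$ be a $p\times q$ parameter matrix. The matrix random variable $X=(x_{ij})$ is $p\times q$ and it is assumed that all of its elements $x_{ij}$'s are real and distinct scalar variables. Then

$$\mathrm{tr}(TX')=\sum_{i=1}^{p}\sum_{j=1}^{q}t_{ij}x_{ij}=\mathrm{tr}(X'T)=\mathrm{tr}(XT'). \tag{i}$$

Note that each $t_{ij}$ and $x_{ij}$ appear once in (i) and thus, we can define the moment generating function (mgf) in the real matrix-variate case, denoted by $M_f(T)$ or $M_X(T)$, as follows:

$$M_f(T)=E[e^{\mathrm{tr}(TX')}]=\int_X e^{\mathrm{tr}(TX')}f_{p,q}(X)\,dX=M_X(T) \tag{ii}$$

whenever the integral is convergent, where $E$ denotes the expected value. Thus, for the $p\times q$ matrix-variate real Gaussian density,

$$M_X(T)=M_f(T)=\frac{|A|^{\frac{q}{2}}|B|^{\frac{p}{2}}}{(2\pi)^{\frac{pq}{2}}}\int_X e^{\mathrm{tr}(TX')-\frac{1}{2}\mathrm{tr}(A^{\frac{1}{2}}XBX'A^{\frac{1}{2}})}\,dX$$

where $A$ is $p\times p$, $B$ is $q\times q$ and $A$ and $B$ are constant real positive definite matrices so that $A^{\frac{1}{2}}$ and $B^{\frac{1}{2}}$ are uniquely defined. Consider the transformation $Y=A^{\frac{1}{2}}XB^{\frac{1}{2}}\ \Rightarrow\ dY=|A|^{\frac{q}{2}}|B|^{\frac{p}{2}}dX$ by Theorem 1.6.4. Thus, $X=A^{-\frac{1}{2}}YB^{-\frac{1}{2}}$ and

$$\mathrm{tr}(TX')=\mathrm{tr}(TB^{-\frac{1}{2}}Y'A^{-\frac{1}{2}})=\mathrm{tr}(A^{-\frac{1}{2}}TB^{-\frac{1}{2}}Y')=\mathrm{tr}(T_{(1)}Y')$$

where $T_{(1)}=A^{-\frac{1}{2}}TB^{-\frac{1}{2}}$. Then

$$M_X(T)=\frac{1}{(2\pi)^{\frac{pq}{2}}}\int_Y e^{\mathrm{tr}(T_{(1)}Y')-\frac{1}{2}\mathrm{tr}(YY')}\,dY.$$

Note that $T_{(1)}Y'$ and $YY'$ are $p\times p$. Consider $-2\mathrm{tr}(T_{(1)}Y')+\mathrm{tr}(YY')$, which can be written as

$$-2\mathrm{tr}(T_{(1)}Y')+\mathrm{tr}(YY')=-\mathrm{tr}(T_{(1)}T_{(1)}')+\mathrm{tr}[(Y-T_{(1)})(Y-T_{(1)})'].$$

Therefore

$$M_X(T)=e^{\frac{1}{2}\mathrm{tr}(T_{(1)}T_{(1)}')}\,\frac{1}{(2\pi)^{\frac{pq}{2}}}\int_Y e^{-\frac{1}{2}\mathrm{tr}[(Y-T_{(1)})(Y-T_{(1)})']}\,dY$$

$$=e^{\frac{1}{2}\mathrm{tr}(T_{(1)}T_{(1)}')}=e^{\frac{1}{2}\mathrm{tr}(A^{-\frac{1}{2}}TB^{-1}T'A^{-\frac{1}{2}})}=e^{\frac{1}{2}\mathrm{tr}(A^{-1}TB^{-1}T')} \tag{4.3.1}$$

since the integral is 1 from the total integral of a matrix-variate Gaussian density.

In the presence of a location parameter matrix $M$, the matrix-variate Gaussian density is given by

$$f_{p,q}(X)=\frac{|A|^{\frac{q}{2}}|B|^{\frac{p}{2}}}{(2\pi)^{\frac{pq}{2}}}\,e^{-\frac{1}{2}\mathrm{tr}(A^{\frac{1}{2}}(X-M)B(X-M)'A^{\frac{1}{2}})} \tag{4.3.2}$$

where $M$ is a constant $p\times q$ matrix. In this case, $TX'=T(X-M+M)'=T(X-M)'+TM'$, and

$$M_X(T)=M_f(T)=E[e^{\mathrm{tr}(TX')}]=e^{\mathrm{tr}(TM')}E[e^{\mathrm{tr}(T(X-M)')}]$$

$$=e^{\mathrm{tr}(TM')}\,e^{\frac{1}{2}\mathrm{tr}(A^{-1}TB^{-1}T')}=e^{\mathrm{tr}(TM')+\frac{1}{2}\mathrm{tr}(A^{-1}TB^{-1}T')}. \tag{4.3.3}$$

When $p=1$, we have the usual $q$-variate multinormal density. In this case, $A$ is $1\times 1$ and taken to be 1. Then the mgf is given by

$$M_X(T)=e^{TM'+\frac{1}{2}TB^{-1}T'} \tag{4.3.4}$$

where $T$, $M$ and $X$ are $1\times q$ and $B>O$ is $q\times q$. The corresponding characteristic function when $p=1$ is given by

$$\phi(T)=e^{iTM'-\frac{1}{2}TB^{-1}T'}. \tag{4.3.5}$$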


Example 4.3.1. Let $X$ have a $2\times 3$ real matrix-variate Gaussian density with the following parameters:

$$X=\begin{bmatrix}x_{11}&x_{12}&x_{13}\\ x_{21}&x_{22}&x_{23}\end{bmatrix},\quad E[X]=M=\begin{bmatrix}1&0&-1\\ -1&1&0\end{bmatrix},\quad A=\begin{bmatrix}1&1\\ 1&2\end{bmatrix},$$

$$B=\begin{bmatrix}3&-1&1\\ -1&2&1\\ 1&1&3\end{bmatrix}.$$

Consider the density $f_{2,3}(X)$ with the exponent preceded by $\frac{1}{2}$ to be consistent with the $p$-variate real Gaussian density. Verify whether $A$ and $B$ are positive definite. Then compute the moment generating function (mgf) of $X$ or that associated with $f_{2,3}(X)$ and write down the exponent explicitly.

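Solution 4.3.1. Consider a $2\times 3$ parameter matrix $T=(t_{ij})$. Let us compute the various quantities in the mgf. First,

$$TM'=\begin{bmatrix}t_{11}&t_{12}&t_{13}\\ t_{21}&t_{22}&t_{23}\end{bmatrix}\begin{bmatrix}1&-1\\ 0&1\\ -1&0\end{bmatrix}=\begin{bmatrix}t_{11}-t_{13}&-t_{11}+t_{12}\\ t_{21}-t_{23}&-t_{21}+t_{22}\end{bmatrix}$$

so that

$$\mathrm{tr}(TM')=t_{11}-t_{13}-t_{21}+t_{22}. \tag{i}$$

Consider the leading minors in $A$ and $B$. Note that $|(1)|=1>0$, $|A|=1>0$, $|(3)|=3>0$, $\begin{vmatrix}3&-1\\ -1&2\end{vmatrix}=5>0$, $|B|=8>0$; thus both $A$ and $B$ are positive definite. The inverses of $A$ and $B$ are obtained by making use of the formula $C^{-1}=\frac{1}{|C|}(\mathrm{Cof}(C))'$; they are

$$A^{-1}=\begin{bmatrix}2&-1\\ -1&1\end{bmatrix},\quad B^{-1}=\frac{1}{8}\begin{bmatrix}5&4&-3\\ 4&8&-4\\ -3&-4&5\end{bmatrix}.$$

For determining the exponent in the mgf, we need $A^{-1}T$ and $B^{-1}T'$, which are

$$A^{-1}T=\begin{bmatrix}2&-1\\ -1&1\end{bmatrix}\begin{bmatrix}t_{11}&t_{12}&t_{13}\\ t_{21}&t_{22}&t_{23}\end{bmatrix}=\begin{bmatrix}2t_{11}-t_{21}&2t_{12}-t_{22}&2t_{13}-t_{23}\\ -t_{11}+t_{21}&-t_{12}+t_{22}&-t_{13}+t_{23}\end{bmatrix},$$

$$B^{-1}T'=\frac{1}{8}\begin{bmatrix}5&4&-3\\ 4&8&-4\\ -3&-4&5\end{bmatrix}\begin{bmatrix}t_{11}&t_{21}\\ t_{12}&t_{22}\\ t_{13}&t_{23}\end{bmatrix}=\frac{1}{8}\begin{bmatrix}5t_{11}+4t_{12}-3t_{13}&5t_{21}+4t_{22}-3t_{23}\\ 4t_{11}+8t_{12}-4t_{13}&4t_{21}+8t_{22}-4t_{23}\\ -3t_{11}-4t_{12}+5t_{13}&-3t_{21}-4t_{22}+5t_{23}\end{bmatrix}.$$

Hence,

$$\tfrac{1}{2}\mathrm{tr}(A^{-1}TB^{-1}T')=\tfrac{1}{16}\big[(2t_{11}-t_{21})(5t_{11}+4t_{12}-3t_{13})+(2t_{12}-t_{22})(4t_{11}+8t_{12}-4t_{13})+(2t_{13}-t_{23})(-3t_{11}-4t_{12}+5t_{13})$$

$$+(-t_{11}+t_{21})(5t_{21}+4t_{22}-3t_{23})+(-t_{12}+t_{22})(4t_{21}+8t_{22}-4t_{23})+(-t_{13}+t_{23})(-3t_{21}-4t_{22}+5t_{23})\big]. \tag{ii}$$

Thus, the mgf is $M_X(T)=e^{Q(T)}$ where

$$Q(T)=\mathrm{tr}(TM')+\tfrac{1}{2}\mathrm{tr}(A^{-1}TB^{-1}T'),$$

these quantities being given in (i) and (ii). This completes the computations.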
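The inverses and the explicit exponent in this example can be verified with exact rational arithmetic. The sketch below implements the cofactor formula $C^{-1}=\frac{1}{|C|}(\mathrm{Cof}(C))'$ with Python's `fractions` module and compares $Q(T)$ computed from matrix products with the expanded expressions (i) and (ii) at an arbitrary integer $T$. The matrices are those inferred from the solution steps; the helper names and the test matrix are illustrative assumptions.

```python
from fractions import Fraction

A = [[1, 1], [1, 2]]
B = [[3, -1, 1], [-1, 2, 1], [1, 1, 3]]
M = [[1, 0, -1], [-1, 1, 0]]


def det(m):
    """Determinant by Laplace expansion along the first row (small matrices only)."""
    if len(m) == 1:
        return m[0][0]
    return sum(
        (-1) ** j * m[0][j] * det([row[:j] + row[j + 1:] for row in m[1:]])
        for j in range(len(m))
    )


def inverse(m):
    """Inverse via the transposed cofactor matrix divided by the determinant (n >= 2)."""
    n, d = len(m), det(m)

    def minor(i, j):
        return [row[:j] + row[j + 1:] for k, row in enumerate(m) if k != i]

    cof = [[(-1) ** (i + j) * det(minor(i, j)) for j in range(n)] for i in range(n)]
    return [[Fraction(cof[j][i], d) for j in range(n)] for i in range(n)]


def matmul(a, b):
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def transpose(m):
    return [list(col) for col in zip(*m)]


def trace(m):
    return sum(m[i][i] for i in range(len(m)))


A_inv, B_inv = inverse(A), inverse(B)


def mgf_exponent(T):
    """Q(T) = tr(TM') + tr(A^{-1} T B^{-1} T')/2, from matrix products."""
    quad = trace(matmul(matmul(A_inv, T), matmul(B_inv, transpose(T))))
    return trace(matmul(T, transpose(M))) + Fraction(1, 2) * quad


def mgf_exponent_explicit(T):
    """Q(T) written out as in steps (i) and (ii) of the solution."""
    (t11, t12, t13), (t21, t22, t23) = T
    linear = t11 - t13 - t21 + t22
    quadratic = Fraction(1, 16) * (
        (2 * t11 - t21) * (5 * t11 + 4 * t12 - 3 * t13)
        + (2 * t12 - t22) * (4 * t11 + 8 * t12 - 4 * t13)
        + (2 * t13 - t23) * (-3 * t11 - 4 * t12 + 5 * t13)
        + (-t11 + t21) * (5 * t21 + 4 * t22 - 3 * t23)
        + (-t12 + t22) * (4 * t21 + 8 * t22 - 4 * t23)
        + (-t13 + t23) * (-3 * t21 - 4 * t22 + 5 * t23)
    )
    return linear + quadratic


T = [[1, 2, 3], [4, 5, 6]]  # arbitrary illustrative parameter matrix
```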


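4.3a. Moment Generating and Characteristic Functions, Complex Case

Let $\tilde{X}=(\tilde{x}_{ij})$ be a $p\times q$ matrix where the $\tilde{x}_{ij}$'s are distinct scalar complex variables. We may write $\tilde{X}=X_1+iX_2$, $i=\sqrt{(-1)}$, $X_1,X_2$ being real $p\times q$ matrices. Let $\tilde{T}$ be a $p\times q$ parameter matrix and $\tilde{T}=T_1+iT_2$, $T_1,T_2$ being real $p\times q$ matrices. The conjugate transposes of $\tilde{X}$ and $\tilde{T}$ are denoted by $\tilde{X}^{*}$ and $\tilde{T}^{*}$, respectively. Then,

$$\mathrm{tr}(\tilde{T}\tilde{X}^{*})=\mathrm{tr}[(T_1+iT_2)(X_1'-iX_2')]=\mathrm{tr}[T_1X_1'+T_2X_2'+i(T_2X_1'-T_1X_2')]$$

$$=\mathrm{tr}(T_1X_1')+\mathrm{tr}(T_2X_2')+i\,\mathrm{tr}(T_2X_1'-T_1X_2').$$

If $T_1=(t_{ij}^{(1)})$, $X_1=(x_{ij}^{(1)})$, $X_2=(x_{ij}^{(2)})$, $T_2=(t_{ij}^{(2)})$, then $\mathrm{tr}(T_1X_1')=\sum_{i=1}^{p}\sum_{j=1}^{q}t_{ij}^{(1)}x_{ij}^{(1)}$ and $\mathrm{tr}(T_2X_2')=\sum_{i=1}^{p}\sum_{j=1}^{q}t_{ij}^{(2)}x_{ij}^{(2)}$. In other words, $\mathrm{tr}(T_1X_1')+\mathrm{tr}(T_2X_2')$ gives all the $x_{ij}$'s in the real and complex parts of $\tilde{X}$ multiplied by the corresponding $t_{ij}$'s in the real and complex parts of $\tilde{T}$. That is, $E[e^{\mathrm{tr}(T_1X_1')+\mathrm{tr}(T_2X_2')}]$ gives a moment generating function (mgf) associated with the complex matrix-variate Gaussian density that is consistent with the real multivariate mgf. However, $[\mathrm{tr}(T_1X_1')+\mathrm{tr}(T_2X_2')]=\Re(\mathrm{tr}[\tilde{T}\tilde{X}^{*}])$, $\Re(\cdot)$ denoting the real part of $(\cdot)$. Thus, in the complex case, the mgf for any real-valued scalar function $g(\tilde{X})$ of the complex matrix argument $\tilde{X}$, where $g(\tilde{X})$ is a density, is defined as

$$M_{\tilde{X}}(\tilde{T})=\int_{\tilde{X}}e^{\Re[\mathrm{tr}(\tilde{T}\tilde{X}^{*})]}\,g(\tilde{X})\,d\tilde{X} \tag{4.3a.1}$$

whenever the expected value exists. On replacing $\tilde{T}$ by $i\tilde{T}$, $i=\sqrt{(-1)}$, we obtain the characteristic function of $\tilde{X}$ or that associated with the density, denoted by $\phi_{\tilde{X}}(\tilde{T})$. That is,

$$\phi_{\tilde{X}}(\tilde{T})=\int_{\tilde{X}}e^{\Re[\mathrm{tr}(i\tilde{T}\tilde{X}^{*})]}\,g(\tilde{X})\,d\tilde{X}. \tag{4.3a.2}$$

Then, the mgf of the matrix-variate Gaussian density in the complex domain is available by paralleling the derivation in the real case and making use of Lemma 3.2a.1:

$$M_{\tilde{X}}(\tilde{T})=E[e^{\Re[\mathrm{tr}(\tilde{T}\tilde{X}^{*})]}]=e^{\Re[\mathrm{tr}(\tilde{T}\tilde{M}^{*})]+\frac{1}{4}\Re[\mathrm{tr}(A^{-\frac{1}{2}}\tilde{T}B^{-1}\tilde{T}^{*}A^{-\frac{1}{2}})]}. \tag{4.3a.3}$$

The corresponding characteristic function is given by

$$\phi_{\tilde{X}}(\tilde{T})=e^{\Re[\mathrm{tr}(i\tilde{T}\tilde{M}^{*})]-\frac{1}{4}\Re[\mathrm{tr}(A^{-\frac{1}{2}}\tilde{T}B^{-1}\tilde{T}^{*}A^{-\frac{1}{2}})]}. \tag{4.3a.4}$$

Note that when $A=A^{*}>O$ and $B=B^{*}>O$ (Hermitian positive definite),

$$(A^{-\frac{1}{2}}\tilde{T}B^{-1}\tilde{T}^{*}A^{-\frac{1}{2}})^{*}=A^{-\frac{1}{2}}\tilde{T}B^{-1}\tilde{T}^{*}A^{-\frac{1}{2}},$$

that is, this matrix is Hermitian. Thus, letting $\tilde{U}=A^{-\frac{1}{2}}\tilde{T}B^{-1}\tilde{T}^{*}A^{-\frac{1}{2}}=U_1+iU_2$ where $U_1$ and $U_2$ are real matrices, $U_1=U_1'$ and $U_2=-U_2'$, that is, $U_1$ and $U_2$ are respectively symmetric and skew symmetric real matrices. Accordingly, $\mathrm{tr}(\tilde{U})=\mathrm{tr}(U_1)+i\,\mathrm{tr}(U_2)=\mathrm{tr}(U_1)$ as the trace of a real skew symmetric matrix is zero. Therefore, $\Re[\mathrm{tr}(A^{-\frac{1}{2}}\tilde{T}B^{-1}\tilde{T}^{*}A^{-\frac{1}{2}})]=\mathrm{tr}(A^{-\frac{1}{2}}\tilde{T}B^{-1}\tilde{T}^{*}A^{-\frac{1}{2}})$, the diagonal elements of a Hermitian matrix being real.

When $p=1$, we have the usual $q$-variate complex multivariate normal density and taking the $1\times 1$ matrix $A$ to be 1, the mgf is as follows:

$$M_{\tilde{X}}(\tilde{T})=e^{\Re(\tilde{T}\tilde{M}^{*})+\frac{1}{4}(\tilde{T}B^{-1}\tilde{T}^{*})} \tag{4.3a.5}$$

where $\tilde{T}$, $\tilde{M}$ are $1\times q$ vectors and $B=B^{*}>O$ (Hermitian positive definite), the corresponding characteristic function being given by

$$\phi_{\tilde{X}}(\tilde{T})=e^{\Re(i\tilde{T}\tilde{M}^{*})-\frac{1}{4}(\tilde{T}B^{-1}\tilde{T}^{*})}. \tag{4.3a.6}$$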


Example 4.3a.1. Consider a 2 x 2 matrix X in the complex domain having a complex 
matrix-variate density with the following parameters: 


Determine whether A and B are Hermitian positive definite; then, obtain the mgf of this 
distribution and provide the exponential part explicitly. 


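Solution 4.3a.1. Clearly, $A > O$ and $B > O$. We first determine $A^{-1}$, $B^{-1}$, $A^{-1}\tilde{T}$, $B^{-1}\tilde{T}^{*}$:

$$A^{-1} = \frac{1}{5}\begin{bmatrix} 3 & -i \\ i & 2 \end{bmatrix}, \quad A^{-1}\tilde{T} = \frac{1}{5}\begin{bmatrix} 3 & -i \\ i & 2 \end{bmatrix}\begin{bmatrix} \tilde{t}_{11} & \tilde{t}_{12} \\ \tilde{t}_{21} & \tilde{t}_{22} \end{bmatrix} = \frac{1}{5}\begin{bmatrix} 3\tilde{t}_{11} - i\tilde{t}_{21} & 3\tilde{t}_{12} - i\tilde{t}_{22} \\ i\tilde{t}_{11} + 2\tilde{t}_{21} & i\tilde{t}_{12} + 2\tilde{t}_{22} \end{bmatrix},$$

$$B^{-1} = \begin{bmatrix} 1 & i \\ -i & 2 \end{bmatrix}, \quad B^{-1}\tilde{T}^{*} = \begin{bmatrix} \tilde{t}_{11}^{*} + i\tilde{t}_{12}^{*} & \tilde{t}_{21}^{*} + i\tilde{t}_{22}^{*} \\ -i\tilde{t}_{11}^{*} + 2\tilde{t}_{12}^{*} & -i\tilde{t}_{21}^{*} + 2\tilde{t}_{22}^{*} \end{bmatrix},$$

$$\delta = \tfrac{1}{2}\mathrm{tr}(A^{-1}\tilde{T}B^{-1}\tilde{T}^{*}),$$

where an asterisk on a scalar quantity denotes the complex conjugate. Then,

$$\begin{aligned}
10\delta = \{ & (3\tilde{t}_{11} - i\tilde{t}_{21})(\tilde{t}_{11}^{*} + i\tilde{t}_{12}^{*}) + (3\tilde{t}_{12} - i\tilde{t}_{22})(-i\tilde{t}_{11}^{*} + 2\tilde{t}_{12}^{*}) \\
& + (i\tilde{t}_{11} + 2\tilde{t}_{21})(\tilde{t}_{21}^{*} + i\tilde{t}_{22}^{*}) + (i\tilde{t}_{12} + 2\tilde{t}_{22})(-i\tilde{t}_{21}^{*} + 2\tilde{t}_{22}^{*}) \},
\end{aligned}$$

$$\begin{aligned}
10\delta = \{ & 3\tilde{t}_{11}\tilde{t}_{11}^{*} + 3i\,\tilde{t}_{11}\tilde{t}_{12}^{*} - i\,\tilde{t}_{21}\tilde{t}_{11}^{*} + \tilde{t}_{21}\tilde{t}_{12}^{*} \\
& + 6\tilde{t}_{12}\tilde{t}_{12}^{*} - \tilde{t}_{22}\tilde{t}_{11}^{*} - 2i\,\tilde{t}_{22}\tilde{t}_{12}^{*} - 3i\,\tilde{t}_{12}\tilde{t}_{11}^{*} \\
& + i\,\tilde{t}_{11}\tilde{t}_{21}^{*} - \tilde{t}_{11}\tilde{t}_{22}^{*} + 2\tilde{t}_{21}\tilde{t}_{21}^{*} + 2i\,\tilde{t}_{21}\tilde{t}_{22}^{*} \\
& + \tilde{t}_{12}\tilde{t}_{21}^{*} + 2i\,\tilde{t}_{12}\tilde{t}_{22}^{*} - 2i\,\tilde{t}_{22}\tilde{t}_{21}^{*} + 4\tilde{t}_{22}\tilde{t}_{22}^{*} \},
\end{aligned}$$

$$\begin{aligned}
10\delta = {} & 3\tilde{t}_{11}\tilde{t}_{11}^{*} + 6\tilde{t}_{12}\tilde{t}_{12}^{*} + 2\tilde{t}_{21}\tilde{t}_{21}^{*} + 4\tilde{t}_{22}\tilde{t}_{22}^{*} \\
& + 3i[\tilde{t}_{11}\tilde{t}_{12}^{*} - \tilde{t}_{11}^{*}\tilde{t}_{12}] - [\tilde{t}_{11}\tilde{t}_{22}^{*} + \tilde{t}_{11}^{*}\tilde{t}_{22}] \\
& + i[\tilde{t}_{11}\tilde{t}_{21}^{*} - \tilde{t}_{11}^{*}\tilde{t}_{21}] + [\tilde{t}_{12}\tilde{t}_{21}^{*} + \tilde{t}_{12}^{*}\tilde{t}_{21}] \\
& + 2i[\tilde{t}_{21}\tilde{t}_{22}^{*} - \tilde{t}_{21}^{*}\tilde{t}_{22}] + 2i[\tilde{t}_{12}\tilde{t}_{22}^{*} - \tilde{t}_{12}^{*}\tilde{t}_{22}].
\end{aligned}$$

Letting $\tilde{t}_{rs} = t_{rs1} + i\,t_{rs2}$, $i = \sqrt{(-1)}$, $t_{rs1}$, $t_{rs2}$ being real, for all $r$ and $s$, then $\delta$, the exponent in the mgf, can be expressed as follows:

$$\begin{aligned}
\delta = \frac{1}{10}\big[ & 3(t_{111}^{2} + t_{112}^{2}) + 6(t_{121}^{2} + t_{122}^{2}) + 2(t_{211}^{2} + t_{212}^{2}) + 4(t_{221}^{2} + t_{222}^{2}) \\
& - 6(t_{112}t_{121} - t_{111}t_{122}) - 2(t_{111}t_{221} + t_{112}t_{222}) - 2(t_{112}t_{211} - t_{111}t_{212}) \\
& + 2(t_{121}t_{211} + t_{122}t_{212}) - 4(t_{212}t_{221} - t_{211}t_{222}) - 4(t_{122}t_{221} - t_{121}t_{222}) \big].
\end{aligned}$$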


This completes the computations. 
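The expansion above is easy to get wrong by a sign, so a quick numerical cross-check may be useful. The sketch below is only an illustration: it assumes the inverses $A^{-1} = \frac{1}{5}\begin{bmatrix} 3 & -i \\ i & 2\end{bmatrix}$ and $B^{-1} = \begin{bmatrix} 1 & i \\ -i & 2\end{bmatrix}$ as read from the solution, takes $\delta$ as one half of the trace as in the solution, and uses hypothetical helper names (`matmul`, `conj_transpose`, `delta_trace`, `delta_expanded`) that do not come from the text.

```python
def matmul(P, Q):
    """Product of two matrices stored as nested lists."""
    return [[sum(P[i][k] * Q[k][j] for k in range(len(Q)))
             for j in range(len(Q[0]))] for i in range(len(P))]


def conj_transpose(P):
    """Conjugate transpose of a matrix stored as nested lists."""
    return [[P[i][j].conjugate() for i in range(len(P))]
            for j in range(len(P[0]))]


# Inverses as read from the solution (assumption of this sketch).
A_INV = [[3 / 5, -1j / 5], [1j / 5, 2 / 5]]
B_INV = [[1, 1j], [-1j, 2]]


def delta_trace(T):
    """delta computed directly as (1/2) tr(A^{-1} T B^{-1} T*)."""
    prod = matmul(matmul(A_INV, T), matmul(B_INV, conj_transpose(T)))
    return 0.5 * (prod[0][0] + prod[1][1])


def delta_expanded(T):
    """delta computed from the real expansion in terms of t_rs1 and t_rs2."""
    (a, b), (c, d) = T  # a = t11, b = t12, c = t21, d = t22
    a1, a2, b1, b2 = a.real, a.imag, b.real, b.imag
    c1, c2, d1, d2 = c.real, c.imag, d.real, d.imag
    total = (3 * (a1**2 + a2**2) + 6 * (b1**2 + b2**2)
             + 2 * (c1**2 + c2**2) + 4 * (d1**2 + d2**2)
             - 6 * (a2 * b1 - a1 * b2) - 2 * (a1 * d1 + a2 * d2)
             - 2 * (a2 * c1 - a1 * c2) + 2 * (b1 * c1 + b2 * c2)
             - 4 * (c2 * d1 - c1 * d2) - 4 * (b2 * d1 - b1 * d2))
    return total / 10


# Arbitrary test parameter matrix (hypothetical values).
T_TEST = [[1 + 2j, -1 + 0.5j], [0.3 - 1j, 2 + 1j]]
direct = delta_trace(T_TEST)
expanded = delta_expanded(T_TEST)
print(direct, expanded)
```

The imaginary part of the direct trace should vanish up to rounding, in line with the remark that the trace of a Hermitian matrix is real.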


4.3.1. Distribution of the exponent, real case 


Let us determine the distribution of the exponent in the $p \times q$ real matrix-variate Gaussian density. Letting $u = \mathrm{tr}(AXBX')$, its density can be obtained by evaluating its associated mgf. Then, taking $t$ as its scalar parameter since $u$ is scalar, we have

$$M_u(t) = E[e^{tu}] = E[e^{t\,\mathrm{tr}(AXBX')}].$$

Since this expected value depends on $X$, we can integrate out over the density of $X$:

$$M_u(t) = C\int_X e^{t\,\mathrm{tr}(AXBX') - \frac{1}{2}\mathrm{tr}(AXBX')}\,\mathrm{d}X = C\int_X e^{-\frac{1}{2}(1-2t)\,\mathrm{tr}(AXBX')}\,\mathrm{d}X \ \text{ for } 1 - 2t > 0 \tag{i}$$

where

$$C = \frac{|A|^{\frac{q}{2}}|B|^{\frac{p}{2}}}{(2\pi)^{\frac{pq}{2}}}.$$

The integral in (i) is convergent only when $1 - 2t > 0$. Then distributing $\sqrt{(1-2t)}$ to each element in $X$ and $X'$, and denoting the new matrix by $X_t$, we have $X_t = \sqrt{(1-2t)}\,X \Rightarrow \mathrm{d}X_t = (\sqrt{(1-2t)})^{pq}\,\mathrm{d}X = (1-2t)^{\frac{pq}{2}}\,\mathrm{d}X$. Integral over $X_t$, together with $C$, yields 1 and hence

$$M_u(t) = (1-2t)^{-\frac{pq}{2}}, \ \text{ provided } 1 - 2t > 0. \tag{4.3.6}$$

The corresponding density is a real chi-square having $pq$ degrees of freedom or a real gamma density with parameters $\alpha = \frac{pq}{2}$ and $\beta = 2$. Thus, the resulting density, denoted by $f_{u_1}(u_1)$, is given by

$$f_{u_1}(u_1) = \big[2^{\frac{pq}{2}}\,\Gamma(pq/2)\big]^{-1} u_1^{\frac{pq}{2}-1} e^{-\frac{u_1}{2}}, \quad 0 \le u_1 < \infty, \ p, q = 1, 2, \ldots, \tag{4.3.7}$$

and $f_{u_1}(u_1) = 0$ elsewhere.
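As an informal check of (4.3.6), one can simulate the exponent and compare its sample mean and variance with those of a chi-square having $pq$ degrees of freedom, namely $pq$ and $2pq$. The following is a minimal sketch under stated assumptions: $M = O$, $p = q = 2$, the parameter matrices `A` and `B` are hypothetical values picked for illustration, and $X$ is generated as $(C')^{-1}ZG^{-1}$ with $A = CC'$, $B = GG'$ and $Z$ having independent standard normal entries, so that $\mathrm{tr}(AXBX') = \mathrm{tr}(ZZ')$.

```python
import math
import random


def matmul(P, Q):
    """Product of two matrices stored as nested lists."""
    return [[sum(P[i][k] * Q[k][j] for k in range(len(Q)))
             for j in range(len(Q[0]))] for i in range(len(P))]


def transpose(P):
    return [list(col) for col in zip(*P)]


def inv2(P):
    """Inverse of a 2 x 2 matrix."""
    (a, b), (c, d) = P
    det = a * d - b * c
    return [[d / det, -b / det], [-c / det, a / det]]


def chol2(P):
    """Lower triangular L with P = L L' for a 2 x 2 positive definite P."""
    l11 = math.sqrt(P[0][0])
    l21 = P[1][0] / l11
    return [[l11, 0.0], [l21, math.sqrt(P[1][1] - l21 ** 2)]]


# Hypothetical parameter matrices (p = q = 2), chosen only for illustration.
A = [[2.0, 1.0], [1.0, 1.0]]
B = [[2.0, 1.0], [1.0, 2.0]]
P_DIM, Q_DIM = 2, 2
C, G = chol2(A), chol2(B)
CT_INV, G_INV = inv2(transpose(C)), inv2(G)


def sample_exponent(rng):
    """Draw X = (C')^{-1} Z G^{-1} and return u = tr(A X B X')."""
    Z = [[rng.gauss(0.0, 1.0) for _ in range(Q_DIM)] for _ in range(P_DIM)]
    X = matmul(matmul(CT_INV, Z), G_INV)
    W = matmul(matmul(matmul(A, X), B), transpose(X))
    return W[0][0] + W[1][1]


rng = random.Random(12345)
draws = [sample_exponent(rng) for _ in range(20000)]
mean_u = sum(draws) / len(draws)
var_u = sum((u - mean_u) ** 2 for u in draws) / (len(draws) - 1)
print(mean_u, var_u)  # expected to be near pq = 4 and 2pq = 8
```

The agreement is only approximate since it rests on a finite number of simulated matrices.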
4.3a.1. Distribution of the exponent, complex case 


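In the complex case, letting $\tilde{u} = \mathrm{tr}(A^{\frac{1}{2}}\tilde{X}B\tilde{X}^{*}A^{\frac{1}{2}})$, we note that $\tilde{u} = \tilde{u}^{*}$ and $\tilde{u}$ is a scalar, so that $\tilde{u}$ is real. Hence, the mgf of $\tilde{u}$, with real parameter $t$, is given by

$$M_{\tilde{u}}(t) = E\big[e^{t\,\mathrm{tr}(A^{\frac{1}{2}}\tilde{X}B\tilde{X}^{*}A^{\frac{1}{2}})}\big] = C\int_{\tilde{X}} e^{-(1-t)\,\mathrm{tr}(A^{\frac{1}{2}}\tilde{X}B\tilde{X}^{*}A^{\frac{1}{2}})}\,\mathrm{d}\tilde{X}, \quad 1 - t > 0, \ \text{ with } \ C = \frac{|\det(A)|^{q}\,|\det(B)|^{p}}{\pi^{pq}}.$$

On making the transformation $\tilde{Y} = A^{\frac{1}{2}}\tilde{X}B^{\frac{1}{2}}$, we have

$$M_{\tilde{u}}(t) = \frac{1}{\pi^{pq}}\int_{\tilde{Y}} e^{-(1-t)\,\mathrm{tr}(\tilde{Y}\tilde{Y}^{*})}\,\mathrm{d}\tilde{Y}.$$

However,

$$\mathrm{tr}(\tilde{Y}\tilde{Y}^{*}) = \sum_{r=1}^{p}\sum_{s=1}^{q}|\tilde{y}_{rs}|^{2} = \sum_{r=1}^{p}\sum_{s=1}^{q}(y_{rs1}^{2} + y_{rs2}^{2})$$

where $\tilde{y}_{rs} = y_{rs1} + i\,y_{rs2}$, $i = \sqrt{(-1)}$, $y_{rs1}$, $y_{rs2}$ being real. Hence

$$\frac{1}{\pi}\int_{-\infty}^{\infty}\int_{-\infty}^{\infty} e^{-(1-t)(y_{rs1}^{2} + y_{rs2}^{2})}\,\mathrm{d}y_{rs1}\wedge\mathrm{d}y_{rs2} = \frac{1}{1-t}, \quad 1 - t > 0.$$

Therefore,

$$M_{\tilde{u}}(t) = (1-t)^{-pq}, \quad 1 - t > 0, \tag{4.3a.7}$$

and $\tilde{u} = v$ has a real gamma density with parameters $\alpha = pq$, $\beta = 1$, or a chi-square density in the complex domain with $pq$ degrees of freedom, that is,

$$f_v(v) = \frac{1}{\Gamma(pq)}\,v^{pq-1}e^{-v}, \quad 0 \le v < \infty, \tag{4.3a.8}$$

and $f_v(v) = 0$ elsewhere.


4.3.2. Linear functions in the real case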


Let the $p \times q$ real matrix $X = (x_{ij})$ of the real scalar random variables $x_{ij}$'s have the density in (4.2.2), namely

$$f_{p,q}(X) = \frac{|A|^{\frac{q}{2}}|B|^{\frac{p}{2}}}{(2\pi)^{\frac{pq}{2}}}\,e^{-\frac{1}{2}\mathrm{tr}(A(X-M)B(X-M)')} \tag{4.3.8}$$

for $A > O$, $B > O$, where $M$ is a $p \times q$ location parameter matrix. Let $L_1$ be a $p \times 1$ vector of constants. Consider the linear function $Z_1 = L_1'X$ where $Z_1$ is $1 \times q$. Let $T$ be a $1 \times q$ parameter vector. Then the mgf of the $1 \times q$ vector $Z_1$ is

$$M_{Z_1}(T) = E[e^{TZ_1'}] = E[e^{TX'L_1}] = E[e^{\mathrm{tr}((L_1T)X')}].$$

This can be evaluated by replacing $T$ by $L_1T$ in (4.3.4). Then

$$M_{Z_1}(T) = e^{\mathrm{tr}((L_1T)M') + \frac{1}{2}\mathrm{tr}(A^{-1}L_1TB^{-1}(L_1T)')} = e^{\mathrm{tr}(TM'L_1) + \frac{1}{2}\mathrm{tr}[(L_1'A^{-1}L_1)TB^{-1}T']}. \tag{ii}$$

Since $L_1'A^{-1}L_1$ is a scalar,

$$(L_1'A^{-1}L_1)\,TB^{-1}T' = T\,(L_1'A^{-1}L_1)B^{-1}\,T'.$$

On comparing the resulting expression with the mgf of a $q$-variate real normal distribution, we observe that $Z_1$ is a $q$-variate real Gaussian vector with mean value vector $L_1'M$ and covariance matrix $[L_1'A^{-1}L_1]B^{-1}$. Hence the following result:


Theorem 4.3.1. Let the real $p \times q$ matrix $X$ have the density specified in (4.3.8) and $L_1$ be a $p \times 1$ constant vector. Let $Z_1$ be the linear function of $X$, $Z_1 = L_1'X$. Then $Z_1$, which is $1 \times q$, has the mgf given in (ii) and thereby $Z_1$ has a $q$-variate real Gaussian density with the mean value vector $L_1'M$ and covariance matrix $[L_1'A^{-1}L_1]B^{-1}$.


Theorem 4.3.2. Let $L_2$ be a $q \times 1$ constant vector. Consider the linear function $Z_2 = XL_2$ where the $p \times q$ real matrix $X$ has the density specified in (4.3.8). Then $Z_2$, which is $p \times 1$, is a $p$-variate real Gaussian vector with mean value vector $ML_2$ and covariance matrix $[L_2'B^{-1}L_2]A^{-1}$.


The proof of Theorem 4.3.2 is parallel to the derivation of that of Theorem 4.3.1. Theorems 4.3.1 and 4.3.2 establish that when the $p \times q$ matrix $X$ has a $p \times q$-variate real Gaussian density with parameters $M$, $A > O$, $B > O$, then all linear functions of the form $L_1'X$ where $L_1$ is $p \times 1$ are $q$-variate real Gaussian and all linear functions of the type $XL_2$ where $L_2$ is $q \times 1$ are $p$-variate real Gaussian, the parameters in these Gaussian densities being given in Theorems 4.3.1 and 4.3.2.


By retracing the steps, we can obtain characterizations of the density of the $p \times q$ real matrix $X$ through linear transformations. Consider all possible $p \times 1$ constant vectors $L_1$ or, equivalently, let $L_1$ be arbitrary. Let $T$ be a $1 \times q$ parameter vector. Then the $p \times q$ matrix $L_1T$, denoted by $T_{(1)}$, contains $pq$ free parameters. In this case the mgf in (ii) can be written as

$$M(T_{(1)}) = e^{\mathrm{tr}(T_{(1)}M') + \frac{1}{2}\mathrm{tr}(A^{-1}T_{(1)}B^{-1}T_{(1)}')} \tag{iii}$$

which has the same structure as the mgf of a $p \times q$ real matrix-variate Gaussian density as given in (4.3.8), whose mean value matrix is $M$ and parameter matrices are $A > O$ and $B > O$. Hence, the following result can be obtained:


Theorem 4.3.3. Let $L_1$ be a constant $p \times 1$ vector, $X$ be a $p \times q$ matrix whose elements are real scalar variables and $A > O$ be $p \times p$ and $B > O$ be $q \times q$ constant real positive definite matrices. If for an arbitrary vector $L_1$, $L_1'X$ is a $q$-variate real Gaussian vector as specified in Theorem 4.3.1, then $X$ has a $p \times q$ real matrix-variate Gaussian density as given in (4.3.8).


As well, a result parallel to this one follows from Theorem 4.3.2:


Theorem 4.3.4. Let $L_2$ be a $q \times 1$ constant vector, $X$ be a $p \times q$ matrix whose elements are real scalar variables and $A > O$ be $p \times p$ and $B > O$ be $q \times q$ constant real positive definite matrices. If for an arbitrary constant vector $L_2$, $XL_2$ is a $p$-variate real Gaussian vector as specified in Theorem 4.3.2, then $X$ is $p \times q$ real matrix-variate Gaussian distributed as in (4.3.8).


Example 4.3.2. Consider a $2 \times 2$ matrix-variate real Gaussian density with the parameters
_ |2 d _ |2 1 = a Sh 6 Ц x12 
Letting $U_1 = L_1'X$, $U_2 = XL_2$, $U_3 = L_1'XL_2$, evaluate the densities of $U_1$, $U_2$, $U_3$ by applying Theorems 4.3.1 and 4.3.2 where $L_1' = [1, 1]$, $L_2' = [1, -1]$; as well, obtain those densities without resorting to these theorems.


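Solution 4.3.2. Let us first compute the following quantities:

$$A^{-1}, \ B^{-1}, \ L_1'A^{-1}L_1, \ L_2'B^{-1}L_2, \ L_1'M, \ ML_2, \ L_1'ML_2.$$

$$A^{-1} = \begin{bmatrix} 1 & -1 \\ -1 & 2 \end{bmatrix}, \quad B^{-1} = \frac{1}{3}\begin{bmatrix} 2 & -1 \\ -1 & 2 \end{bmatrix}, \quad L_1'M = [1, 1]\,M = [1, 0], \quad ML_2 = \begin{bmatrix} 2 \\ -1 \end{bmatrix},$$

$$L_1'A^{-1}L_1 = [1, 1]\begin{bmatrix} 1 & -1 \\ -1 & 2 \end{bmatrix}\begin{bmatrix} 1 \\ 1 \end{bmatrix} = 1, \quad L_2'B^{-1}L_2 = \frac{1}{3}[1, -1]\begin{bmatrix} 2 & -1 \\ -1 & 2 \end{bmatrix}\begin{bmatrix} 1 \\ -1 \end{bmatrix} = 2,$$

$$L_1'ML_2 = [1, 0]\begin{bmatrix} 1 \\ -1 \end{bmatrix} = 1.$$

Let $U_1 = L_1'X$, $U_2 = XL_2$, $U_3 = L_1'XL_2$. Then by making use of Theorems 4.3.1 and 4.3.2 and then, results from Chap. 2 on $q$-variate real Gaussian vectors, we have the following:

$$U_1 \sim N_2((1, 0), (1)B^{-1}), \quad U_2 \sim N_2(ML_2, 2A^{-1}), \quad U_3 \sim N_1(1, (1)(2)) = N_1(1, 2).$$

Let us evaluate the densities without resorting to these theorems. Note that $U_1 = [x_{11} + x_{21}, \ x_{12} + x_{22}]$. Then $U_1$ has a bivariate real distribution. Let us compute the mgf of $U_1$. Letting $t_1$ and $t_2$ be real parameters, the mgf of $U_1$ is

$$M_{U_1}(t_1, t_2) = E[e^{t_1(x_{11}+x_{21}) + t_2(x_{12}+x_{22})}] = E[e^{t_1x_{11} + t_1x_{21} + t_2x_{12} + t_2x_{22}}],$$

which is available from the mgf of $X$ by letting $t_{11} = t_1$, $t_{21} = t_1$, $t_{12} = t_2$, $t_{22} = t_2$. Thus,

$$A^{-1}T = \begin{bmatrix} 1 & -1 \\ -1 & 2 \end{bmatrix}\begin{bmatrix} t_1 & t_2 \\ t_1 & t_2 \end{bmatrix} = \begin{bmatrix} 0 & 0 \\ t_1 & t_2 \end{bmatrix},$$

$$B^{-1}T' = \frac{1}{3}\begin{bmatrix} 2 & -1 \\ -1 & 2 \end{bmatrix}\begin{bmatrix} t_1 & t_1 \\ t_2 & t_2 \end{bmatrix} = \frac{1}{3}\begin{bmatrix} 2t_1 - t_2 & 2t_1 - t_2 \\ -t_1 + 2t_2 & -t_1 + 2t_2 \end{bmatrix},$$

so that

$$\frac{1}{2}\mathrm{tr}(A^{-1}TB^{-1}T') = \frac{1}{2}\,\frac{1}{3}\,[2t_1^{2} + 2t_2^{2} - 2t_1t_2] = \frac{1}{2}[t_1, t_2]\,B^{-1}\begin{bmatrix} t_1 \\ t_2 \end{bmatrix}, \tag{i}$$

which shows that $U_1$ is a bivariate real Gaussian vector with covariance matrix $B^{-1}$. Since

$$U_2 = XL_2 = \begin{bmatrix} x_{11} - x_{12} \\ x_{21} - x_{22} \end{bmatrix},$$

we let $t_{11} = t_1$, $t_{12} = -t_1$, $t_{21} = t_2$, $t_{22} = -t_2$. With these substitutions, we have the following:

$$A^{-1}T = \begin{bmatrix} 1 & -1 \\ -1 & 2 \end{bmatrix}\begin{bmatrix} t_1 & -t_1 \\ t_2 & -t_2 \end{bmatrix} = \begin{bmatrix} t_1 - t_2 & -t_1 + t_2 \\ -t_1 + 2t_2 & t_1 - 2t_2 \end{bmatrix},$$

$$B^{-1}T' = \frac{1}{3}\begin{bmatrix} 2 & -1 \\ -1 & 2 \end{bmatrix}\begin{bmatrix} t_1 & t_2 \\ -t_1 & -t_2 \end{bmatrix} = \begin{bmatrix} t_1 & t_2 \\ -t_1 & -t_2 \end{bmatrix}.$$

Hence,

$$\mathrm{tr}(A^{-1}TB^{-1}T') = t_1(t_1 - t_2) - t_1(-t_1 + t_2) + t_2(-t_1 + 2t_2) - t_2(t_1 - 2t_2) = 2\,[t_1, t_2]\begin{bmatrix} 1 & -1 \\ -1 & 2 \end{bmatrix}\begin{bmatrix} t_1 \\ t_2 \end{bmatrix}.$$

Therefore, $U_2$ is a 2-variate real Gaussian with covariance matrix $2A^{-1}$ and mean value vector $\begin{bmatrix} 2 \\ -1 \end{bmatrix}$. That is, $U_2 \sim N_2(ML_2, 2A^{-1})$. For determining the distribution of $U_3$, observe that $L_1'XL_2 = L_1'U_2$. Then, $L_1'U_2$ is univariate real Gaussian with mean value

$$E[L_1'U_2] = L_1'ML_2 = [1, 1]\begin{bmatrix} 2 \\ -1 \end{bmatrix} = 1 \ \text{ and variance } \ L_1'\mathrm{Cov}(U_2)L_1 = L_1'\,2A^{-1}L_1 = 2.$$

That is, $U_3 = u_3 \sim N_1(1, 2)$. This completes the solution.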
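The scalar factors and the two trace identities used in this solution can be verified with exact rational arithmetic. The sketch below is only illustrative: it starts from the inverses $A^{-1}$ and $B^{-1}$ as they appear in the solution, and the helper names (`matmul`, `transpose`, `trace`, `mgf_trace`, `quad`) as well as the test values of $t_1$ and $t_2$ are hypothetical.

```python
from fractions import Fraction as F


def matmul(P, Q):
    """Product of two matrices stored as nested lists."""
    return [[sum(P[i][k] * Q[k][j] for k in range(len(Q)))
             for j in range(len(Q[0]))] for i in range(len(P))]


def transpose(P):
    return [list(col) for col in zip(*P)]


def trace(P):
    return sum(P[i][i] for i in range(len(P)))


# Inverses as they appear in the solution (assumption of this sketch).
A_INV = [[F(1), F(-1)], [F(-1), F(2)]]
B_INV = [[F(2, 3), F(-1, 3)], [F(-1, 3), F(2, 3)]]
L1 = [[F(1)], [F(1)]]    # L_1' = [1, 1]
L2 = [[F(1)], [F(-1)]]   # L_2' = [1, -1]

# Scalar factors L_1' A^{-1} L_1 and L_2' B^{-1} L_2.
scale_u1 = matmul(matmul(transpose(L1), A_INV), L1)[0][0]
scale_u2 = matmul(matmul(transpose(L2), B_INV), L2)[0][0]


def mgf_trace(T):
    """tr(A^{-1} T B^{-1} T') for a 2 x 2 parameter matrix T."""
    return trace(matmul(matmul(A_INV, T), matmul(B_INV, transpose(T))))


def quad(row, S):
    """Quadratic form row S row' for a list `row` and a matrix S."""
    return matmul(matmul([row], S), transpose([row]))[0][0]


t1, t2 = F(3, 7), F(-5, 2)                      # arbitrary test values
trace_u1 = mgf_trace([[t1, t2], [t1, t2]])      # substitution used for U_1
trace_u2 = mgf_trace([[t1, -t1], [t2, -t2]])    # substitution used for U_2
print(scale_u1, scale_u2)
print(trace_u1 == scale_u1 * quad([t1, t2], B_INV))
print(trace_u2 == scale_u2 * quad([t1, t2], A_INV))
```

Because `Fraction` arithmetic is exact, the two comparisons test the identities themselves rather than a floating-point approximation of them.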


The results stated in Theorems 4.3.1 and 4.3.2 are now generalized by taking sets of linear functions of $X$:


Theorem 4.3.5. Let $C'$ be a $r \times p$, $r \le p$, real constant matrix of full rank $r$ and $G$ be a $q \times s$ matrix, $s \le q$, of rank $s$. Let $Z = C'X$ and $W = XG$ where $X$ has the density specified in (4.3.8). Then, $Z$ has a $r \times q$ real matrix-variate Gaussian density with $M$ replaced by $C'M$ and $A^{-1}$ replaced by $C'A^{-1}C$, $B^{-1}$ remaining unchanged, and $W = XG$ has a $p \times s$ real matrix-variate Gaussian distribution with $B^{-1}$ replaced by $G'B^{-1}G$ and $M$ replaced by $MG$, $A^{-1}$ remaining unchanged.


Example 4.3.3. Let the $2 \times 2$ real $X = (x_{ij})$ have a real matrix-variate Gaussian distribution with the parameters $M$, $A$ and $B$. Consider the set of linear functions $U = C'X$ where


== 
eedem 
М2. 
Show that the rows of $U$ are independently distributed real $q$-variate Gaussian vectors with common covariance matrix $B^{-1}$ and the rows of $M$ as the mean value vectors.


Solution 4.3.3. Let us compute $A^{-1}$ and $C'A^{-1}C$:
] 1 
zd 
ДЫ 


D ack. 
С'АТ!С = м2 |11 1 ie | ls ш. 


In the mgf of $U = C'X$, $A^{-1}$ is replaced by $C'A^{-1}C = I_2$ and $B^{-1}$ remains the same. Then, the exponent in the mgf of $U$, excluding $\mathrm{tr}(TM')$, is $\frac{1}{2}\mathrm{tr}(TB^{-1}T') = \frac{1}{2}\sum_{j=1}^{p}T_jB^{-1}T_j'$ where $T_j$ is the $j$-th row of $T$. Hence the $p$ rows of $U$ are independently distributed $q$-variate real Gaussian with the common covariance matrix $B^{-1}$. This completes the computations.


The previous example entails a general result that now is stated as a corollary. 


Corollary 4.3.1. Let $X$ be a $p \times q$-variate real Gaussian matrix with the usual parameters $M$, $A$ and $B$, whose density is as given in (4.3.8). Consider the set of linear functions $U = C'X$ where $C$ is a $p \times p$ constant matrix of full rank $p$ and $C$ is such that $A = CC'$. Then $C'A^{-1}C = C'(CC')^{-1}C = C'(C')^{-1}C^{-1}C = I_p$. Consequently, the rows of $U$, denoted by $U_1, \ldots, U_p$, are independently distributed as real $q$-variate Gaussian vectors having the common covariance matrix $B^{-1}$.


It is easy to construct such a $C$. Since $A = (a_{ij})$ is real positive definite, set it as $A = CC'$ where $C$ is a lower triangular matrix with positive diagonal elements. The first row, first column element in $C = (c_{ij})$ is $c_{11} = +\sqrt{a_{11}}$. Note that since $A > O$, all the diagonal elements are real positive. The first column of $C$ is readily available from the first column of $A$ and $c_{11}$. Now, given $a_{22}$ and the first column in $C$, $c_{22}$ can be determined, and so on.
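The column-by-column construction just described is the familiar Cholesky factorization. A minimal sketch of it follows, under stated assumptions: the $3 \times 3$ matrix `A` is a hypothetical positive definite example, and the helper names (`cholesky_lower`, `inverse`, `matmul`, `transpose`) are illustrative and not taken from the text. The last lines check numerically that $CC' = A$ and $C'A^{-1}C = I$.

```python
import math


def cholesky_lower(A):
    """Lower triangular C with positive diagonal such that A = C C'."""
    n = len(A)
    C = [[0.0] * n for _ in range(n)]
    for j in range(n):
        # Diagonal element: what is left of a_jj after earlier columns.
        s = A[j][j] - sum(C[j][k] ** 2 for k in range(j))
        if s <= 0:
            raise ValueError("matrix is not positive definite")
        C[j][j] = math.sqrt(s)
        # Remaining elements of column j.
        for i in range(j + 1, n):
            C[i][j] = (A[i][j] - sum(C[i][k] * C[j][k] for k in range(j))) / C[j][j]
    return C


def matmul(P, Q):
    return [[sum(P[i][k] * Q[k][j] for k in range(len(Q)))
             for j in range(len(Q[0]))] for i in range(len(P))]


def transpose(P):
    return [list(col) for col in zip(*P)]


def inverse(M):
    """Gauss-Jordan inverse; no pivoting, adequate for positive definite M."""
    n = len(M)
    aug = [[float(v) for v in row] + [1.0 if i == j else 0.0 for j in range(n)]
           for i, row in enumerate(M)]
    for col in range(n):
        pivot = aug[col][col]
        aug[col] = [v / pivot for v in aug[col]]
        for r in range(n):
            if r != col:
                factor = aug[r][col]
                aug[r] = [vr - factor * vc for vr, vc in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]


# Hypothetical positive definite matrix used only for illustration.
A = [[4.0, 2.0, 2.0], [2.0, 5.0, 3.0], [2.0, 3.0, 6.0]]
C = cholesky_lower(A)
product = matmul(C, transpose(C))                            # should reproduce A
identity = matmul(matmul(transpose(C), inverse(A)), C)       # should be I_3
print(C)
```

For this particular `A` the factor has integer entries, which makes the result easy to confirm by hand.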


Theorem 4.3.6. Let $C$, $G$ and $X$ be as defined in Theorem 4.3.5. Consider the $r \times s$ real matrix $Z = C'XG$. Then, when $X$ has the distribution specified in (4.3.8), $Z$ has an $r \times s$ real matrix-variate Gaussian density with $M$ replaced by $C'MG$, $A^{-1}$ replaced by $C'A^{-1}C$ and $B^{-1}$ replaced by $G'B^{-1}G$.


Example 4.3.4. Let the $2 \times 2$ matrix $X = (x_{ij})$ have a real matrix-variate Gaussian density with the parameters $M$, $A$ and $B$, and consider the set of linear functions $Z = C'XG$ where $C'$ is a $p \times p$ constant matrix of rank $p$ and $G$ is a $q \times q$ constant matrix of rank $q$, where
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2 —1 2 — 2 1 
м= 15 |А a pelia) 
dE 2 0 
М2 42 V2 


Show that all the elements $z_{ij}$'s in $Z = (z_{ij})$ are mutually independently distributed real scalar standard Gaussian random variables when $M = O$.


Solution 4.3.4. We have already shown in Example 4.3.3 that $C'A^{-1}C = I$. Let us verify that $GG' = B$ and compute $G'B^{-1}G$:


42. 0]1|42 4 
e I ET 1-7-5 
Z ү? 


1|9/2- 43 0 v2 0 1[5 0 
51 у vil jol j- 


V2 2 
Thus, $A^{-1}$ is replaced by $C'A^{-1}C = I_2$ and $B^{-1}$ is replaced by $G'B^{-1}G = I_2$ in the mgf of $Z$, so that the exponent in the mgf, excluding $\mathrm{tr}(TM')$, is $\frac{1}{2}\mathrm{tr}(TT')$. It follows that all the elements in $Z = C'XG$ are mutually independently distributed real scalar standard normal variables. This completes the computations.


The previous example also suggests the following results which are stated as corollaries:


Corollary 4.3.2. Let the $p \times q$ real matrix $X = (x_{ij})$ have a real matrix-variate Gaussian density with parameters $M$, $A$ and $B$, as given in (4.3.8). Consider the set of linear functions $Y = XG$ where $G$ is a $q \times q$ constant matrix of full rank $q$, and let $B = GG'$. Then, the columns of $Y$, denoted by $Y_{(1)}, \ldots, Y_{(q)}$, are independently distributed $p$-variate real Gaussian vectors with common covariance matrix $A^{-1}$ and mean value $(MG)_{(j)}$, $j = 1, \ldots, q$, where $(MG)_{(j)}$ is the $j$-th column of $MG$.


Corollary 4.3.3. Let $Z = C'XG$ where $C$ is a $p \times p$ constant matrix of rank $p$, $G$ is a $q \times q$ constant matrix of rank $q$ and $X$ is a real $p \times q$ Gaussian matrix whose parameters are $M = O$, $A$ and $B$, the constant matrices $C$ and $G$ being such that $A = CC'$ and $B = GG'$. Then, all the elements $z_{ij}$ in $Z = (z_{ij})$ are mutually independently distributed real scalar standard Gaussian random variables.


4.3a.2. Linear functions in the complex case 


We can similarly obtain results parallel to Theorems 4.3.1 to 4.3.6 in the complex case. Let $\tilde{X}$ be a $p \times q$ matrix in the complex domain, whose elements are scalar complex variables. Assume that $\tilde{X}$ has a complex $p \times q$ matrix-variate density as specified in (4.2a.9) whose associated moment generating function is as given in (4.3a.3). Let $C_1$ be a $p \times 1$ constant vector, $C_2$ be a $q \times 1$ constant vector, $C$ be a $r \times p$, $r \le p$, constant matrix of rank $r$ and $G$ be a $q \times s$, $s \le q$, constant matrix of rank $s$. Then, we have the following results:


Theorem 4.3a.1. Let $C_1$ be a $p \times 1$ constant vector as defined above and let the $p \times q$ matrix $\tilde{X}$ have the density given in (4.2a.9) whose associated mgf is as specified in (4.3a.3). Let $\tilde{U}$ be the linear function of $\tilde{X}$, $\tilde{U} = C_1^{*}\tilde{X}$. Then $\tilde{U}$ has a $q$-variate complex Gaussian density with the mean value vector $C_1^{*}\tilde{M}$ and covariance matrix $[C_1^{*}A^{-1}C_1]B^{-1}$.


Theorem 4.3a.2. Let $C_2$ be a $q \times 1$ constant vector. Consider the linear function $\tilde{Y} = \tilde{X}C_2$ where the $p \times q$ complex matrix $\tilde{X}$ has the density (4.2a.9). Then $\tilde{Y}$ is a $p$-variate complex Gaussian random vector with the mean value vector $\tilde{M}C_2$ and the covariance matrix $[C_2^{*}B^{-1}C_2]A^{-1}$.


Note 4.3a.1. Consider the mgf's of $\tilde{U}$ and $\tilde{Y}$ in Theorems 4.3a.1 and 4.3a.2, namely $M_{\tilde{U}}(\tilde{T}) = E[e^{\Re(\tilde{T}\tilde{U}^{*})}]$ and $M_{\tilde{Y}}(\tilde{T}) = E[e^{\Re(\tilde{Y}^{*}\tilde{T})}]$ with the conjugate transpose of the variable part in the linear form in the exponent; then $\tilde{T}$ in $M_{\tilde{U}}(\tilde{T})$ has to be $1 \times q$ and $\tilde{T}$ in $M_{\tilde{Y}}(\tilde{T})$ has to be $p \times 1$. Thus, the exponent in Theorem 4.3a.1 will be of the form $[C_1^{*}A^{-1}C_1]\tilde{T}B^{-1}\tilde{T}^{*}$ whereas the corresponding exponent in Theorem 4.3a.2 will be $[C_2^{*}B^{-1}C_2]\tilde{T}^{*}A^{-1}\tilde{T}$. Note that in one case, we have $\tilde{T}B^{-1}\tilde{T}^{*}$ and in the other case, $\tilde{T}$ and $\tilde{T}^{*}$ are interchanged as are $A$ and $B$. This has to be kept in mind when applying these theorems.


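Example 4.3a.2. Consider a $2 \times 2$ matrix $\tilde{X}$ having a complex $2 \times 2$ matrix-variate Gaussian density with the parameters $M = O$, $A$ and $B$, as well as the $2 \times 1$ vectors $L_1$ and $L_2$ and the linear functions $L_1^{*}\tilde{X}$ and $\tilde{X}L_2$ where

$$A = \begin{bmatrix} 2 & -i \\ i & 1 \end{bmatrix}, \quad B = \begin{bmatrix} 1 & i \\ -i & 2 \end{bmatrix}, \quad L_1 = \begin{bmatrix} -2i \\ 3i \end{bmatrix}, \quad L_2 = \begin{bmatrix} -i \\ -2i \end{bmatrix}, \quad \tilde{X} = \begin{bmatrix} \tilde{x}_{11} & \tilde{x}_{12} \\ \tilde{x}_{21} & \tilde{x}_{22} \end{bmatrix}.$$

Evaluate the densities of $\tilde{U} = L_1^{*}\tilde{X}$ and $\tilde{Y} = \tilde{X}L_2$ by applying Theorems 4.3a.1 and 4.3a.2, as well as independently.


Solution 4.3a.2. First, we compute the following quantities:

$$A^{-1} = \begin{bmatrix} 1 & i \\ -i & 2 \end{bmatrix}, \quad B^{-1} = \begin{bmatrix} 2 & -i \\ i & 1 \end{bmatrix},$$

$$L_1^{*} = [2i, -3i], \quad L_2^{*} = [i, 2i],$$

so that

$$L_1^{*}A^{-1}L_1 = [2i, -3i]\begin{bmatrix} 1 & i \\ -i & 2 \end{bmatrix}\begin{bmatrix} -2i \\ 3i \end{bmatrix} = 22, \quad L_2^{*}B^{-1}L_2 = [i, 2i]\begin{bmatrix} 2 & -i \\ i & 1 \end{bmatrix}\begin{bmatrix} -i \\ -2i \end{bmatrix} = 6.$$

Then, as per Theorems 4.3a.1 and 4.3a.2, $\tilde{U}$ is a $q$-variate complex Gaussian vector whose covariance matrix is $22B^{-1}$ and $\tilde{Y}$ is a $p$-variate complex Gaussian vector whose covariance matrix is $6A^{-1}$, that is, $\tilde{U} \sim \tilde{N}_2(O, 22B^{-1})$, $\tilde{Y} \sim \tilde{N}_2(O, 6A^{-1})$. Now, let us determine the densities of $\tilde{U}$ and $\tilde{Y}$ without resorting to these theorems. Consider the mgf of $\tilde{U}$ by taking the parameter vector $\tilde{T}$ as $\tilde{T} = [\tilde{t}_1, \tilde{t}_2]$. Note that

$$\tilde{T}\tilde{U}^{*} = \tilde{t}_1(-2i\,\tilde{x}_{11}^{*} + 3i\,\tilde{x}_{21}^{*}) + \tilde{t}_2(-2i\,\tilde{x}_{12}^{*} + 3i\,\tilde{x}_{22}^{*}). \tag{i}$$

Then, in comparison with the corresponding part in the mgf of $\tilde{X}$ whose associated general parameter matrix is $\tilde{T} = (\tilde{t}_{ij})$, we have

$$\tilde{t}_{11} = -2i\,\tilde{t}_1, \quad \tilde{t}_{12} = -2i\,\tilde{t}_2, \quad \tilde{t}_{21} = 3i\,\tilde{t}_1, \quad \tilde{t}_{22} = 3i\,\tilde{t}_2. \tag{ii}$$

We now substitute the values of (ii) in the general mgf of $\tilde{X}$ to obtain the mgf of $\tilde{U}$. Thus,

$$A^{-1}\tilde{T} = \begin{bmatrix} 1 & i \\ -i & 2 \end{bmatrix}\begin{bmatrix} -2i\,\tilde{t}_1 & -2i\,\tilde{t}_2 \\ 3i\,\tilde{t}_1 & 3i\,\tilde{t}_2 \end{bmatrix} = \begin{bmatrix} (-3-2i)\tilde{t}_1 & (-3-2i)\tilde{t}_2 \\ (-2+6i)\tilde{t}_1 & (-2+6i)\tilde{t}_2 \end{bmatrix},$$

$$B^{-1}\tilde{T}^{*} = \begin{bmatrix} 2 & -i \\ i & 1 \end{bmatrix}\begin{bmatrix} 2i\,\tilde{t}_1^{*} & -3i\,\tilde{t}_1^{*} \\ 2i\,\tilde{t}_2^{*} & -3i\,\tilde{t}_2^{*} \end{bmatrix} = \begin{bmatrix} 4i\,\tilde{t}_1^{*} + 2\tilde{t}_2^{*} & -6i\,\tilde{t}_1^{*} - 3\tilde{t}_2^{*} \\ -2\tilde{t}_1^{*} + 2i\,\tilde{t}_2^{*} & 3\tilde{t}_1^{*} - 3i\,\tilde{t}_2^{*} \end{bmatrix}.$$

Here an asterisk only denotes the conjugate as the quantities are scalar. Then,

$$\begin{aligned}
\mathrm{tr}[A^{-1}\tilde{T}B^{-1}\tilde{T}^{*}] &= [(-3-2i)\tilde{t}_1][4i\,\tilde{t}_1^{*} + 2\tilde{t}_2^{*}] + [(-3-2i)\tilde{t}_2][-2\tilde{t}_1^{*} + 2i\,\tilde{t}_2^{*}] \\
&\quad + [(-2+6i)\tilde{t}_1][-6i\,\tilde{t}_1^{*} - 3\tilde{t}_2^{*}] + [(-2+6i)\tilde{t}_2][3\tilde{t}_1^{*} - 3i\,\tilde{t}_2^{*}] \\
&= 22\,[2\tilde{t}_1\tilde{t}_1^{*} - i\,\tilde{t}_1\tilde{t}_2^{*} + i\,\tilde{t}_2\tilde{t}_1^{*} + \tilde{t}_2\tilde{t}_2^{*}] \\
&= 22\,[\tilde{t}_1, \tilde{t}_2]\begin{bmatrix} 2 & -i \\ i & 1 \end{bmatrix}\begin{bmatrix} \tilde{t}_1^{*} \\ \tilde{t}_2^{*} \end{bmatrix} = 22\,\tilde{T}B^{-1}\tilde{T}^{*}, \quad \tilde{T} = [\tilde{t}_1, \tilde{t}_2].
\end{aligned}$$

This shows that $\tilde{U} \sim \tilde{N}_2(O, 22B^{-1})$. Now, consider

$$\tilde{Y} = \tilde{X}L_2 = \begin{bmatrix} \tilde{x}_{11} & \tilde{x}_{12} \\ \tilde{x}_{21} & \tilde{x}_{22} \end{bmatrix}\begin{bmatrix} -i \\ -2i \end{bmatrix} = \begin{bmatrix} -i\,\tilde{x}_{11} - 2i\,\tilde{x}_{12} \\ -i\,\tilde{x}_{21} - 2i\,\tilde{x}_{22} \end{bmatrix}.$$

Then, on comparing the mgf of $\tilde{Y}$ with that of $\tilde{X}$ whose general parameter matrix is $\tilde{T} = (\tilde{t}_{ij})$, we have

$$\tilde{t}_{11} = i\,\tilde{t}_1, \quad \tilde{t}_{12} = 2i\,\tilde{t}_1, \quad \tilde{t}_{21} = i\,\tilde{t}_2, \quad \tilde{t}_{22} = 2i\,\tilde{t}_2.$$

On substituting these values in the mgf of $\tilde{X}$, we have

$$A^{-1}\tilde{T} = \begin{bmatrix} 1 & i \\ -i & 2 \end{bmatrix}\begin{bmatrix} i\,\tilde{t}_1 & 2i\,\tilde{t}_1 \\ i\,\tilde{t}_2 & 2i\,\tilde{t}_2 \end{bmatrix} = \begin{bmatrix} i\,\tilde{t}_1 - \tilde{t}_2 & 2i\,\tilde{t}_1 - 2\tilde{t}_2 \\ \tilde{t}_1 + 2i\,\tilde{t}_2 & 2\tilde{t}_1 + 4i\,\tilde{t}_2 \end{bmatrix},$$

$$B^{-1}\tilde{T}^{*} = \begin{bmatrix} 2 & -i \\ i & 1 \end{bmatrix}\begin{bmatrix} -i\,\tilde{t}_1^{*} & -i\,\tilde{t}_2^{*} \\ -2i\,\tilde{t}_1^{*} & -2i\,\tilde{t}_2^{*} \end{bmatrix} = \begin{bmatrix} (-2-2i)\tilde{t}_1^{*} & (-2-2i)\tilde{t}_2^{*} \\ (1-2i)\tilde{t}_1^{*} & (1-2i)\tilde{t}_2^{*} \end{bmatrix},$$

so that

$$\begin{aligned}
\mathrm{tr}[A^{-1}\tilde{T}B^{-1}\tilde{T}^{*}] &= [i\,\tilde{t}_1 - \tilde{t}_2][(-2-2i)\tilde{t}_1^{*}] + [2i\,\tilde{t}_1 - 2\tilde{t}_2][(1-2i)\tilde{t}_1^{*}] \\
&\quad + [\tilde{t}_1 + 2i\,\tilde{t}_2][(-2-2i)\tilde{t}_2^{*}] + [2\tilde{t}_1 + 4i\,\tilde{t}_2][(1-2i)\tilde{t}_2^{*}] \\
&= 6\,[\tilde{t}_1\tilde{t}_1^{*} - i\,\tilde{t}_1\tilde{t}_2^{*} + i\,\tilde{t}_2\tilde{t}_1^{*} + 2\tilde{t}_2\tilde{t}_2^{*}] \\
&= 6\,[\tilde{t}_1^{*}, \tilde{t}_2^{*}]\begin{bmatrix} 1 & i \\ -i & 2 \end{bmatrix}\begin{bmatrix} \tilde{t}_1 \\ \tilde{t}_2 \end{bmatrix} = 6\,\tilde{T}^{*}A^{-1}\tilde{T};
\end{aligned}$$

refer to Note 4.3a.1 regarding the representation of the quadratic forms in the two cases above. This shows that $\tilde{Y} \sim \tilde{N}_2(O, 6A^{-1})$. This completes the computations.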
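The two scalar factors, 22 and 6, and the two trace identities of this solution can be cross-checked with ordinary complex arithmetic. The sketch below is illustrative only: it assumes the inverses $A^{-1} = \begin{bmatrix} 1 & i \\ -i & 2\end{bmatrix}$, $B^{-1} = \begin{bmatrix} 2 & -i \\ i & 1\end{bmatrix}$ and the vectors $L_1^{*} = [2i, -3i]$, $L_2^{*} = [i, 2i]$ as read from the solution; the helper names and the test values of the parameters are hypothetical.

```python
def matmul(P, Q):
    """Product of two matrices stored as nested lists."""
    return [[sum(P[i][k] * Q[k][j] for k in range(len(Q)))
             for j in range(len(Q[0]))] for i in range(len(P))]


def conj_transpose(P):
    """Conjugate transpose of a matrix stored as nested lists."""
    return [[P[i][j].conjugate() for i in range(len(P))]
            for j in range(len(P[0]))]


def trace(P):
    return sum(P[i][i] for i in range(len(P)))


# Quantities as read from the solution (assumption of this sketch).
A_INV = [[1, 1j], [-1j, 2]]
B_INV = [[2, -1j], [1j, 1]]
L1 = [[-2j], [3j]]
L2 = [[-1j], [-2j]]

factor_u = matmul(matmul(conj_transpose(L1), A_INV), L1)[0][0]   # L1* A^{-1} L1
factor_y = matmul(matmul(conj_transpose(L2), B_INV), L2)[0][0]   # L2* B^{-1} L2


def mgf_trace(T):
    """tr(A^{-1} T B^{-1} T*) for a 2 x 2 complex parameter matrix T."""
    return trace(matmul(matmul(A_INV, T), matmul(B_INV, conj_transpose(T))))


t1, t2 = 1 + 2j, -0.5 + 1j   # arbitrary test values
row = [[t1, t2]]             # T as a 1 x 2 vector (case of U)
col = [[t1], [t2]]           # T as a 2 x 1 vector (case of Y)

lhs_u = mgf_trace([[-2j * t1, -2j * t2], [3j * t1, 3j * t2]])
rhs_u = 22 * matmul(matmul(row, B_INV), conj_transpose(row))[0][0]   # 22 T B^{-1} T*

lhs_y = mgf_trace([[1j * t1, 2j * t1], [1j * t2, 2j * t2]])
rhs_y = 6 * matmul(matmul(conj_transpose(col), A_INV), col)[0][0]    # 6 T* A^{-1} T

print(factor_u, factor_y)
print(abs(lhs_u - rhs_u), abs(lhs_y - rhs_y))
```

Both differences should be zero up to rounding, and both quadratic forms should be real, as expected for Hermitian positive definite $A^{-1}$ and $B^{-1}$.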


Theorem 4.3a.3. Let $C_1$ be a constant $p \times 1$ vector, $\tilde{X}$ be a $p \times q$ matrix whose elements are complex scalar variables and let $A = A^{*} > O$ be $p \times p$ and $B = B^{*} > O$ be $q \times q$ constant Hermitian positive definite matrices, where an asterisk denotes the conjugate transpose. If, for an arbitrary $p \times 1$ constant vector $C_1$, $C_1^{*}\tilde{X}$ is a $q$-variate complex Gaussian vector as specified in Theorem 4.3a.1, then $\tilde{X}$ has the $p \times q$ complex matrix-variate Gaussian density given in (4.2a.9).


As well, a result parallel to this one follows from Theorem 4.3a.2:


Theorem 4.3a.4. Let $C_2$ be a $q \times 1$ constant vector, $\tilde{X}$ be a $p \times q$ matrix whose elements are complex scalar variables and let $A > O$ be $p \times p$ and $B > O$ be $q \times q$ Hermitian positive definite constant matrices. If, for an arbitrary constant vector $C_2$, $\tilde{X}C_2$ is a $p$-variate complex Gaussian vector as specified in Theorem 4.3a.2, then $\tilde{X}$ is a $p \times q$ complex matrix-variate Gaussian matrix which is distributed as in (4.2a.9).


Theorem 4.3a.5. Let $C^{*}$ be a $r \times p$, $r \le p$, complex constant matrix of full rank $r$ and $G$ be a $q \times s$, $s \le q$, constant complex matrix of full rank $s$. Let $\tilde{U} = C^{*}\tilde{X}$ and $\tilde{W} = \tilde{X}G$ where $\tilde{X}$ has the density given in (4.2a.9). Then, $\tilde{U}$ has a $r \times q$ complex matrix-variate density with $M$ replaced by $C^{*}M$, $A^{-1}$ replaced by $C^{*}A^{-1}C$ and $B^{-1}$ remaining the same, and $\tilde{W}$ has a $p \times s$ complex matrix-variate distribution with $B^{-1}$ replaced by $G^{*}B^{-1}G$, $M$ replaced by $MG$ and $A^{-1}$ remaining the same.


Theorem 4.3a.6. Let $C^{*}$, $G$ and $\tilde{X}$ be as defined in Theorem 4.3a.5. Consider the $r \times s$ complex matrix $\tilde{Z} = C^{*}\tilde{X}G$. Then when $\tilde{X}$ has the distribution specified by (4.2a.9), $\tilde{Z}$ has an $r \times s$ complex matrix-variate density with $M$ replaced by $C^{*}MG$, $A^{-1}$ replaced by $C^{*}A^{-1}C$ and $B^{-1}$ replaced by $G^{*}B^{-1}G$.


Example 4.3a.3. Consider a $2 \times 3$ matrix $\tilde{X}$ having a complex matrix-variate Gaussian distribution with the parameter matrices $M = O$, $A$ and $B$ where


ax JE NN fe. a 3 
INSIEME LEES 
l 0 -i 2 X21 X22 X23 


Consider the linear forms 


С*Х 


ixi] — Хор 1X12 —1X22 1x13 — 1X23 
Хи + 2501 X12 + 2822 X13 + 2X23 
XG- XiupbiXp4-2x)3 i2 ЇХүр—1Х12+ЕЇХ13 

Хоу + iX22 + 203 X22 iğ — iXoa + 1X23 |. 
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(1): Compute the distribution of Z = C* XG; (2): Compute the distribution of Z = C*XG 
if A remains the same and G is equal to 


з 0 0 
S Aol, 
Lo = j| 
and study the properties of this distribution. 


Solution 4.3a.3. Note that A = A* and B = B* and hence both A and B are Hermitian. 
Moreover, |A| = 1 and |B| = 1 and since all the leading minors of A and B are positive, 
A and B are both Hermitian positive definite. Then, the inverses of A and B in terms of 
the cofactors of their elements are 


сы D Su 
atu i| Coray = 2i 6 3i | = BC. 
| =i ED. 


The linear forms provided above in connection with part (1) of this exercise can be respec- 
tively expressed in terms of the following matrices: 


Let us now compute $C^{*}A^{-1}C$ and $G^{*}B^{-1}G$:


gates. rcc pem сы nu 
is ex ||, ||; | е 


ч. 
Н 


3 
jp с=т -2 КИЕ» j 10 i 
С*В-!С = 0 1 0 2i 6 3i i 1 =i 
—i i—i —1 3i 2 2 0 i 
3 —2i —2 +i 
= 2i 6 ] — 6i 


—2—i 1+6 7 


Then in (1), $C^{*}\tilde{X}G$ is a $2 \times 3$ complex matrix-variate Gaussian with $A^{-1}$ replaced by $C^{*}A^{-1}C$ and $B^{-1}$ replaced by $G^{*}B^{-1}G$ where $C^{*}A^{-1}C$ and $G^{*}B^{-1}G$ are given above. For answering (2), let us evaluate $G^{*}B^{-1}G$:
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А 0 ‚ J/3 0 0 
1 -2i -1 3 
G*BT!G = 3 21 6 —3i a 2 0 
1 —] 3i 2 =) d 
~ 2 X2 


о © 
oro 
—oo E 


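Observe that this $q \times q$ matrix $G$ which is such that $GG^{*} = B$, is nonsingular; thus, $G^{*}B^{-1}G = G^{*}(GG^{*})^{-1}G = G^{*}(G^{*})^{-1}G^{-1}G = I$. Letting $\tilde{Y} = \tilde{X}G$, $\tilde{X} = \tilde{Y}G^{-1}$, and the exponent in the density of $\tilde{X}$ becomes

$$\mathrm{tr}(A\tilde{X}B\tilde{X}^{*}) = \mathrm{tr}(A\tilde{Y}G^{-1}B(G^{*})^{-1}\tilde{Y}^{*}) = \mathrm{tr}(\tilde{Y}^{*}A\tilde{Y}) = \sum_{j=1}^{q}\tilde{Y}_{(j)}^{*}A\tilde{Y}_{(j)}$$

where the $\tilde{Y}_{(j)}$'s are the columns of $\tilde{Y}$, which are independently distributed complex $p$-variate Gaussian vectors with common covariance matrix $A^{-1}$. This completes the computations.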


The conclusions obtained in the solution to Example 4.3a.3 suggest the corollaries that follow.


Corollary 4.3a.1. Let the $p \times q$ matrix $\tilde{X}$ have a matrix-variate complex Gaussian distribution with the parameters $M = O$, $A > O$ and $B > O$. Consider the transformation $\tilde{U} = C^{*}\tilde{X}$ where $C$ is a $p \times p$ nonsingular matrix such that $CC^{*} = A$ so that $C^{*}A^{-1}C = I$. Then the rows of $\tilde{U}$, namely $\tilde{U}_1, \ldots, \tilde{U}_p$, are mutually independently distributed $q$-variate complex Gaussian vectors with common covariance matrix $B^{-1}$.


Corollary 4.3a.2. Let the $p \times q$ matrix $\tilde{X}$ have a matrix-variate complex Gaussian distribution with the parameters $M = O$, $A > O$ and $B > O$. Consider the transformation $\tilde{Y} = \tilde{X}G$ where $G$ is a $q \times q$ nonsingular matrix such that $GG^{*} = B$ so that $G^{*}B^{-1}G = I$. Then the columns of $\tilde{Y}$, namely $\tilde{Y}_{(1)}, \ldots, \tilde{Y}_{(q)}$, are independently distributed $p$-variate complex Gaussian vectors with common covariance matrix $A^{-1}$.


Corollary 4.3a.3. Let the $p \times q$ matrix $\tilde{X}$ have a matrix-variate complex Gaussian distribution with the parameters $M = O$, $A > O$ and $B > O$. Consider the transformation $\tilde{Z} = C^{*}\tilde{X}G$ where $C$ is a $p \times p$ nonsingular matrix such that $CC^{*} = A$ and $G$ is a $q \times q$ nonsingular matrix such that $GG^{*} = B$. Then, the elements $\tilde{z}_{ij}$'s of $\tilde{Z} = (\tilde{z}_{ij})$ are mutually independently distributed complex standard Gaussian variables.


4.3.3. Partitioning of the parameter matrix


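Suppose that in the $p \times q$ real matrix-variate case, we partition $T$ as $\begin{bmatrix} T_1 \\ T_2 \end{bmatrix}$ where $T_1$ is $p_1 \times q$ and $T_2$ is $p_2 \times q$, so that $p_1 + p_2 = p$. Let $T_2 = O$ (a null matrix). Then,

$$TB^{-1}T' = \begin{bmatrix} T_1 \\ O \end{bmatrix}B^{-1}\,(T_1', \ O) = \begin{bmatrix} T_1B^{-1}T_1' & O_1 \\ O_2 & O_3 \end{bmatrix}$$

where $T_1B^{-1}T_1'$ is a $p_1 \times p_1$ matrix, $O_1$ is a $p_1 \times p_2$ null matrix, $O_2$ is a $p_2 \times p_1$ null matrix and $O_3$ is a $p_2 \times p_2$ null matrix. Let us similarly partition $A^{-1}$ into sub-matrices:

$$A^{-1} = \begin{bmatrix} A^{11} & A^{12} \\ A^{21} & A^{22} \end{bmatrix},$$

where $A^{11}$ is $p_1 \times p_1$ and $A^{22}$ is $p_2 \times p_2$. Then,

$$\mathrm{tr}(A^{-1}TB^{-1}T') = \mathrm{tr}\begin{bmatrix} A^{11}T_1B^{-1}T_1' & O \\ A^{21}T_1B^{-1}T_1' & O \end{bmatrix} = \mathrm{tr}(A^{11}T_1B^{-1}T_1').$$

If $A$ is partitioned as

$$A = \begin{bmatrix} A_{11} & A_{12} \\ A_{21} & A_{22} \end{bmatrix},$$

where $A_{11}$ is $p_1 \times p_1$ and $A_{22}$ is $p_2 \times p_2$, then, as established in Sect. 1.3, we have

$$A^{11} = (A_{11} - A_{12}A_{22}^{-1}A_{21})^{-1}.$$

Therefore, under this special case of $T$, the mgf is given by

$$E[e^{\mathrm{tr}(T_1X_1')}] = e^{\frac{1}{2}\mathrm{tr}((A_{11} - A_{12}A_{22}^{-1}A_{21})^{-1}T_1B^{-1}T_1')}, \tag{4.3.9}$$

which is also the mgf of the $p_1 \times q$ sub-matrix of $X$. Note that the mgf's in (4.3.9) and (4.3.1) share an identical structure. Hence, due to the uniqueness of the mgf, $X_1$ has a real $p_1 \times q$ matrix-variate Gaussian density wherein the parameter $B$ remains unchanged and $A$ is replaced by $A_{11} - A_{12}A_{22}^{-1}A_{21}$, the $A_{ij}$'s denoting the sub-matrices of $A$ as described earlier.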
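The block-inverse relation $A^{11} = (A_{11} - A_{12}A_{22}^{-1}A_{21})^{-1}$ on which this argument rests can be illustrated with exact rational arithmetic. The following sketch uses a hypothetical $3 \times 3$ positive definite matrix `A` with $p_1 = 2$, $p_2 = 1$; the helper names (`matmul`, `sub`, `inverse`, `block`) are illustrative and are not part of the text.

```python
from fractions import Fraction as F


def matmul(P, Q):
    """Product of two matrices stored as nested lists."""
    return [[sum(P[i][k] * Q[k][j] for k in range(len(Q)))
             for j in range(len(Q[0]))] for i in range(len(P))]


def sub(P, Q):
    return [[x - y for x, y in zip(rp, rq)] for rp, rq in zip(P, Q)]


def inverse(M):
    """Exact Gauss-Jordan inverse; no pivoting, adequate for positive definite M."""
    n = len(M)
    aug = [[F(v) for v in row] + [F(int(i == j)) for j in range(n)]
           for i, row in enumerate(M)]
    for col in range(n):
        pivot = aug[col][col]
        aug[col] = [v / pivot for v in aug[col]]
        for r in range(n):
            if r != col:
                factor = aug[r][col]
                aug[r] = [vr - factor * vc for vr, vc in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]


def block(M, rows, cols):
    """Sub-matrix of M with the given row and column indices."""
    return [[M[i][j] for j in cols] for i in rows]


# Hypothetical positive definite matrix, partitioned with p1 = 2 and p2 = 1.
A = [[4, 1, 2], [1, 3, 0], [2, 0, 5]]
p1 = 2
top, bottom = range(p1), range(p1, len(A))
A11, A12 = block(A, top, top), block(A, top, bottom)
A21, A22 = block(A, bottom, top), block(A, bottom, bottom)

schur = sub(A11, matmul(matmul(A12, inverse(A22)), A21))   # A11 - A12 A22^{-1} A21
upper_block = block(inverse(A), top, top)                  # A^{11}, from A^{-1}
print(inverse(schur) == upper_block)
```

In the notation of this section, `schur` plays the role of the matrix that replaces $A$ in the density of $X_1$.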
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4.3.4. Distributions of quadratic and bilinear forms 


Consider the real $p \times q$ Gaussian matrix $U$ defined in (4.2.17) whose mean value matrix is $E[X] = M = O$ and let $U = XB^{\frac{1}{2}}$. Then,

$$U'AU = \begin{bmatrix} U_1'AU_1 & U_1'AU_2 & \cdots & U_1'AU_q \\ U_2'AU_1 & U_2'AU_2 & \cdots & U_2'AU_q \\ \vdots & \vdots & \ddots & \vdots \\ U_q'AU_1 & U_q'AU_2 & \cdots & U_q'AU_q \end{bmatrix} \tag{4.3.10}$$

where the $p \times 1$ column vectors of $U$, namely, $U_1, \ldots, U_q$, are independently distributed as $N_p(O, A^{-1})$ vectors, that is, the $U_j$'s are independently distributed real $p$-variate Gaussian (normal) vectors whose covariance matrix is $A^{-1} = E[U_jU_j']$, with density

$$f_{U_j}(U_j) = \frac{|A|^{\frac{1}{2}}}{(2\pi)^{\frac{p}{2}}}\,e^{-\frac{1}{2}(U_j'AU_j)}, \quad A > O. \tag{4.3.11}$$

What is then the distribution of $U_j'AU_j$ for any particular $j$ and what are the distributions of $U_i'AU_j$, $i \ne j = 1, \ldots, q$? Let $z_j = U_j'AU_j$ and $z_{ij} = U_i'AU_j$, $i \ne j$. Letting $t$ be a scalar parameter, consider the mgf of $z_j$:

$$\begin{aligned}
M_{z_j}(t) = E[e^{tz_j}] &= \int_{U_j} e^{tU_j'AU_j}f_{U_j}(U_j)\,\mathrm{d}U_j \\
&= \frac{|A|^{\frac{1}{2}}}{(2\pi)^{\frac{p}{2}}}\int_{U_j} e^{-\frac{1}{2}(1-2t)U_j'AU_j}\,\mathrm{d}U_j \\
&= (1-2t)^{-\frac{p}{2}} \ \text{ for } 1 - 2t > 0,
\end{aligned}$$

which is the mgf of a real gamma random variable with parameters $\alpha = \frac{p}{2}$, $\beta = 2$ or a real chi-square random variable with $p$ degrees of freedom for $p = 1, 2, \ldots$. That is,

$$U_j'AU_j \sim \chi_p^{2} \ \text{ (a real chi-square random variable having } p \text{ degrees of freedom).} \tag{4.3.12}$$

In the complex case, observe that $\tilde{U}_j^{*}A\tilde{U}_j$ is real when $A = A^{*} > O$ and hence, the parameter in the mgf is real. On making the transformation $A^{\frac{1}{2}}\tilde{U}_j = \tilde{Y}_j$, $|\det(A)|$ is canceled. Then, the exponent can be expressed in terms of

$$-(1-t)\tilde{Y}_j^{*}\tilde{Y}_j = -(1-t)\sum_{k=1}^{p}|\tilde{y}_k|^{2} = -(1-t)\sum_{k=1}^{p}(y_{k1}^{2} + y_{k2}^{2}),$$

where $\tilde{y}_k = y_{k1} + i\,y_{k2}$, $i = \sqrt{(-1)}$, denotes the $k$-th element of $\tilde{Y}_j$. The integral gives $(1-t)^{-p}$ for $1 - t > 0$. Hence, $\tilde{v}_j = \tilde{U}_j^{*}A\tilde{U}_j$ has a real gamma distribution with the parameters $\alpha = p$, $\beta = 1$, that is, a chi-square distribution with $p$ degrees of freedom in the complex domain. Thus, $2\tilde{v}_j$ is a real chi-square random variable with $2p$ degrees of freedom, that is,

$$2\tilde{v}_j = 2\,\tilde{U}_j^{*}A\tilde{U}_j \sim \chi_{2p}^{2}. \tag{4.3a.9}$$

What is then the distribution of $U_i'AU_j$, $i \ne j$? Let us evaluate the mgf of $U_i'AU_j = z_{ij}$. As $z_{ij}$ is a function of $U_i$ and $U_j$, we can integrate out over the joint density of $U_i$ and $U_j$ where $U_i$ and $U_j$ are independently distributed $p$-variate real Gaussian random variables:

$$\begin{aligned}
M_{z_{ij}}(t) = E[e^{tz_{ij}}] &= \int_{U_i}\int_{U_j} e^{tU_i'AU_j}f_{U_i}(U_i)f_{U_j}(U_j)\,\mathrm{d}U_i\wedge\mathrm{d}U_j \\
&= \frac{|A|}{(2\pi)^{p}}\int_{U_i}\int_{U_j} e^{tU_i'AU_j - \frac{1}{2}U_j'AU_j - \frac{1}{2}U_i'AU_i}\,\mathrm{d}U_i\wedge\mathrm{d}U_j.
\end{aligned}$$

Let us first integrate out $U_j$. The relevant terms in the exponent are

$$-\frac{1}{2}(U_j'AU_j) + \frac{1}{2}(2t)(U_i'AU_j) = -\frac{1}{2}(U_j - C)'A(U_j - C) + \frac{t^{2}}{2}U_i'AU_i, \quad C = tU_i.$$

But the integral over $U_j$ which is the integral over $U_j - C$ will result in the following representation:

$$\begin{aligned}
M_{z_{ij}}(t) &= \frac{|A|^{\frac{1}{2}}}{(2\pi)^{\frac{p}{2}}}\int_{U_i} e^{\frac{t^{2}}{2}U_i'AU_i - \frac{1}{2}U_i'AU_i}\,\mathrm{d}U_i \\
&= (1-t^{2})^{-\frac{p}{2}} \ \text{ for } 1 - t^{2} > 0.
\end{aligned} \tag{4.3.13}$$

What is the density corresponding to the mgf $(1-t^{2})^{-\frac{p}{2}}$? This is the mgf of a real scalar random variable $u$ of the form $u = x - y$ where $x$ and $y$ are independently distributed real scalar chi-square random variables. For $p = 2$, $x$ and $y$ will be exponential variables so that $u$ will have a double exponential distribution or a real Laplace distribution. In the general case, the density of $u$ can also be worked out when $x$ and $y$ are independently distributed real gamma random variables with different parameters whereas chi-squares with equal degrees of freedom constitutes a special case. For the exact distribution of covariance structures such as the $z_{ij}$'s, see Mathai and Sebastian (2022).
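The representation $u = x - y$ can be made explicit by factoring the mgf. The short derivation below is a sketch of that step; the scaling of $x$ and $y$ (each being one half of a real chi-square with $p$ degrees of freedom, that is, real gamma with $\alpha = \frac{p}{2}$, $\beta = 1$) is spelled out here as an assumption that makes the factorization exact.

```latex
\begin{align*}
(1-t^{2})^{-\frac{p}{2}}
  &= (1-t)^{-\frac{p}{2}}\,(1+t)^{-\frac{p}{2}}
   = M_x(t)\,M_y(-t)
   = E[e^{t(x-y)}], \qquad -1 < t < 1, \\
M_x(t) &= M_y(t) = (1-t)^{-\frac{p}{2}}
   \quad\text{(real gamma, } \alpha = \tfrac{p}{2},\ \beta = 1\text{, so that }
   2x \sim \chi^{2}_{p},\ 2y \sim \chi^{2}_{p}\text{)}, \\
p = 2:\quad
  & x,\ y \text{ are standard exponential and }
    f_u(u) = \tfrac{1}{2}\,e^{-|u|}, \quad -\infty < u < \infty .
\end{align*}
```

The condition $1 - t^{2} > 0$ in (4.3.13) is then exactly the requirement that both $M_x(t)$ and $M_y(-t)$ exist.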
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4.3.1. In the moment generating function (mgf) (4.3.3), partition the p x q parameter 
matrix $T$ into column sub-matrices such that $T = (T_1, T_2)$ where $T_1$ is $p \times q_1$ and $T_2$ is $p \times q_2$ with $q_1 + q_2 = q$. Take $T_2 = O$ (the null matrix). Simplify and show that if $X$ is similarly partitioned as $X = (Y_1, Y_2)$, then $Y_1$ has a real $p \times q_1$ matrix-variate Gaussian density. As well, show that $Y_2$ has a real $p \times q_2$ matrix-variate Gaussian density.


4.3.2. Referring to Exercise 4.3.1, write down the densities of $Y_1$ and $Y_2$.


4.3.3. If $T$ is the parameter matrix in (4.3.3), then what type of partitioning of $T$ is required so that the densities of (1): the first row of $X$, (2): the first column of $X$ can be determined? Write down these densities explicitly.


4.3.4. Repeat Exercises 4.3.1—4.3.3 by taking the mgf in (4.3a.3) for the corresponding 
complex case. 


4.3.5. Write down the mgf explicitly for p = 2 and q = 2 corresponding to (4.3.3) 
and (4.3a.3), assuming general A > O and B > O. 


4.3.6. Partition the mgf in the complex $p \times q$ matrix-variate Gaussian case, corresponding to the partition in Sect. 4.3.1 and write down the complex matrix-variate density corresponding to $\tilde{T}_1$ in the complex case.


4.3.7. In the real $p \times q$ matrix-variate Gaussian case, partition the mgf parameter matrix into $T = (T_{(1)}, T_{(2)})$ where $T_{(1)}$ is $p \times q_1$ and $T_{(2)}$ is $p \times q_2$ with $q_1 + q_2 = q$. Obtain the density corresponding to $T_{(1)}$ by letting $T_{(2)} = O$.


4.3.8. Repeat Exercise 4.3.7 for the complex $p \times q$ matrix-variate Gaussian case.


4.3.9. Consider $\tilde{v} = \tilde{U}_j^{*}A\tilde{U}_j$. Provide the details of the steps for obtaining (4.3a.9).


4.3.10. Derive the mgf of $\tilde{U}_i^{*}A\tilde{U}_j$, $i \ne j$, in the complex $p \times q$ matrix-variate Gaussian case, corresponding to (4.3.13).


4.4. Marginal Densities in the Real Matrix-variate Gaussian Case 


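On partitioning the real $p \times q$ Gaussian matrix into $X_1$ of order $p_1 \times q$ and $X_2$ of order $p_2 \times q$ so that $p_1 + p_2 = p$, it was determined by applying the mgf technique that $X_1$ has a $p_1 \times q$ matrix-variate Gaussian distribution with the parameter matrix $B$ remaining unchanged while $A$ was replaced by $A_{11} - A_{12}A_{22}^{-1}A_{21}$ where the $A_{ij}$'s are the sub-matrices of $A$. This density is then the marginal density of the sub-matrix $X_1$ with respect to the joint density of $X$. Let us see whether the same result is available by direct integration of the remaining variables, namely by integrating out $X_2$. We first consider the real case. Note that

$$\mathrm{tr}(AXBX') = \mathrm{tr}\Big[A\begin{bmatrix} X_1 \\ X_2 \end{bmatrix}B\,(X_1' \ \ X_2')\Big] = \mathrm{tr}\Big[A\begin{bmatrix} X_1BX_1' & X_1BX_2' \\ X_2BX_1' & X_2BX_2' \end{bmatrix}\Big].$$

Now, letting $A$ be similarly partitioned, we have

$$\begin{aligned}
\mathrm{tr}(AXBX') &= \mathrm{tr}\Big[\begin{bmatrix} A_{11} & A_{12} \\ A_{21} & A_{22} \end{bmatrix}\begin{bmatrix} X_1BX_1' & X_1BX_2' \\ X_2BX_1' & X_2BX_2' \end{bmatrix}\Big] \\
&= \mathrm{tr}(A_{11}X_1BX_1') + \mathrm{tr}(A_{12}X_2BX_1') + \mathrm{tr}(A_{21}X_1BX_2') + \mathrm{tr}(A_{22}X_2BX_2'),
\end{aligned}$$

as the remaining terms do not appear in the trace. However, $(A_{12}X_2BX_1')' = X_1BX_2'A_{21}$, and since $\mathrm{tr}(PQ) = \mathrm{tr}(QP)$ and $\mathrm{tr}(S) = \mathrm{tr}(S')$ whenever $S$, $PQ$ and $QP$ are square matrices, we have

$$\mathrm{tr}(AXBX') = \mathrm{tr}(A_{11}X_1BX_1') + 2\,\mathrm{tr}(A_{21}X_1BX_2') + \mathrm{tr}(A_{22}X_2BX_2').$$

We may now complete the quadratic form in $\mathrm{tr}(A_{22}X_2BX_2') + 2\,\mathrm{tr}(A_{21}X_1BX_2')$ by taking a matrix $C = A_{22}^{-1}A_{21}X_1$ and replacing $X_2$ by $X_2 + C$. Note that when $A > O$, $A_{11} > O$ and $A_{22} > O$. Thus,

$$\begin{aligned}
\mathrm{tr}(AXBX') &= \mathrm{tr}(A_{22}(X_2 + C)B(X_2 + C)') + \mathrm{tr}(A_{11}X_1BX_1') - \mathrm{tr}(A_{12}A_{22}^{-1}A_{21}X_1BX_1') \\
&= \mathrm{tr}(A_{22}(X_2 + C)B(X_2 + C)') + \mathrm{tr}((A_{11} - A_{12}A_{22}^{-1}A_{21})X_1BX_1').
\end{aligned}$$

On applying a result on partitioned matrices from Sect. 1.3, we have

$$|A| = |A_{22}|\,|A_{11} - A_{12}A_{22}^{-1}A_{21}|,$$

and clearly, $(2\pi)^{\frac{pq}{2}} = (2\pi)^{\frac{p_1q}{2}}(2\pi)^{\frac{p_2q}{2}}$ and $|B|^{\frac{p}{2}} = |B|^{\frac{p_1}{2}}|B|^{\frac{p_2}{2}}$. When integrating out $X_2$, $|A_{22}|^{\frac{q}{2}}$, $|B|^{\frac{p_2}{2}}$ and $(2\pi)^{\frac{p_2q}{2}}$ are getting canceled. Hence, the marginal density of $X_1$, the $p_1 \times q$ sub-matrix of $X$, denoted by $f_{p_1,q}(X_1)$, is given by

$$f_{p_1,q}(X_1) = \frac{|B|^{\frac{p_1}{2}}\,|A_{11} - A_{12}A_{22}^{-1}A_{21}|^{\frac{q}{2}}}{(2\pi)^{\frac{p_1q}{2}}}\,e^{-\frac{1}{2}\mathrm{tr}((A_{11} - A_{12}A_{22}^{-1}A_{21})X_1BX_1')}. \tag{4.4.1}$$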
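The two algebraic facts used in this derivation, the completed-square form of $\mathrm{tr}(AXBX')$ and the determinant factorization $|A| = |A_{22}|\,|A_{11} - A_{12}A_{22}^{-1}A_{21}|$, can be checked on a small example with exact rational arithmetic. The sketch below is illustrative only: the matrices `A`, `B`, `X1`, `X2` are hypothetical values with $p_1 = 2$, $p_2 = 1$, $q = 2$, and the helper names are not from the text.

```python
from fractions import Fraction as F


def matmul(P, Q):
    return [[sum(P[i][k] * Q[k][j] for k in range(len(Q)))
             for j in range(len(Q[0]))] for i in range(len(P))]


def transpose(P):
    return [list(col) for col in zip(*P)]


def add(P, Q):
    return [[x + y for x, y in zip(rp, rq)] for rp, rq in zip(P, Q)]


def sub(P, Q):
    return [[x - y for x, y in zip(rp, rq)] for rp, rq in zip(P, Q)]


def trace(P):
    return sum(P[i][i] for i in range(len(P)))


def det(M):
    """Determinant by Laplace expansion along the first row (small matrices)."""
    if len(M) == 1:
        return M[0][0]
    return sum((-1) ** j * M[0][j] * det([row[:j] + row[j + 1:] for row in M[1:]])
               for j in range(len(M)))


def inverse(M):
    """Exact Gauss-Jordan inverse; no pivoting, adequate for positive definite M."""
    n = len(M)
    aug = [[F(v) for v in row] + [F(int(i == j)) for j in range(n)]
           for i, row in enumerate(M)]
    for col in range(n):
        pivot = aug[col][col]
        aug[col] = [v / pivot for v in aug[col]]
        for r in range(n):
            if r != col:
                factor = aug[r][col]
                aug[r] = [vr - factor * vc for vr, vc in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]


# Hypothetical example: p1 = 2, p2 = 1, q = 2.
A = [[4, 1, 2], [1, 3, 0], [2, 0, 5]]
B = [[2, 1], [1, 2]]
X1 = [[1, 2], [0, -1]]
X2 = [[3, 1]]
X = X1 + X2                                   # stack X1 on top of X2
A11, A12 = [r[:2] for r in A[:2]], [r[2:] for r in A[:2]]
A21, A22 = [r[:2] for r in A[2:]], [r[2:] for r in A[2:]]

S = sub(A11, matmul(matmul(A12, inverse(A22)), A21))   # A11 - A12 A22^{-1} A21
C = matmul(matmul(inverse(A22), A21), X1)              # shift used to complete the square
X2C = add(X2, C)

lhs = trace(matmul(matmul(matmul(A, X), B), transpose(X)))
rhs = (trace(matmul(matmul(matmul(A22, X2C), B), transpose(X2C)))
       + trace(matmul(matmul(matmul(S, X1), B), transpose(X1))))
print(lhs == rhs, det(A) == det(A22) * det(S))
```

Both comparisons are exact because all intermediate quantities are integers or `Fraction` objects.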


When $p_1 = 1$, $p_2 = 0$ and $p = 1$, we have the usual multivariate Gaussian density. When $p = 1$, the $1 \times 1$ matrix $A$ will be taken as 1 without any loss of generality. Then, from (4.4.1), the multivariate ($q$-variate) Gaussian density corresponding to (4.2.3) is given by

$$f_{1,q}(X_1) = \frac{|B|^{\frac{1}{2}}}{(2\pi)^{\frac{q}{2}}}\,e^{-\frac{1}{2}X_1BX_1'}$$

since the $1 \times 1$ quadratic form $X_1BX_1'$ is equal to its trace. It is usually expressed in terms of $B = V^{-1}$, $V > O$. When $q = 1$, $X$ is reducing to a $p \times 1$ vector, say $Y$. Thus, for a $p \times 1$ column vector $Y$ with a location parameter $\mu$, the density, denoted by $f_{p,1}(Y)$, is the following:

$$f_{p,1}(Y) = \frac{1}{|V|^{\frac{1}{2}}(2\pi)^{\frac{p}{2}}}\,e^{-\frac{1}{2}(Y-\mu)'V^{-1}(Y-\mu)}, \tag{4.4.2}$$

where $Y' = (y_1, \ldots, y_p)$, $\mu' = (\mu_1, \ldots, \mu_p)$, $-\infty < y_j < \infty$, $-\infty < \mu_j < \infty$, $j = 1, \ldots, p$, $V > O$. Observe that when $Y$ is $p \times 1$, $\mathrm{tr}(Y-\mu)'V^{-1}(Y-\mu) = (Y-\mu)'V^{-1}(Y-\mu)$. From symmetry, we can write down the density of the sub-matrix $X_2$ of $X$ from the density given in (4.4.1). Let us denote the density of $X_2$ by $f_{p_2,q}(X_2)$. Then,

$$f_{p_2,q}(X_2) = \frac{|B|^{\frac{p_2}{2}}\,|A_{22} - A_{21}A_{11}^{-1}A_{12}|^{\frac{q}{2}}}{(2\pi)^{\frac{p_2q}{2}}}\,e^{-\frac{1}{2}\mathrm{tr}((A_{22} - A_{21}A_{11}^{-1}A_{12})X_2BX_2')}. \tag{4.4.3}$$

Note that $A_{22} - A_{21}A_{11}^{-1}A_{12} > O$ as $A > O$, our initial assumptions being that $A > O$ and $B > O$.


Theorem 4.4.1. Let the $p\times q$ real matrix $X$ have a real matrix-variate Gaussian density
with the parameter matrices $A > O$ and $B > O$ where $A$ is $p\times p$ and $B$ is $q\times q$. Let $X$
be partitioned into sub-matrices as $X = \begin{bmatrix}X_1\\ X_2\end{bmatrix}$ where $X_1$ is $p_1\times q$ and $X_2$ is $p_2\times q$, with
$p_1 + p_2 = p$. Let $A$ be partitioned into sub-matrices as $A = \begin{bmatrix}A_{11} & A_{12}\\ A_{21} & A_{22}\end{bmatrix}$ where $A_{11}$ is
$p_1\times p_1$. Then $X_1$ has a $p_1\times q$ real matrix-variate Gaussian density with the parameter
matrices $A_{11}-A_{12}A_{22}^{-1}A_{21} > O$ and $B > O$, as given in (4.4.1), and $X_2$ has a $p_2\times q$
real matrix-variate Gaussian density with the parameter matrices $A_{22}-A_{21}A_{11}^{-1}A_{12} > O$
and $B > O$, as given in (4.4.3).


Observe that the $p_1$ rows taken in $X_1$ need not be the first $p_1$ rows. They can be any
set of $p_1$ rows. In that instance, it suffices to make the corresponding permutations in the
rows and columns of $A$ and $B$ so that the new set of $p_1$ rows can be taken as the first $p_1$
rows, and similarly for $X_2$.
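
The two matrix facts used in the derivation of (4.4.1) can be checked numerically. The sketch below is only an illustration: the function name `row_marginal_parameter` and the 2 x 2 matrix are assumptions chosen for the check, not part of the text. It verifies the determinant factorization $|A| = |A_{22}|\,|A_{11}-A_{12}A_{22}^{-1}A_{21}|$ and the fact that the inverse of $A_{11}-A_{12}A_{22}^{-1}A_{21}$ is the leading block of $A^{-1}$.

```python
import numpy as np

def row_marginal_parameter(A, p1):
    """Return A11 - A12 A22^{-1} A21, where A11 is the leading p1 x p1 block of A."""
    A11, A12 = A[:p1, :p1], A[:p1, p1:]
    A21, A22 = A[p1:, :p1], A[p1:, p1:]
    return A11 - A12 @ np.linalg.solve(A22, A21)

# Illustrative positive definite matrix (an assumption made for this check).
A = np.array([[2.0, 1.0],
              [1.0, 1.0]])
p1 = 1
schur = row_marginal_parameter(A, p1)

# |A| = |A22| |A11 - A12 A22^{-1} A21|
det_identity_holds = np.isclose(
    np.linalg.det(A), np.linalg.det(A[p1:, p1:]) * np.linalg.det(schur)
)

# The inverse of the marginal parameter matrix is the leading block of A^{-1}.
block_identity_holds = np.allclose(np.linalg.inv(schur), np.linalg.inv(A)[:p1, :p1])
```

The second identity is what makes the marginal of $X_1$ again matrix-variate Gaussian with the row parameter matrix replaced as stated in Theorem 4.4.1.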


Can a similar result be obtained in connection with a matrix-variate Gaussian distribution
if we take a set of column vectors and form a sub-matrix of $X$? Let us partition the
$p\times q$ matrix $X$ into sub-matrices of columns as $X = (Y_1\ \ Y_2)$ where $Y_1$ is $p\times q_1$ and $Y_2$ is
$p\times q_2$ such that $q_1 + q_2 = q$. The variables $Y_1, Y_2$ are used in order to avoid any confusion
with $X_1, X_2$ utilized in the discussions so far. Let us partition $B$ as follows:

$$B = \begin{bmatrix}B_{11} & B_{12}\\ B_{21} & B_{22}\end{bmatrix},\quad B_{11}\ \text{being}\ q_1\times q_1,\ B_{22}\ \text{being}\ q_2\times q_2.$$

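Then,
$$\begin{aligned}
\mathrm{tr}(AXBX') &= \mathrm{tr}\left[A\,(Y_1\ \ Y_2)\begin{bmatrix}B_{11} & B_{12}\\ B_{21} & B_{22}\end{bmatrix}\begin{bmatrix}Y_1'\\ Y_2'\end{bmatrix}\right]\\
&= \mathrm{tr}(AY_1B_{11}Y_1') + \mathrm{tr}(AY_2B_{21}Y_1') + \mathrm{tr}(AY_1B_{12}Y_2') + \mathrm{tr}(AY_2B_{22}Y_2')\\
&= \mathrm{tr}(AY_1B_{11}Y_1') + 2\,\mathrm{tr}(AY_1B_{12}Y_2') + \mathrm{tr}(AY_2B_{22}Y_2').
\end{aligned}$$

As in the previous case of row sub-matrices, we complete the quadratic form, now with $C = Y_1B_{12}B_{22}^{-1}$:

$$\begin{aligned}
\mathrm{tr}(AXBX') &= \mathrm{tr}(AY_1B_{11}Y_1') - \mathrm{tr}(AY_1(B_{12}B_{22}^{-1}B_{21})Y_1') + \mathrm{tr}(A(Y_2+C)B_{22}(Y_2+C)')\\
&= \mathrm{tr}(AY_1(B_{11}-B_{12}B_{22}^{-1}B_{21})Y_1') + \mathrm{tr}(A(Y_2+C)B_{22}(Y_2+C)').
\end{aligned}$$

Now, by integrating out $Y_2$, we have the result, observing that $A > O$, $B > O$,
$B_{11}-B_{12}B_{22}^{-1}B_{21} > O$ and $|B| = |B_{22}|\,|B_{11}-B_{12}B_{22}^{-1}B_{21}|$. A similar result follows for the
marginal density of $Y_2$. These results will be stated as the next theorem.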


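Theorem 4.4.2. Let the $p\times q$ real matrix $X$ have a real matrix-variate Gaussian density
with the parameter matrices $M = O$, $A > O$ and $B > O$ where $A$ is $p\times p$ and $B$ is
$q\times q$. Let $X$ be partitioned into column sub-matrices as $X = (Y_1\ \ Y_2)$ where $Y_1$ is $p\times q_1$
and $Y_2$ is $p\times q_2$ with $q_1 + q_2 = q$. Then $Y_1$ has a $p\times q_1$ real matrix-variate Gaussian
density with the parameter matrices $A > O$ and $B_{11}-B_{12}B_{22}^{-1}B_{21} > O$, denoted by
$f_{p,q_1}(Y_1)$, and $Y_2$ has a $p\times q_2$ real matrix-variate Gaussian density denoted by $f_{p,q_2}(Y_2)$
where

$$f_{p,q_1}(Y_1) = \frac{|A|^{\frac{q_1}{2}}\,|B_{11}-B_{12}B_{22}^{-1}B_{21}|^{\frac{p}{2}}}{(2\pi)^{\frac{pq_1}{2}}}\,
e^{-\frac{1}{2}\mathrm{tr}[AY_1(B_{11}-B_{12}B_{22}^{-1}B_{21})Y_1']} \tag{4.4.4}$$
$$f_{p,q_2}(Y_2) = \frac{|A|^{\frac{q_2}{2}}\,|B_{22}-B_{21}B_{11}^{-1}B_{12}|^{\frac{p}{2}}}{(2\pi)^{\frac{pq_2}{2}}}\,
e^{-\frac{1}{2}\mathrm{tr}[AY_2(B_{22}-B_{21}B_{11}^{-1}B_{12})Y_2']}. \tag{4.4.5}$$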


If $q_1 = 1$ and $q_2 = 0$ in (4.4.4), $q = 1$. When $q = 1$, the $1\times 1$ matrix $B$ is taken
to be 1. Then $Y_1$ in (4.4.4) is $p\times 1$ or a column vector of $p$ real scalar variables. Let it
still be denoted by $Y_1$. Then the corresponding density, which is a real $p$-variate Gaussian
(normal) density, available from (4.4.4) or from the basic matrix-variate density, is the
following:

$$\begin{aligned}
f_{p,1}(Y_1) &= \frac{|A|^{\frac{1}{2}}(1/2)^{\frac{p}{2}}}{\pi^{\frac{p}{2}}}\,e^{-\frac{1}{2}\mathrm{tr}(AY_1Y_1')}\\
&= \frac{|A|^{\frac{1}{2}}}{(2\pi)^{\frac{p}{2}}}\,e^{-\frac{1}{2}\mathrm{tr}(Y_1'AY_1)} = \frac{|A|^{\frac{1}{2}}}{(2\pi)^{\frac{p}{2}}}\,e^{-\frac{1}{2}Y_1'AY_1},
\end{aligned} \tag{4.4.6}$$

observing that $\mathrm{tr}(Y_1'AY_1) = Y_1'AY_1$ since $Y_1$ is $p\times 1$ and then $Y_1'AY_1$ is $1\times 1$. In the
usual representation of a multivariate Gaussian density, $A$ is replaced by $A = V^{-1}$, $V$ being
positive definite.
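
The column result of Theorem 4.4.2 can also be checked through the vectorized form of the exponent. The sketch below is an illustration under stated assumptions: it uses the column-stacking convention for vec, for which $\mathrm{tr}(AXBX') = \mathrm{vec}(X)'(B\otimes A)\,\mathrm{vec}(X)$ when $B$ is symmetric, and illustrative positive definite matrices $A$ and $B$. It then confirms that the block of $B^{-1}\otimes A^{-1}$ belonging to $Y_1$ agrees with the parameter matrices stated in the theorem.

```python
import numpy as np

rng = np.random.default_rng(0)
p, q, q1 = 2, 3, 2

# Illustrative positive definite parameter matrices (assumptions for this check).
A = np.array([[2.0, 1.0],
              [1.0, 1.0]])
B = np.array([[3.0, -1.0, 0.0],
              [-1.0, 1.0, 1.0],
              [0.0, 1.0, 2.0]])
X = rng.standard_normal((p, q))

# tr(A X B X') written as a quadratic form in vec(X) (columns stacked).
vec_x = X.flatten(order="F")
trace_form = np.trace(A @ X @ B @ X.T)
kron_form = vec_x @ np.kron(B, A) @ vec_x

# The inverse of B kron A is B^{-1} kron A^{-1}; Y1 occupies the first p*q1 coordinates.
cov_full = np.kron(np.linalg.inv(B), np.linalg.inv(A))
cov_y1 = cov_full[: p * q1, : p * q1]

# The same block computed from the parameter matrices of Theorem 4.4.2.
B11, B12 = B[:q1, :q1], B[:q1, q1:]
B21, B22 = B[q1:, :q1], B[q1:, q1:]
B_marginal = B11 - B12 @ np.linalg.solve(B22, B21)
cov_from_theorem = np.kron(np.linalg.inv(B_marginal), np.linalg.inv(A))
```

Agreement of `cov_y1` and `cov_from_theorem` is the block-inverse identity that underlies the completion of the quadratic form in the $Y_2$ variables.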


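Example 4.4.1. Let the $2\times 3$ matrix $X = (x_{ij})$ have a real matrix-variate distribution
with the parameter matrices $M = O$, $A > O$, $B > O$ where

$$X = \begin{bmatrix}x_{11} & x_{12} & x_{13}\\ x_{21} & x_{22} & x_{23}\end{bmatrix},\quad
A = \begin{bmatrix}2 & 1\\ 1 & 1\end{bmatrix},\quad
B = \begin{bmatrix}3 & -1 & 0\\ -1 & 1 & 1\\ 0 & 1 & 2\end{bmatrix}.$$
Let us partition $X$, $A$ and $B$ as follows:
$$X = \begin{bmatrix}X_1\\ X_2\end{bmatrix} = [Y_1,\ Y_2],\quad
A = \begin{bmatrix}A_{11} & A_{12}\\ A_{21} & A_{22}\end{bmatrix},\quad
B = \begin{bmatrix}B_{11} & B_{12}\\ B_{21} & B_{22}\end{bmatrix}$$
where $A_{11} = (2)$, $A_{12} = (1)$, $A_{21} = (1)$, $A_{22} = (1)$, $X_1 = [x_{11}, x_{12}, x_{13}]$, $X_2 = [x_{21}, x_{22}, x_{23}]$,

$$Y_1 = \begin{bmatrix}x_{11} & x_{12}\\ x_{21} & x_{22}\end{bmatrix},\quad
Y_2 = \begin{bmatrix}x_{13}\\ x_{23}\end{bmatrix},\quad
B_{11} = \begin{bmatrix}3 & -1\\ -1 & 1\end{bmatrix},\quad
B_{12} = \begin{bmatrix}0\\ 1\end{bmatrix},$$

$B_{21} = [0, 1]$, $B_{22} = (2)$. Compute the densities of $X_1$, $X_2$, $Y_1$ and $Y_2$.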


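Solution 4.4.1. We need the following quantities: $A_{11}-A_{12}A_{22}^{-1}A_{21} = 2-1 = 1$,
$A_{22}-A_{21}A_{11}^{-1}A_{12} = 1-\frac{1}{2} = \frac{1}{2}$, $|B| = 1$,

$$B_{22}-B_{21}B_{11}^{-1}B_{12} = 2 - [0, 1]\left(\tfrac{1}{2}\right)\begin{bmatrix}1 & 1\\ 1 & 3\end{bmatrix}\begin{bmatrix}0\\ 1\end{bmatrix} = 2-\frac{3}{2} = \frac{1}{2},$$

$$B_{11}-B_{12}B_{22}^{-1}B_{21} = \begin{bmatrix}3 & -1\\ -1 & 1\end{bmatrix} - \begin{bmatrix}0\\ 1\end{bmatrix}\left(\tfrac{1}{2}\right)[0, 1] = \begin{bmatrix}3 & -1\\ -1 & \frac{1}{2}\end{bmatrix}.$$
Let us compute the constant parts or normalizing constants in the various densities. With
our usual notations, the normalizing constants in $f_{p_1,q}(X_1)$ and $f_{p_2,q}(X_2)$ are

$$\frac{|B|^{\frac{p_1}{2}}\,|A_{11}-A_{12}A_{22}^{-1}A_{21}|^{\frac{q}{2}}}{(2\pi)^{\frac{p_1q}{2}}} = \frac{|B|^{\frac{1}{2}}(1)^{\frac{3}{2}}}{(2\pi)^{\frac{3}{2}}},\qquad
\frac{|B|^{\frac{p_2}{2}}\,|A_{22}-A_{21}A_{11}^{-1}A_{12}|^{\frac{q}{2}}}{(2\pi)^{\frac{p_2q}{2}}} = \frac{|B|^{\frac{1}{2}}\left(\frac{1}{2}\right)^{\frac{3}{2}}}{(2\pi)^{\frac{3}{2}}}.$$

Hence, the corresponding densities of $X_1$ and $X_2$ are the following:

$$f_{1,3}(X_1) = \frac{|B|^{\frac{1}{2}}}{(2\pi)^{\frac{3}{2}}}\,e^{-\frac{1}{2}X_1BX_1'},\quad -\infty < x_{1j} < \infty,\ j = 1,2,3,$$
$$f_{1,3}(X_2) = \frac{|B|^{\frac{1}{2}}}{2^{\frac{3}{2}}(2\pi)^{\frac{3}{2}}}\,e^{-\frac{1}{4}X_2BX_2'},\quad -\infty < x_{2j} < \infty,\ j = 1,2,3.$$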


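Let us now evaluate the normalizing constants in the densities $f_{p,q_1}(Y_1)$, $f_{p,q_2}(Y_2)$:

$$\frac{|A|^{\frac{q_1}{2}}\,|B_{11}-B_{12}B_{22}^{-1}B_{21}|^{\frac{p}{2}}}{(2\pi)^{\frac{pq_1}{2}}} = \frac{|A|^{1}\left(\frac{1}{2}\right)^{1}}{4\pi^2} = \frac{1}{8\pi^2},$$

$$\frac{|A|^{\frac{q_2}{2}}\,|B_{22}-B_{21}B_{11}^{-1}B_{12}|^{\frac{p}{2}}}{(2\pi)^{\frac{pq_2}{2}}} = \frac{|A|^{\frac{1}{2}}\left(\frac{1}{2}\right)^{1}}{2\pi} = \frac{1}{4\pi}.$$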

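Thus, the density of $Y_1$ is
$$f_{2,2}(Y_1) = \frac{1}{8\pi^2}\,e^{-\frac{1}{2}\mathrm{tr}[AY_1(B_{11}-B_{12}B_{22}^{-1}B_{21})Y_1']}
= \frac{1}{8\pi^2}\,e^{-\frac{1}{2}Q},\quad -\infty < x_{ij} < \infty,\ i, j = 1, 2,$$
where
$$\begin{aligned}
Q &= \mathrm{tr}\left\{\begin{bmatrix}2 & 1\\ 1 & 1\end{bmatrix}\begin{bmatrix}x_{11} & x_{12}\\ x_{21} & x_{22}\end{bmatrix}\begin{bmatrix}3 & -1\\ -1 & \frac{1}{2}\end{bmatrix}\begin{bmatrix}x_{11} & x_{21}\\ x_{12} & x_{22}\end{bmatrix}\right\}\\
&= 6x_{11}^2 + x_{12}^2 + 3x_{21}^2 + \tfrac{1}{2}x_{22}^2 - 4x_{11}x_{12} - 2x_{11}x_{22}\\
&\quad + 6x_{11}x_{21} + x_{12}x_{22} - 2x_{12}x_{21} - 2x_{21}x_{22},
\end{aligned}$$
the density of $Y_2$ being
$$f_{2,1}(Y_2) = \frac{1}{4\pi}\,e^{-\frac{1}{2}\mathrm{tr}[AY_2(B_{22}-B_{21}B_{11}^{-1}B_{12})Y_2']}
= \frac{1}{4\pi}\,e^{-\frac{1}{4}[2x_{13}^2 + x_{23}^2 + 2x_{13}x_{23}]},\quad -\infty < x_{i3} < \infty,\ i = 1, 2.$$

This completes the computations.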
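
The numerical quantities of Solution 4.4.1 can be reproduced with a few lines of NumPy. This is only a checking sketch: the helper names `schur_upper` and `schur_lower` are hypothetical, the matrices are those assembled from the blocks $A_{11} = (2)$, $A_{12} = A_{21} = A_{22} = (1)$, $B_{11}$, $B_{12}$, $B_{21}$, $B_{22}$ listed in the example, and the polynomial below is obtained by expanding the trace directly.

```python
import numpy as np

A = np.array([[2.0, 1.0],
              [1.0, 1.0]])
B = np.array([[3.0, -1.0, 0.0],
              [-1.0, 1.0, 1.0],
              [0.0, 1.0, 2.0]])

def schur_upper(M, k):
    """M11 - M12 M22^{-1} M21, where M11 is the leading k x k block."""
    return M[:k, :k] - M[:k, k:] @ np.linalg.solve(M[k:, k:], M[k:, :k])

def schur_lower(M, k):
    """M22 - M21 M11^{-1} M12, where M11 is the leading k x k block."""
    return M[k:, k:] - M[k:, :k] @ np.linalg.solve(M[:k, :k], M[:k, k:])

a_x1, a_x2 = schur_upper(A, 1), schur_lower(A, 1)   # row marginals X1, X2
b_y1, b_y2 = schur_upper(B, 2), schur_lower(B, 2)   # column marginals Y1, Y2

p, q1, q2 = 2, 2, 1
det_a = np.linalg.det(A)
const_y1 = det_a ** (q1 / 2) * np.linalg.det(b_y1) ** (p / 2) / (2 * np.pi) ** (p * q1 / 2)
const_y2 = det_a ** (q2 / 2) * np.linalg.det(b_y2) ** (p / 2) / (2 * np.pi) ** (p * q2 / 2)

# Compare the trace in the exponent of the density of Y1 with its expansion.
rng = np.random.default_rng(1)
Y1 = rng.standard_normal((2, 2))
(x11, x12), (x21, x22) = Y1
q_trace = np.trace(A @ Y1 @ b_y1 @ Y1.T)
q_expanded = (6 * x11**2 + x12**2 + 3 * x21**2 + 0.5 * x22**2
              - 4 * x11 * x12 - 2 * x11 * x22 + 6 * x11 * x21
              + x12 * x22 - 2 * x12 * x21 - 2 * x21 * x22)
```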


4.4a. Marginal Densities in the Complex Matrix-variate Gaussian Case 


The derivations of the results are parallel to those provided in the real case. Accord- 
ingly, we will state the corresponding results. 


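Theorem 4.4a.1. Let the $p\times q$ matrix $\tilde X$ have a complex matrix-variate Gaussian density
with the parameter matrices $M = O$, $A > O$, $B > O$ where $A$ is $p\times p$ and $B$ is
$q\times q$. Consider a row partitioning of $\tilde X$ into sub-matrices $\tilde X_1$ and $\tilde X_2$ where $\tilde X_1$ is $p_1\times q$
and $\tilde X_2$ is $p_2\times q$, with $p_1 + p_2 = p$. Then, $\tilde X_1$ and $\tilde X_2$ have $p_1\times q$ complex matrix-variate
and $p_2\times q$ complex matrix-variate Gaussian densities with parameter matrices
$A_{11}-A_{12}A_{22}^{-1}A_{21}$ and $B$, and $A_{22}-A_{21}A_{11}^{-1}A_{12}$ and $B$, respectively, denoted by $\tilde f_{p_1,q}(\tilde X_1)$
and $\tilde f_{p_2,q}(\tilde X_2)$. The density of $\tilde X_1$ is given by

$$\tilde f_{p_1,q}(\tilde X_1) = \frac{|\det(B)|^{p_1}\,|\det(A_{11}-A_{12}A_{22}^{-1}A_{21})|^{q}}{\pi^{p_1q}}\,
e^{-\mathrm{tr}[(A_{11}-A_{12}A_{22}^{-1}A_{21})\tilde X_1B\tilde X_1^*]}, \tag{4.4a.1}$$

the corresponding vector case for $p = 1$ being available from (4.4a.1) for $p_1 = 1$, $p_2 = 0$
and $p = 1$; in this case, the density is

$$\tilde f_{1,q}(\tilde X_1) = \frac{|\det(B)|}{\pi^{q}}\,e^{-(\tilde X_1-\mu)B(\tilde X_1-\mu)^*} \tag{4.4a.2}$$

where $\tilde X_1$ and $\mu$ are $1\times q$ and $\mu$ is a location parameter vector. The density of $\tilde X_2$ is the
following:

$$\tilde f_{p_2,q}(\tilde X_2) = \frac{|\det(B)|^{p_2}\,|\det(A_{22}-A_{21}A_{11}^{-1}A_{12})|^{q}}{\pi^{p_2q}}\,
e^{-\mathrm{tr}[(A_{22}-A_{21}A_{11}^{-1}A_{12})\tilde X_2B\tilde X_2^*]}. \tag{4.4a.3}$$
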
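Theorem 4.4a.2. Let $\tilde X$, $A$ and $B$ be as defined in Theorem 4.2a.1 and let $\tilde X$ be partitioned
into column sub-matrices $\tilde X = (\tilde Y_1\ \ \tilde Y_2)$ where $\tilde Y_1$ is $p\times q_1$ and $\tilde Y_2$ is $p\times q_2$, so that
$q_1 + q_2 = q$. Then $\tilde Y_1$ and $\tilde Y_2$ have $p\times q_1$ complex matrix-variate and $p\times q_2$ complex
matrix-variate Gaussian densities given by

$$\tilde f_{p,q_1}(\tilde Y_1) = \frac{|\det(A)|^{q_1}\,|\det(B_{11}-B_{12}B_{22}^{-1}B_{21})|^{p}}{\pi^{pq_1}}\,
e^{-\mathrm{tr}[A\tilde Y_1(B_{11}-B_{12}B_{22}^{-1}B_{21})\tilde Y_1^*]} \tag{4.4a.4}$$
$$\tilde f_{p,q_2}(\tilde Y_2) = \frac{|\det(A)|^{q_2}\,|\det(B_{22}-B_{21}B_{11}^{-1}B_{12})|^{p}}{\pi^{pq_2}}\,
e^{-\mathrm{tr}[A\tilde Y_2(B_{22}-B_{21}B_{11}^{-1}B_{12})\tilde Y_2^*]}. \tag{4.4a.5}$$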


When $q = 1$, we have the usual complex multivariate case. In this case, it will be a
$p$-variate complex Gaussian density. This is available from (4.4a.4) by taking $q_1 = 1$, $q_2 = 0$
and $q = 1$. Now, $\tilde Y_1$ is a $p\times 1$ column vector. Let $\mu$ be a $p\times 1$ location parameter vector.
Then the density is

$$\tilde f_{p,1}(\tilde Y_1) = \frac{|\det(A)|}{\pi^{p}}\,e^{-(\tilde Y_1-\mu)^*A(\tilde Y_1-\mu)} \tag{4.4a.6}$$
where $A > O$ (Hermitian positive definite), $\tilde Y_1-\mu$ is $p\times 1$ and its $1\times p$ conjugate
transpose is $(\tilde Y_1-\mu)^*$.


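Example 4.4a.1. Consider a $2\times 3$ complex matrix-variate Gaussian distribution with the
parameters $M = O$, $A > O$, $B > O$ where

$$\tilde X = \begin{bmatrix}\tilde x_{11} & \tilde x_{12} & \tilde x_{13}\\ \tilde x_{21} & \tilde x_{22} & \tilde x_{23}\end{bmatrix},\quad
A = \begin{bmatrix}2 & i\\ -i & 2\end{bmatrix},\quad
B = \begin{bmatrix}2 & -i & i\\ i & 2 & -i\\ -i & i & 2\end{bmatrix}.$$
Consider the partitioning
$$A = \begin{bmatrix}A_{11} & A_{12}\\ A_{21} & A_{22}\end{bmatrix},\quad
B = \begin{bmatrix}B_{11} & B_{12}\\ B_{21} & B_{22}\end{bmatrix},\quad
\tilde X = \begin{bmatrix}\tilde X_1\\ \tilde X_2\end{bmatrix} = [\tilde Y_1,\ \tilde Y_2]$$
where
$$\tilde Y_1 = \begin{bmatrix}\tilde x_{11}\\ \tilde x_{21}\end{bmatrix},\quad
\tilde Y_2 = \begin{bmatrix}\tilde x_{12} & \tilde x_{13}\\ \tilde x_{22} & \tilde x_{23}\end{bmatrix},\quad
B_{22} = \begin{bmatrix}2 & -i\\ i & 2\end{bmatrix},\quad
B_{21} = \begin{bmatrix}i\\ -i\end{bmatrix},$$
$\tilde X_1 = [\tilde x_{11}, \tilde x_{12}, \tilde x_{13}]$, $\tilde X_2 = [\tilde x_{21}, \tilde x_{22}, \tilde x_{23}]$, $A_{11} = (2)$, $A_{12} = (i)$, $A_{21} = (-i)$, $A_{22} = (2)$;
$B_{11} = (2)$, $B_{12} = [-i, i]$. Compute the densities of $\tilde X_1$, $\tilde X_2$, $\tilde Y_1$, $\tilde Y_2$.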
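Solution 4.4a.1. It is easy to ascertain that $A = A^*$ and $B = B^*$; hence both matrices
are Hermitian. As well, all the leading minors of $A$ and $B$ are positive so that $A > O$ and
$B > O$. We need the following numerical results: $|A| = 3$, $|B| = 2$,

$$A_{11}-A_{12}A_{22}^{-1}A_{21} = 2-(i)(1/2)(-i) = 2-\frac{1}{2} = \frac{3}{2}$$
$$A_{22}-A_{21}A_{11}^{-1}A_{12} = 2-(-i)(1/2)(i) = 2-\frac{1}{2} = \frac{3}{2}$$
$$B_{11}-B_{12}B_{22}^{-1}B_{21} = 2-[-i,\ i]\left(\tfrac{1}{3}\right)\begin{bmatrix}2 & i\\ -i & 2\end{bmatrix}\begin{bmatrix}i\\ -i\end{bmatrix} = 2-\frac{4}{3} = \frac{2}{3}$$
$$B_{22}-B_{21}B_{11}^{-1}B_{12} = \begin{bmatrix}2 & -i\\ i & 2\end{bmatrix}-\begin{bmatrix}i\\ -i\end{bmatrix}\left(\tfrac{1}{2}\right)[-i,\ i]
= \begin{bmatrix}2 & -i\\ i & 2\end{bmatrix}-\frac{1}{2}\begin{bmatrix}1 & -1\\ -1 & 1\end{bmatrix}
= \frac{1}{2}\begin{bmatrix}3 & 1-2i\\ 1+2i & 3\end{bmatrix}.$$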

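With these preliminary calculations, we can obtain the required densities with our usual
notations:
$$\tilde f_{p_1,q}(\tilde X_1) = \frac{|\det(B)|^{p_1}\,|\det(A_{11}-A_{12}A_{22}^{-1}A_{21})|^{q}}{\pi^{p_1q}}\,
e^{-\mathrm{tr}[(A_{11}-A_{12}A_{22}^{-1}A_{21})\tilde X_1B\tilde X_1^*]},\ \text{that is,}$$
$$\tilde f_{1,3}(\tilde X_1) = \frac{2\,(3/2)^3}{\pi^3}\,e^{-\frac{3}{2}\tilde X_1B\tilde X_1^*}$$
where
$$\begin{aligned}
Q_1 = \tilde X_1B\tilde X_1^* &= [\tilde x_{11}, \tilde x_{12}, \tilde x_{13}]\begin{bmatrix}2 & -i & i\\ i & 2 & -i\\ -i & i & 2\end{bmatrix}\begin{bmatrix}\tilde x_{11}^*\\ \tilde x_{12}^*\\ \tilde x_{13}^*\end{bmatrix}\\
&= 2\tilde x_{11}\tilde x_{11}^* + 2\tilde x_{12}\tilde x_{12}^* + 2\tilde x_{13}\tilde x_{13}^* - i\tilde x_{11}\tilde x_{12}^* + i\tilde x_{11}\tilde x_{13}^*\\
&\quad + i\tilde x_{12}\tilde x_{11}^* - i\tilde x_{12}\tilde x_{13}^* - i\tilde x_{13}\tilde x_{11}^* + i\tilde x_{13}\tilde x_{12}^*;
\end{aligned}$$

$$\tilde f_{p_2,q}(\tilde X_2) = \frac{|\det(B)|^{p_2}\,|\det(A_{22}-A_{21}A_{11}^{-1}A_{12})|^{q}}{\pi^{p_2q}}\,
e^{-\mathrm{tr}[(A_{22}-A_{21}A_{11}^{-1}A_{12})\tilde X_2B\tilde X_2^*]},\ \text{that is,}$$
$$\tilde f_{1,3}(\tilde X_2) = \frac{2\,(3/2)^3}{\pi^3}\,e^{-\frac{3}{2}Q_2}$$
where $Q_2 = \tilde X_2B\tilde X_2^*$, $Q_2$ being obtained by replacing $\tilde X_1$ in $Q_1$ by $\tilde X_2$;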


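$$\tilde f_{p,q_1}(\tilde Y_1) = \frac{|\det(A)|^{q_1}\,|\det(B_{11}-B_{12}B_{22}^{-1}B_{21})|^{p}}{\pi^{pq_1}}\,
e^{-\mathrm{tr}[A\tilde Y_1(B_{11}-B_{12}B_{22}^{-1}B_{21})\tilde Y_1^*]},\ \text{that is,}$$
$$\tilde f_{2,1}(\tilde Y_1) = \frac{3\,(2/3)^2}{\pi^2}\,e^{-\frac{2}{3}Q_3}$$
where
$$\begin{aligned}
Q_3 = \tilde Y_1^*A\tilde Y_1 &= [\tilde x_{11}^*, \tilde x_{21}^*]\begin{bmatrix}2 & i\\ -i & 2\end{bmatrix}\begin{bmatrix}\tilde x_{11}\\ \tilde x_{21}\end{bmatrix}\\
&= 2\tilde x_{11}\tilde x_{11}^* + 2\tilde x_{21}\tilde x_{21}^* + i\tilde x_{21}\tilde x_{11}^* - i\tilde x_{21}^*\tilde x_{11};
\end{aligned}$$
$$\tilde f_{p,q_2}(\tilde Y_2) = \frac{|\det(A)|^{q_2}\,|\det(B_{22}-B_{21}B_{11}^{-1}B_{12})|^{p}}{\pi^{pq_2}}\,
e^{-\mathrm{tr}[A\tilde Y_2(B_{22}-B_{21}B_{11}^{-1}B_{12})\tilde Y_2^*]},\ \text{that is,}$$
$$\tilde f_{2,2}(\tilde Y_2) = \frac{3^2}{\pi^4}\,e^{-\frac{1}{2}Q_4}$$
where

$$\begin{aligned}
Q_4 &= \mathrm{tr}\left\{\begin{bmatrix}2 & i\\ -i & 2\end{bmatrix}\begin{bmatrix}\tilde x_{12} & \tilde x_{13}\\ \tilde x_{22} & \tilde x_{23}\end{bmatrix}\begin{bmatrix}3 & 1-2i\\ 1+2i & 3\end{bmatrix}\begin{bmatrix}\tilde x_{12}^* & \tilde x_{22}^*\\ \tilde x_{13}^* & \tilde x_{23}^*\end{bmatrix}\right\}\\
&= 6\tilde x_{12}\tilde x_{12}^* + 6\tilde x_{13}\tilde x_{13}^* + 6\tilde x_{22}\tilde x_{22}^* + 6\tilde x_{23}\tilde x_{23}^*\\
&\quad + [2(1-2i)(\tilde x_{12}\tilde x_{13}^* + \tilde x_{22}\tilde x_{23}^*) + 2(1+2i)(\tilde x_{23}\tilde x_{22}^* + \tilde x_{13}\tilde x_{12}^*)]\\
&\quad + [i(1-2i)(\tilde x_{22}\tilde x_{13}^* - \tilde x_{12}\tilde x_{23}^*) - i(1+2i)(\tilde x_{13}\tilde x_{22}^* - \tilde x_{23}\tilde x_{12}^*)]\\
&\quad + 3i[\tilde x_{22}\tilde x_{12}^* + \tilde x_{23}\tilde x_{13}^* - \tilde x_{12}\tilde x_{22}^* - \tilde x_{13}\tilde x_{23}^*].
\end{aligned}$$

This completes the computations.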
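
The complex-domain quantities in Solution 4.4a.1 can be verified in the same way. The sketch below is a check under the assumption that $A$ and $B$ are the Hermitian matrices assembled from the blocks listed in Example 4.4a.1 ($A_{11} = (2)$, $A_{12} = (i)$, $B_{11} = (2)$, $B_{12} = [-i, i]$, and so on); the variable names are illustrative only.

```python
import numpy as np

A = np.array([[2, 1j],
              [-1j, 2]])
B = np.array([[2, -1j, 1j],
              [1j, 2, -1j],
              [-1j, 1j, 2]])

# Hermitian positive definiteness: A = A*, B = B*, and all eigenvalues positive.
is_hermitian = np.allclose(A, A.conj().T) and np.allclose(B, B.conj().T)
is_positive_definite = bool(
    np.all(np.linalg.eigvalsh(A) > 0) and np.all(np.linalg.eigvalsh(B) > 0)
)

det_a = np.linalg.det(A).real
det_b = np.linalg.det(B).real

# Parameter matrices of the marginal densities (q1 = 1, q2 = 2 for the columns).
a_x1 = A[:1, :1] - A[:1, 1:] @ np.linalg.solve(A[1:, 1:], A[1:, :1])
a_x2 = A[1:, 1:] - A[1:, :1] @ np.linalg.solve(A[:1, :1], A[:1, 1:])
b_y1 = B[:1, :1] - B[:1, 1:] @ np.linalg.solve(B[1:, 1:], B[1:, :1])
b_y2 = B[1:, 1:] - B[1:, :1] @ np.linalg.solve(B[:1, :1], B[:1, 1:])
```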


Exercises 4.4 


4.4.1. Write down explicitly the density of a p x q matrix-variate Gaussian for p = 3,
q = 3. Then by integrating out the other variables, obtain the density for the case (1):
p = 2, q = 2; (2): p = 2, q = 1; (3): p = 1, q = 2; (4): p = 1, q = 1. Take the location
matrix M = O. Let A and B be general positive definite parameter matrices.

4.4.2. Repeat Exercise 4.4.1 for the complex case.


4.4.3. Write down the densities obtained in Exercises 4.4.1 and 4.4.2. Then evaluate the
marginal densities for p = 2, q = 2 in both the real and complex domains by partitioning
matrices and integrating out by using matrix methods.


4.4.4. Let A be a 2 x 2 real matrix, A > O, whose first row is (1, 1). Let the real matrix
B > O be 3 x 3 with first row (1, -1, 1). Complete A and B with numbers of your choosing
so that A > O, B > O. Consider a real 2 x 3 matrix-variate Gaussian density with these
A and B as the parameter matrices. Take your own non-null location matrix. Write down
the matrix-variate Gaussian density explicitly. Then by integrating out the other variables,
either directly or by matrix methods, obtain (1): the 1 x 3 matrix-variate Gaussian density;
(2): the 2 x 2 matrix-variate Gaussian density from your 2 x 3 matrix-variate Gaussian
density.


4.4.5. Repeat Exercise 4.4.4 for the complex case if the first row of A is (1, 1 + i) and the
first row of B is (2, 1 + i, 1 - i) where A = A* > O and B = B* > O.


4.5. Conditional Densities in the Real Matrix-variate Gaussian Case 


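Consider a real $p\times q$ matrix-variate Gaussian density with the parameters $M = O$,
$A > O$, $B > O$. Let us consider the partition of the $p\times q$ real Gaussian matrix
$X$ into row sub-matrices as $X = \begin{bmatrix}X_1\\ X_2\end{bmatrix}$ where $X_1$ is $p_1\times q$ and $X_2$ is $p_2\times q$ with
$p_1 + p_2 = p$. We have already established that the marginal density of $X_2$ is

$$f_{p_2,q}(X_2) = \frac{|A_{22}-A_{21}A_{11}^{-1}A_{12}|^{\frac{q}{2}}\,|B|^{\frac{p_2}{2}}}{(2\pi)^{\frac{p_2q}{2}}}\,
e^{-\frac{1}{2}\mathrm{tr}[(A_{22}-A_{21}A_{11}^{-1}A_{12})X_2BX_2']}.$$

Thus, the conditional density of $X_1$ given $X_2$ is obtained as

$$\begin{aligned}
f_{p_1,q}(X_1|X_2) = \frac{f_{p,q}(X)}{f_{p_2,q}(X_2)} &= \frac{|A|^{\frac{q}{2}}|B|^{\frac{p}{2}}\,(2\pi)^{\frac{p_2q}{2}}}{|A_{22}-A_{21}A_{11}^{-1}A_{12}|^{\frac{q}{2}}\,|B|^{\frac{p_2}{2}}\,(2\pi)^{\frac{pq}{2}}}\\
&\quad\times e^{-\frac{1}{2}\{\mathrm{tr}(AXBX')-\mathrm{tr}[(A_{22}-A_{21}A_{11}^{-1}A_{12})X_2BX_2']\}}.
\end{aligned}$$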
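Note that

$$AXBX' = A\begin{bmatrix}X_1\\ X_2\end{bmatrix}B\,(X_1'\ \ X_2')
= \begin{bmatrix}A_{11} & A_{12}\\ A_{21} & A_{22}\end{bmatrix}\begin{bmatrix}X_1BX_1' & X_1BX_2'\\ X_2BX_1' & X_2BX_2'\end{bmatrix}
= \begin{bmatrix}\alpha & *\\ * & \beta\end{bmatrix}$$
where $\alpha = A_{11}X_1BX_1' + A_{12}X_2BX_1'$, $\beta = A_{21}X_1BX_2' + A_{22}X_2BX_2'$, and the asterisks
designate elements that are not utilized in the determination of the trace. Then

$$\mathrm{tr}(AXBX') = \mathrm{tr}(A_{11}X_1BX_1' + A_{12}X_2BX_1') + \mathrm{tr}(A_{21}X_1BX_2' + A_{22}X_2BX_2').$$
Thus the exponent in the conditional density simplifies to

$$\begin{aligned}
&\mathrm{tr}(A_{11}X_1BX_1') + 2\,\mathrm{tr}(A_{12}X_2BX_1') + \mathrm{tr}(A_{22}X_2BX_2') - \mathrm{tr}[(A_{22}-A_{21}A_{11}^{-1}A_{12})X_2BX_2']\\
&\quad= \mathrm{tr}(A_{11}X_1BX_1') + 2\,\mathrm{tr}(A_{12}X_2BX_1') + \mathrm{tr}[A_{21}A_{11}^{-1}A_{12}X_2BX_2']\\
&\quad= \mathrm{tr}[A_{11}(X_1+C)B(X_1+C)'],\quad C = A_{11}^{-1}A_{12}X_2.
\end{aligned}$$

We note that $E(X_1|X_2) = -C = -A_{11}^{-1}A_{12}X_2$: the regression of $X_1$ on $X_2$, the constant
part being $|A_{11}|^{\frac{q}{2}}|B|^{\frac{p_1}{2}}/(2\pi)^{\frac{p_1q}{2}}$. Hence the following result: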

Theorem 4.5.1. If the $p\times q$ matrix $X$ has a real matrix-variate Gaussian density with
the parameter matrices $M = O$, $A > O$ and $B > O$ where $A$ is $p\times p$ and $B$ is $q\times q$
and if $X$ is partitioned into row sub-matrices $X = \begin{bmatrix}X_1\\ X_2\end{bmatrix}$ where $X_1$ is $p_1\times q$ and $X_2$ is
$p_2\times q$, so that $p_1 + p_2 = p$, then the conditional density of $X_1$ given $X_2$, denoted by
$f_{p_1,q}(X_1|X_2)$, is given by

$$f_{p_1,q}(X_1|X_2) = \frac{|A_{11}|^{\frac{q}{2}}\,|B|^{\frac{p_1}{2}}}{(2\pi)^{\frac{p_1q}{2}}}\,
e^{-\frac{1}{2}\mathrm{tr}[A_{11}(X_1+C)B(X_1+C)']} \tag{4.5.1}$$
where $C = A_{11}^{-1}A_{12}X_2$ if the location parameter is a null matrix; otherwise $C = -M_1 +
A_{11}^{-1}A_{12}(X_2-M_2)$ with $M$ partitioned into row sub-matrices $M_1$ and $M_2$, $M_1$ being $p_1\times q$
and $M_2$, $p_2\times q$.


Corollary 4.5.1. Let $X$, $X_1$, $X_2$, $M$, $M_1$ and $M_2$ be as defined in Theorem 4.5.1;
then, in the real Gaussian case, the conditional expectation of $X_1$ given $X_2$, denoted by
$E(X_1|X_2)$, is

$$E(X_1|X_2) = M_1 - A_{11}^{-1}A_{12}(X_2-M_2). \tag{4.5.2}$$
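
A small numerical illustration of (4.5.2) may help. The sketch below is written under stated assumptions: the function name `conditional_mean_rows` and all numerical values are illustrative and not part of the text. It evaluates $M_1 - A_{11}^{-1}A_{12}(X_2-M_2)$ and cross-checks the regression coefficient against the covariance form $\Sigma_{12}\Sigma_{22}^{-1}$ with $\Sigma = A^{-1}$.

```python
import numpy as np

def conditional_mean_rows(M, A, X2, p1):
    """E(X1 | X2) = M1 - A11^{-1} A12 (X2 - M2) for a row partition with p1 leading rows."""
    M1, M2 = M[:p1], M[p1:]
    A11, A12 = A[:p1, :p1], A[:p1, p1:]
    return M1 - np.linalg.solve(A11, A12) @ (X2 - M2)

# Illustrative values (assumptions made for this check).
M = np.array([[1.0, -1.0, 1.0],
              [2.0, 0.0, -1.0]])
A = np.array([[2.0, 1.0],
              [1.0, 3.0]])
X2 = np.array([[3.0, 0.0, -1.0]])

cond_mean = conditional_mean_rows(M, A, X2, p1=1)

# The same coefficient in covariance form: -A11^{-1} A12 = Sigma12 Sigma22^{-1}.
Sigma = np.linalg.inv(A)
coef_precision = -np.linalg.solve(A[:1, :1], A[:1, 1:])
coef_covariance = Sigma[:1, 1:] @ np.linalg.inv(Sigma[1:, 1:])
```

When `X2` equals `M[1:]`, the function returns `M[:1]`, as expected for a regression through the location matrix.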


We may adopt the following general notation to represent a real matrix-variate Gaussian
(or normal) density:

$$X \sim N_{p,q}(M, A, B),\ A > O,\ B > O, \tag{4.5.3}$$

which signifies that the $p\times q$ matrix $X$ has a real matrix-variate Gaussian distribution
with location parameter matrix $M$ and parameter matrices $A > O$ and $B > O$ where $A$ is
$p\times p$ and $B$ is $q\times q$. Accordingly, the usual $q$-variate multivariate normal density will
be denoted as follows:

$$X_1 \sim N_{1,q}(\mu, B),\ B > O\ \equiv\ X_1 \sim N_q(\mu, B^{-1}),\ B > O, \tag{4.5.4}$$
where $\mu$ is the location parameter vector, which is the first row of $M$, and $X_1$ is a $1\times q$
row vector consisting of the first row of the matrix $X$. Note that $B^{-1} = \mathrm{Cov}(X_1)$ and
the covariance matrix usually appears as the second parameter in the standard notation
$N_p(\cdot,\cdot)$. In this case, the $1\times 1$ matrix $A$ will be taken as 1 to be consistent with the
usual notation in the real multivariate normal case. The corresponding column case will
be denoted as follows:

$$Y_1 \sim N_{p,1}(\mu_{(1)}, A),\ A > O\ \equiv\ Y_1 \sim N_p(\mu_{(1)}, A^{-1}),\ A > O,\ A^{-1} = \mathrm{Cov}(Y_1) \tag{4.5.5}$$

where $Y_1$ is a $p\times 1$ vector consisting of the first column of $X$ and $\mu_{(1)}$ is the first column
of $M$. With this partitioning of $X$, we have the following result:


Theorem 4.5.2. Let the real matrices $X$, $M$, $A$ and $B$ be as defined in Theorem 4.5.1
and $X$ be partitioned into column sub-matrices as $X = (Y_1\ \ Y_2)$ where $Y_1$ is $p\times q_1$
and $Y_2$ is $p\times q_2$ with $q_1 + q_2 = q$. Let the density of $X$, the marginal densities
of $Y_1$ and $Y_2$ and the conditional density of $Y_1$ given $Y_2$ be respectively denoted by
$f_{p,q}(X)$, $f_{p,q_1}(Y_1)$, $f_{p,q_2}(Y_2)$ and $f_{p,q_1}(Y_1|Y_2)$. Then, the conditional density of $Y_1$ given
$Y_2$ is

$$f_{p,q_1}(Y_1|Y_2) = \frac{|A|^{\frac{q_1}{2}}\,|B_{11}|^{\frac{p}{2}}}{(2\pi)^{\frac{pq_1}{2}}}\,
e^{-\frac{1}{2}\mathrm{tr}[A(Y_1-M_{(1)}+C_1)B_{11}(Y_1-M_{(1)}+C_1)']} \tag{4.5.6}$$
where $A > O$, $B_{11} > O$ and $C_1 = (Y_2-M_{(2)})B_{21}B_{11}^{-1}$, so that the conditional expectation
of $Y_1$ given $Y_2$, or the regression of $Y_1$ on $Y_2$, is obtained as
$$E(Y_1|Y_2) = M_{(1)} - (Y_2-M_{(2)})B_{21}B_{11}^{-1},\quad M = (M_{(1)}\ \ M_{(2)}), \tag{4.5.7}$$

where $M_{(1)}$ is $p\times q_1$ and $M_{(2)}$ is $p\times q_2$ with $q_1 + q_2 = q$. As well, the conditional density
of $Y_2$ given $Y_1$ is the following:

$$f_{p,q_2}(Y_2|Y_1) = \frac{|A|^{\frac{q_2}{2}}\,|B_{22}|^{\frac{p}{2}}}{(2\pi)^{\frac{pq_2}{2}}}\,
e^{-\frac{1}{2}\mathrm{tr}[A(Y_2-M_{(2)}+C_2)B_{22}(Y_2-M_{(2)}+C_2)']} \tag{4.5.8}$$
where
$$M_{(2)} - C_2 = M_{(2)} - (Y_1-M_{(1)})B_{12}B_{22}^{-1} = E[Y_2|Y_1]. \tag{4.5.9}$$


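Example 4.5.1. Consider a $2\times 3$ real matrix $X = (x_{ij})$ having a real matrix-variate
Gaussian distribution with the parameters $M$, $A > O$ and $B > O$ where

$$M = \begin{bmatrix}1 & -1 & 1\\ 2 & 0 & -1\end{bmatrix},\quad
A = \begin{bmatrix}2 & 1\\ 1 & 3\end{bmatrix},\quad
B = \begin{bmatrix}2 & -1 & 1\\ -1 & 3 & 0\\ 1 & 0 & 1\end{bmatrix}.$$

Let $X$ be partitioned as $\begin{bmatrix}X_1\\ X_2\end{bmatrix} = [Y_1\ \ Y_2]$ where $X_1 = [x_{11}, x_{12}, x_{13}]$, $X_2 = [x_{21}, x_{22}, x_{23}]$,
$Y_1 = \begin{bmatrix}x_{11} & x_{12}\\ x_{21} & x_{22}\end{bmatrix}$ and $Y_2 = \begin{bmatrix}x_{13}\\ x_{23}\end{bmatrix}$. Determine the conditional densities of $X_1$ given
$X_2$, $X_2$ given $X_1$, $Y_1$ given $Y_2$, $Y_2$ given $Y_1$ and the conditional expectations $E[X_1|X_2]$,
$E[X_2|X_1]$, $E[Y_1|Y_2]$ and $E[Y_2|Y_1]$.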


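Solution 4.5.1. Given the specified partitions of $X$, $A$ and $B$ are partitioned accordingly
as follows:

$$A = \begin{bmatrix}A_{11} & A_{12}\\ A_{21} & A_{22}\end{bmatrix},\quad
B = \begin{bmatrix}B_{11} & B_{12}\\ B_{21} & B_{22}\end{bmatrix},\quad
B_{11} = \begin{bmatrix}2 & -1\\ -1 & 3\end{bmatrix},\quad
B_{12} = \begin{bmatrix}1\\ 0\end{bmatrix},$$
$B_{21} = [1, 0]$, $B_{22} = (1)$, $A_{11} = (2)$, $A_{12} = (1)$, $A_{21} = (1)$, $A_{22} = (3)$.

The following numerical results are needed:

$$A_{11}-A_{12}A_{22}^{-1}A_{21} = 2-(1)(1/3)(1) = \frac{5}{3}$$
$$A_{22}-A_{21}A_{11}^{-1}A_{12} = 3-(1)(1/2)(1) = \frac{5}{2}$$
$$B_{11}-B_{12}B_{22}^{-1}B_{21} = \begin{bmatrix}2 & -1\\ -1 & 3\end{bmatrix}-\begin{bmatrix}1\\ 0\end{bmatrix}(1)[1, 0] = \begin{bmatrix}1 & -1\\ -1 & 3\end{bmatrix}$$
$$B_{22}-B_{21}B_{11}^{-1}B_{12} = 1-[1, 0]\left(\tfrac{1}{5}\right)\begin{bmatrix}3 & 1\\ 1 & 2\end{bmatrix}\begin{bmatrix}1\\ 0\end{bmatrix} = \frac{2}{5}$$

$$|A_{11}| = 2,\ |A_{22}| = 3,\ |A| = 5,\ |A_{11}-A_{12}A_{22}^{-1}A_{21}| = \frac{5}{3},\ |A_{22}-A_{21}A_{11}^{-1}A_{12}| = \frac{5}{2},$$
$$|B_{11}| = 5,\ |B_{22}| = 1,\ |B_{11}-B_{12}B_{22}^{-1}B_{21}| = 2,\ |B_{22}-B_{21}B_{11}^{-1}B_{12}| = \frac{2}{5},\ |B| = 2;$$
$$A_{11}^{-1}A_{12} = \frac{1}{2},\quad A_{22}^{-1}A_{21} = \frac{1}{3},\quad B_{22}^{-1}B_{21} = [1, 0],\quad
B_{11}^{-1}B_{12} = \frac{1}{5}\begin{bmatrix}3 & 1\\ 1 & 2\end{bmatrix}\begin{bmatrix}1\\ 0\end{bmatrix} = \frac{1}{5}\begin{bmatrix}3\\ 1\end{bmatrix}.$$
All the conditional expectations can now be determined. They are
$$\begin{aligned}
E[X_1|X_2] &= M_1 - A_{11}^{-1}A_{12}(X_2-M_2) = [1, -1, 1] - \tfrac{1}{2}(x_{21}-2,\ x_{22},\ x_{23}+1)\\
&= [1-\tfrac{1}{2}(x_{21}-2),\ -1-\tfrac{1}{2}x_{22},\ 1-\tfrac{1}{2}(x_{23}+1)]
\end{aligned} \tag{i}$$
$$\begin{aligned}
E[X_2|X_1] &= M_2 - A_{22}^{-1}A_{21}(X_1-M_1) = [2, 0, -1] - \tfrac{1}{3}[x_{11}-1,\ x_{12}+1,\ x_{13}-1]\\
&= [2-\tfrac{1}{3}(x_{11}-1),\ -\tfrac{1}{3}(x_{12}+1),\ -1-\tfrac{1}{3}(x_{13}-1)]
\end{aligned} \tag{ii}$$
$$\begin{aligned}
E[Y_1|Y_2] &= M_{(1)} - (Y_2-M_{(2)})B_{21}B_{11}^{-1} = \begin{bmatrix}1 & -1\\ 2 & 0\end{bmatrix} - \begin{bmatrix}x_{13}-1\\ x_{23}+1\end{bmatrix}[1, 0]\,\frac{1}{5}\begin{bmatrix}3 & 1\\ 1 & 2\end{bmatrix}\\
&= \begin{bmatrix}1-\frac{3}{5}(x_{13}-1) & -1-\frac{1}{5}(x_{13}-1)\\ 2-\frac{3}{5}(x_{23}+1) & -\frac{1}{5}(x_{23}+1)\end{bmatrix}
\end{aligned} \tag{iii}$$
$$\begin{aligned}
E[Y_2|Y_1] &= M_{(2)} - (Y_1-M_{(1)})B_{12}B_{22}^{-1}\\
&= \begin{bmatrix}1\\ -1\end{bmatrix} - \begin{bmatrix}x_{11}-1 & x_{12}+1\\ x_{21}-2 & x_{22}\end{bmatrix}\begin{bmatrix}1\\ 0\end{bmatrix} = \begin{bmatrix}2-x_{11}\\ 1-x_{21}\end{bmatrix}.
\end{aligned} \tag{iv}$$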
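The conditional densities can now be obtained. That of $X_1$ given $X_2$ is

$$f_{p_1,q}(X_1|X_2) = \frac{|A_{11}|^{\frac{q}{2}}\,|B|^{\frac{p_1}{2}}}{(2\pi)^{\frac{p_1q}{2}}}\,
e^{-\frac{1}{2}\mathrm{tr}[A_{11}(X_1-M_1+C)B(X_1-M_1+C)']}$$

for the matrices $A > O$ and $B > O$ previously specified; that is,

$$f_{1,3}(X_1|X_2) = \frac{4}{(2\pi)^{\frac{3}{2}}}\,e^{-(X_1-M_1+C)B(X_1-M_1+C)'}$$

where $M_1 - C = E[X_1|X_2]$ is given in (i). The conditional density of $X_2|X_1$ is the
following:

$$f_{p_2,q}(X_2|X_1) = \frac{|A_{22}|^{\frac{q}{2}}\,|B|^{\frac{p_2}{2}}}{(2\pi)^{\frac{p_2q}{2}}}\,
e^{-\frac{1}{2}\mathrm{tr}[A_{22}(X_2-M_2+C_1)B(X_2-M_2+C_1)']},$$
that is,

$$f_{1,3}(X_2|X_1) = \frac{3^{\frac{3}{2}}\,2^{\frac{1}{2}}}{(2\pi)^{\frac{3}{2}}}\,e^{-\frac{3}{2}(X_2-M_2+C_1)B(X_2-M_2+C_1)'}$$
where $M_2 - C_1 = E[X_2|X_1]$ is given in (ii). The conditional density of $Y_1$ given $Y_2$ is

$$f_{p,q_1}(Y_1|Y_2) = \frac{|A|^{\frac{q_1}{2}}\,|B_{11}|^{\frac{p}{2}}}{(2\pi)^{\frac{pq_1}{2}}}\,
e^{-\frac{1}{2}\mathrm{tr}[A(Y_1-M_{(1)}+C_2)B_{11}(Y_1-M_{(1)}+C_2)']},$$

that is,

$$f_{2,2}(Y_1|Y_2) = \frac{25}{(2\pi)^{2}}\,e^{-\frac{1}{2}\mathrm{tr}[A(Y_1-M_{(1)}+C_2)B_{11}(Y_1-M_{(1)}+C_2)']}$$

where $M_{(1)} - C_2 = E[Y_1|Y_2]$ is specified in (iii). Finally, the conditional density of $Y_2|Y_1$
is the following:

$$f_{p,q_2}(Y_2|Y_1) = \frac{|A|^{\frac{q_2}{2}}\,|B_{22}|^{\frac{p}{2}}}{(2\pi)^{\frac{pq_2}{2}}}\,
e^{-\frac{1}{2}\mathrm{tr}[A(Y_2-M_{(2)}+C_3)B_{22}(Y_2-M_{(2)}+C_3)']},$$
that is,

$$f_{2,1}(Y_2|Y_1) = \frac{\sqrt{5}}{2\pi}\,e^{-\frac{1}{2}\mathrm{tr}[A(Y_2-M_{(2)}+C_3)B_{22}(Y_2-M_{(2)}+C_3)']}$$

where $M_{(2)} - C_3 = E[Y_2|Y_1]$ is given in (iv). This completes the computations.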
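
The column regressions (iii) and (iv) can be evaluated numerically as a check. The sketch below makes two assumptions that should be read as such: the matrices $M$ and $B$ are those assembled from the blocks and row vectors quoted in Solution 4.5.1, and the function names and the test values of $Y_1$ and $Y_2$ are illustrative.

```python
import numpy as np

M = np.array([[1.0, -1.0, 1.0],
              [2.0, 0.0, -1.0]])
B = np.array([[2.0, -1.0, 1.0],
              [-1.0, 3.0, 0.0],
              [1.0, 0.0, 1.0]])
q1 = 2

def conditional_mean_y1(M, B, Y2, q1):
    """E[Y1 | Y2] = M(1) - (Y2 - M(2)) B21 B11^{-1}, as in (4.5.7)."""
    M1, M2 = M[:, :q1], M[:, q1:]
    B11, B21 = B[:q1, :q1], B[q1:, :q1]
    return M1 - (Y2 - M2) @ B21 @ np.linalg.inv(B11)

def conditional_mean_y2(M, B, Y1, q1):
    """E[Y2 | Y1] = M(2) - (Y1 - M(1)) B12 B22^{-1}, as in (4.5.9)."""
    M1, M2 = M[:, :q1], M[:, q1:]
    B12, B22 = B[:q1, q1:], B[q1:, q1:]
    return M2 - (Y1 - M1) @ B12 @ np.linalg.inv(B22)

# Arbitrary evaluation points (assumptions for this check).
Y1 = np.array([[0.5, 2.0],
               [-1.0, 4.0]])
Y2 = np.array([[3.0],
               [0.0]])

e_y1 = conditional_mean_y1(M, B, Y2, q1)
e_y2 = conditional_mean_y2(M, B, Y1, q1)   # should equal [2 - x11, 1 - x21]'
```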
4.5a. Conditional Densities in the Matrix-variate Complex Gaussian Case 


The corresponding distributions in the complex case closely parallel those obtained for
the real case. A tilde will be utilized to distinguish them from the real distributions. Thus,

$$\tilde X \sim \tilde N_{p,q}(M, A, B),\ A = A^* > O,\ B = B^* > O$$

will denote a complex $p\times q$ matrix $\tilde X$ having a $p\times q$ matrix-variate complex Gaussian
density. For the $1\times q$ case, that is, the $q$-variate multivariate normal distribution in the
complex case, which is obtained from the marginal distribution of the first row of $\tilde X$, we
have

$$\tilde X_1 \sim \tilde N_{1,q}(\mu, B),\ B > O,\ \text{that is,}\ \tilde X_1 \sim \tilde N_q(\mu, B^{-1}),\ B^{-1} = \mathrm{Cov}(\tilde X_1),$$

where $\tilde X_1$ is a $1\times q$ vector having a $q$-variate complex normal density with $E(\tilde X_1) = \mu$.
The case $q = 1$ corresponds to a column vector in $\tilde X$, which constitutes a $p\times 1$ column
vector in the complex domain. Letting it be denoted as $\tilde Y_1$, we have


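$$\tilde Y_1 \sim \tilde N_{p,1}(\mu_{(1)}, A),\ A > O,\ \text{that is,}\ \tilde Y_1 \sim \tilde N_p(\mu_{(1)}, A^{-1}),\ A^{-1} = \mathrm{Cov}(\tilde Y_1),$$

where $\mu_{(1)}$ is the first column of $M$.

Theorem 4.5a.1. Let $\tilde X$ be a $p\times q$ matrix in the complex domain having a $p\times q$ matrix-variate
complex Gaussian density denoted by $\tilde f_{p,q}(\tilde X)$. Let $\tilde X = \begin{bmatrix}\tilde X_1\\ \tilde X_2\end{bmatrix}$ be a row partitioning
of $\tilde X$ into sub-matrices where $\tilde X_1$ is $p_1\times q$ and $\tilde X_2$ is $p_2\times q$, with $p_1 + p_2 = p$. Then
the conditional density of $\tilde X_1$ given $\tilde X_2$, denoted by $\tilde f_{p_1,q}(\tilde X_1|\tilde X_2)$, is given by

$$\tilde f_{p_1,q}(\tilde X_1|\tilde X_2) = \frac{|\det(A_{11})|^{q}\,|\det(B)|^{p_1}}{\pi^{p_1q}}\,
e^{-\mathrm{tr}[A_{11}(\tilde X_1-M_1+C)B(\tilde X_1-M_1+C)^*]} \tag{4.5a.1}$$

where $C = A_{11}^{-1}A_{12}(\tilde X_2-M_2)$, $E[\tilde X] = M = \begin{bmatrix}M_1\\ M_2\end{bmatrix}$ and the regression of $\tilde X_1$ on $\tilde X_2$ is
as follows:

$$E(\tilde X_1|\tilde X_2) = \begin{cases}M_1 - A_{11}^{-1}A_{12}(\tilde X_2-M_2) & \text{if } M = \begin{bmatrix}M_1\\ M_2\end{bmatrix}\\[2mm]
-A_{11}^{-1}A_{12}\tilde X_2 & \text{if } M = O.\end{cases} \tag{4.5a.2}$$
Analogously, the conditional density of $\tilde X_2$ given $\tilde X_1$ is

$$\tilde f_{p_2,q}(\tilde X_2|\tilde X_1) = \frac{|\det(A_{22})|^{q}\,|\det(B)|^{p_2}}{\pi^{p_2q}}\,
e^{-\mathrm{tr}[A_{22}(\tilde X_2-M_2+C_1)B(\tilde X_2-M_2+C_1)^*]} \tag{4.5a.3}$$

where $C_1 = A_{22}^{-1}A_{21}(\tilde X_1-M_1)$, so that the conditional expectation of $\tilde X_2$ given $\tilde X_1$ or the
regression of $\tilde X_2$ on $\tilde X_1$ is given by

$$E[\tilde X_2|\tilde X_1] = M_2 - A_{22}^{-1}A_{21}(\tilde X_1-M_1). \tag{4.5a.4}$$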


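Theorem 4.5a.2. Let $\tilde X$ be as defined in Theorem 4.5a.1. Let $\tilde X$ be partitioned into column
sub-matrices, that is, $\tilde X = (\tilde Y_1\ \ \tilde Y_2)$ where $\tilde Y_1$ is $p\times q_1$ and $\tilde Y_2$ is $p\times q_2$, with $q_1 + q_2 = q$.
Then the conditional density of $\tilde Y_1$ given $\tilde Y_2$, denoted by $\tilde f_{p,q_1}(\tilde Y_1|\tilde Y_2)$, is given by

$$\tilde f_{p,q_1}(\tilde Y_1|\tilde Y_2) = \frac{|\det(A)|^{q_1}\,|\det(B_{11})|^{p}}{\pi^{pq_1}}\,
e^{-\mathrm{tr}[A(\tilde Y_1-M_{(1)}+C_{(1)})B_{11}(\tilde Y_1-M_{(1)}+C_{(1)})^*]} \tag{4.5a.5}$$

where $C_{(1)} = (\tilde Y_2-M_{(2)})B_{21}B_{11}^{-1}$, and the regression of $\tilde Y_1$ on $\tilde Y_2$ or the conditional
expectation of $\tilde Y_1$ given $\tilde Y_2$ is given by

$$E(\tilde Y_1|\tilde Y_2) = M_{(1)} - (\tilde Y_2-M_{(2)})B_{21}B_{11}^{-1} \tag{4.5a.6}$$
with $E[\tilde X] = M = [M_{(1)}\ \ M_{(2)}] = E[\tilde Y_1\ \ \tilde Y_2]$. As well, the conditional density of $\tilde Y_2$ given
$\tilde Y_1$ is the following:

$$\tilde f_{p,q_2}(\tilde Y_2|\tilde Y_1) = \frac{|\det(A)|^{q_2}\,|\det(B_{22})|^{p}}{\pi^{pq_2}}\,
e^{-\mathrm{tr}[A(\tilde Y_2-M_{(2)}+C_{(2)})B_{22}(\tilde Y_2-M_{(2)}+C_{(2)})^*]} \tag{4.5a.7}$$

where $C_{(2)} = (\tilde Y_1-M_{(1)})B_{12}B_{22}^{-1}$ and the conditional expectation of $\tilde Y_2$ given $\tilde Y_1$ is then
$$E[\tilde Y_2|\tilde Y_1] = M_{(2)} - (\tilde Y_1-M_{(1)})B_{12}B_{22}^{-1}. \tag{4.5a.8}$$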


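Example 4.5a.1. Consider a $2\times 3$ matrix-variate complex Gaussian distribution with the
parameters

$$M = \begin{bmatrix}1+i & i & -i\\ i & 2+i & 1-i\end{bmatrix},\quad
A = \begin{bmatrix}2 & i\\ -i & 1\end{bmatrix},\quad
B = \begin{bmatrix}3 & -i & 0\\ i & 2 & i\\ 0 & -i & 1\end{bmatrix}.$$

Consider the partitioning of $\tilde X = \begin{bmatrix}\tilde X_1\\ \tilde X_2\end{bmatrix} = [\tilde Y_1\ \ \tilde Y_2]$ where $\tilde X_1 = [\tilde x_{11}, \tilde x_{12}, \tilde x_{13}]$, $\tilde X_2 =
[\tilde x_{21}, \tilde x_{22}, \tilde x_{23}]$, $\tilde Y_1 = \begin{bmatrix}\tilde x_{11}\\ \tilde x_{21}\end{bmatrix}$ and $\tilde Y_2 = \begin{bmatrix}\tilde x_{12} & \tilde x_{13}\\ \tilde x_{22} & \tilde x_{23}\end{bmatrix}$. Determine the conditional densities of
$\tilde X_1|\tilde X_2$, $\tilde X_2|\tilde X_1$, $\tilde Y_1|\tilde Y_2$ and $\tilde Y_2|\tilde Y_1$ and the corresponding conditional expectations.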


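Solution 4.5a.1. As per the partitioning of $\tilde X$, we have the following partitions of $A$, $B$
and $M$:

$$A = \begin{bmatrix}A_{11} & A_{12}\\ A_{21} & A_{22}\end{bmatrix} = \begin{bmatrix}2 & i\\ -i & 1\end{bmatrix},\quad
B = \begin{bmatrix}B_{11} & B_{12}\\ B_{21} & B_{22}\end{bmatrix},\quad
B_{22} = \begin{bmatrix}2 & i\\ -i & 1\end{bmatrix},\quad
B_{21} = \begin{bmatrix}i\\ 0\end{bmatrix},$$

$A_{11} = (2)$, $A_{12} = (i)$, $A_{21} = (-i)$, $A_{22} = (1)$, $B_{11} = (3)$, $B_{12} = [-i, 0]$, $A_{11}^{-1} = \frac{1}{2}$, $A_{22}^{-1} = 1$, $B_{11}^{-1} = \frac{1}{3}$,

$$A_{11}-A_{12}A_{22}^{-1}A_{21} = 2-(i)(-i) = 1,\quad |A_{11}-A_{12}A_{22}^{-1}A_{21}| = 1,$$
$$A_{22}-A_{21}A_{11}^{-1}A_{12} = \frac{1}{2},\quad |A_{22}-A_{21}A_{11}^{-1}A_{12}| = \frac{1}{2},\quad |A| = 1,\ |B| = 2,$$
$$B_{11}-B_{12}B_{22}^{-1}B_{21} = 3-[-i,\ 0]\begin{bmatrix}1 & -i\\ i & 2\end{bmatrix}\begin{bmatrix}i\\ 0\end{bmatrix} = 2,\quad |B_{11}-B_{12}B_{22}^{-1}B_{21}| = 2,$$
$$B_{22}-B_{21}B_{11}^{-1}B_{12} = \begin{bmatrix}2 & i\\ -i & 1\end{bmatrix}-\begin{bmatrix}i\\ 0\end{bmatrix}\left(\tfrac{1}{3}\right)[-i,\ 0] = \begin{bmatrix}\frac{5}{3} & i\\ -i & 1\end{bmatrix},\quad |B_{22}-B_{21}B_{11}^{-1}B_{12}| = \frac{2}{3},$$
$$M_1 = [1+i,\ i,\ -i],\quad M_2 = [i,\ 2+i,\ 1-i],\quad M_{(1)} = \begin{bmatrix}1+i\\ i\end{bmatrix},\quad M_{(2)} = \begin{bmatrix}i & -i\\ 2+i & 1-i\end{bmatrix}.$$
All the conditional expectations can now be determined. They are
$$E[\tilde X_1|\tilde X_2] = M_1 - A_{11}^{-1}A_{12}(\tilde X_2-M_2) = [1+i,\ i,\ -i] - \frac{i}{2}(\tilde X_2-M_2) \tag{i}$$
$$E[\tilde X_2|\tilde X_1] = M_2 - A_{22}^{-1}A_{21}(\tilde X_1-M_1) = [i,\ 2+i,\ 1-i] + i(\tilde X_1-M_1) \tag{ii}$$
$$E[\tilde Y_1|\tilde Y_2] = M_{(1)} - (\tilde Y_2-M_{(2)})B_{21}B_{11}^{-1} = \frac{1}{3}\begin{bmatrix}2+3i-i\tilde x_{12}\\ -1+5i-i\tilde x_{22}\end{bmatrix} \tag{iii}$$
$$\begin{aligned}
E[\tilde Y_2|\tilde Y_1] &= M_{(2)} - (\tilde Y_1-M_{(1)})B_{12}B_{22}^{-1}\\
&= \begin{bmatrix}i & -i\\ 2+i & 1-i\end{bmatrix} - \begin{bmatrix}\tilde x_{11}-(1+i)\\ \tilde x_{21}-i\end{bmatrix}[-i,\ 0]\begin{bmatrix}1 & -i\\ i & 2\end{bmatrix}\\
&= \begin{bmatrix}i\tilde x_{11}+1 & \tilde x_{11}-1-2i\\ i\tilde x_{21}+i+3 & \tilde x_{21}+1-2i\end{bmatrix}.
\end{aligned} \tag{iv}$$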


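Now, on substituting the above quantities in equations (4.5a.1), (4.5a.3), (4.5a.5)
and (4.5a.7), the following densities are obtained:

$$\tilde f_{1,3}(\tilde X_1|\tilde X_2) = \frac{2^4}{\pi^3}\,e^{-2(\tilde X_1-E_1)B(\tilde X_1-E_1)^*}$$
where $E_1 = E[\tilde X_1|\tilde X_2]$ given in (i);

$$\tilde f_{1,3}(\tilde X_2|\tilde X_1) = \frac{2}{\pi^3}\,e^{-(\tilde X_2-E_2)B(\tilde X_2-E_2)^*}$$
where $E_2 = E[\tilde X_2|\tilde X_1]$ given in (ii);

$$\tilde f_{2,1}(\tilde Y_1|\tilde Y_2) = \frac{3^2}{\pi^2}\,e^{-3(\tilde Y_1-E_3)^*A(\tilde Y_1-E_3)}$$
where $E_3 = E[\tilde Y_1|\tilde Y_2]$ given in (iii);

$$\tilde f_{2,2}(\tilde Y_2|\tilde Y_1) = \frac{1}{\pi^4}\,e^{-\mathrm{tr}[A(\tilde Y_2-E_4)B_{22}(\tilde Y_2-E_4)^*]}$$

where $E_4 = E[\tilde Y_2|\tilde Y_1]$ given in (iv). The exponent in the density of $\tilde Y_1|\tilde Y_2$ can be written out
explicitly; at $\tilde Y_2 = M_{(2)}$, in which case $E_3 = M_{(1)}$, it is as follows:

$$\begin{aligned}
-\mathrm{tr}[A(\tilde Y_1-M_{(1)})B_{11}(\tilde Y_1-M_{(1)})^*] &= -3(\tilde Y_1-M_{(1)})^*A(\tilde Y_1-M_{(1)})\\
&= -3\,[(\tilde x_{11}-(1+i))^*,\ (\tilde x_{21}-i)^*]\begin{bmatrix}2 & i\\ -i & 1\end{bmatrix}\begin{bmatrix}\tilde x_{11}-(1+i)\\ \tilde x_{21}-i\end{bmatrix}\\
&= -6\{(x_{111}^2+x_{112}^2) + \tfrac{1}{2}(x_{211}^2+x_{212}^2) + (x_{112}x_{211}-x_{111}x_{212})\\
&\qquad - 2x_{112} - x_{111} - x_{211} + \tfrac{3}{2}\}
\end{aligned}$$

by writing $\tilde x_{k1} = x_{k11} + ix_{k12}$, $k = 1, 2$, $i = \sqrt{-1}$. This completes the computations.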
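
The real-variable expansion of the Hermitian form at the end of Solution 4.5a.1 can be checked numerically. The sketch below rests on assumptions that should be noted: $A$ is taken with rows $(2, i)$ and $(-i, 1)$ and the first column of $M$ as $(1+i, i)'$, as read from the blocks quoted in the solution; the function names and the test point are illustrative.

```python
import numpy as np

A = np.array([[2, 1j],
              [-1j, 1]])
m1 = np.array([1 + 1j, 1j])        # first column of M, as read from the solution

def hermitian_form(y):
    """(y - m1)* A (y - m1), which is real since A is Hermitian."""
    d = y - m1
    return (d.conj() @ A @ d).real

def expanded_form(a, b, c, d):
    """Same quantity written in real variables: y11 = a + ib, y21 = c + id."""
    return 2 * (a**2 + b**2 + 0.5 * (c**2 + d**2) + (b * c - a * d)
                - 2 * b - a - c + 1.5)

# Arbitrary test point (an assumption made for this check).
a, b, c, d = 0.3, -1.2, 2.0, 0.7
lhs = hermitian_form(np.array([a + 1j * b, c + 1j * d]))
rhs = expanded_form(a, b, c, d)
```

Multiplying either side by $-3$ gives the exponent with the factor $-6$ shown in the display.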
4.5.1. Re-examination of the case q = 1


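When $q = 1$, we have a $p\times 1$ vector-variate or the usual $p$-variate Gaussian density
of the form in (4.5.5). Let us consider the real case first. Let the $p\times 1$ vector be denoted
by $Y_1$ with

$$Y_1 = \begin{bmatrix}y_1\\ \vdots\\ y_p\end{bmatrix} = \begin{bmatrix}Y_{(1)}\\ Y_{(2)}\end{bmatrix},\quad
Y_{(1)} = \begin{bmatrix}y_1\\ \vdots\\ y_{p_1}\end{bmatrix},\quad
Y_{(2)} = \begin{bmatrix}y_{p_1+1}\\ \vdots\\ y_p\end{bmatrix},$$

$$M_{(1)} = \begin{bmatrix}M_{(1)}^{(p_1)}\\ M_{(2)}^{(p_2)}\end{bmatrix},\quad
M_{(1)}^{(p_1)} = \begin{bmatrix}m_1\\ \vdots\\ m_{p_1}\end{bmatrix},\quad
M_{(2)}^{(p_2)} = \begin{bmatrix}m_{p_1+1}\\ \vdots\\ m_p\end{bmatrix},\quad
E[Y_1] = M_{(1)},\ p_1 + p_2 = p.$$
Then, from (4.5.2) wherein $q = 1$, we have

$$E[Y_{(1)}|Y_{(2)}] = M_{(1)}^{(p_1)} - A_{11}^{-1}A_{12}(Y_{(2)}-M_{(2)}^{(p_2)}), \tag{4.5.10}$$
with $A = \Sigma^{-1}$, $\Sigma$ being the covariance matrix of $Y_1$, that is, $\mathrm{Cov}(Y_1) = E[(Y_1-
E(Y_1))(Y_1-E(Y_1))']$. Let

$$\Sigma = \begin{bmatrix}\Sigma_{11} & \Sigma_{12}\\ \Sigma_{21} & \Sigma_{22}\end{bmatrix},\ \text{where}\ \Sigma_{11}\ \text{is}\ p_1\times p_1\ \text{and}\ \Sigma_{22}\ \text{is}\ p_2\times p_2.$$

From the partitioning of matrices presented in Sect. 1.3, we have
$$-A_{11}^{-1}A_{12} = \Sigma_{12}\Sigma_{22}^{-1}. \tag{4.5.11}$$

Accordingly, we may rewrite (4.5.10) in terms of the sub-matrices of the covariance matrix
as

$$E[Y_{(1)}|Y_{(2)}] = M_{(1)}^{(p_1)} + \Sigma_{12}\Sigma_{22}^{-1}(Y_{(2)}-M_{(2)}^{(p_2)}). \tag{4.5.12}$$

If $p_1 = 1$, then $Y_{(2)}$ will contain $p-1$ elements, denoted by $Y_{(2)}' = (y_2,\ldots,y_p)$. Letting
$E[y_1] = m_1$, we have

$$E[y_1|Y_{(2)}] = m_1 + \Sigma_{12}\Sigma_{22}^{-1}(Y_{(2)}-M_{(2)}^{(p_2)}),\quad p_2 = p-1. \tag{4.5.13}$$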


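The conditional expectation (4.5.13) is the best predictor of $y_1$ at the preassigned values
of $y_2,\ldots,y_p$, where $m_1 = E[y_1]$. It will now be shown that $\Sigma_{12}\Sigma_{22}^{-1}$ can be expressed
in terms of variances and correlations. Let $\sigma_j^2 = \sigma_{jj} = \mathrm{Var}(y_j)$ where $\mathrm{Var}(\cdot)$ denotes the
variance of $(\cdot)$. Note that $\sigma_{ij} = \mathrm{Cov}(y_i, y_j)$ or the covariance between $y_i$ and $y_j$. Letting
$\rho_{ij}$ be the correlation between $y_i$ and $y_j$, we have

$$\Sigma_{12} = [\mathrm{Cov}(y_1, y_2),\ldots,\mathrm{Cov}(y_1, y_p)] = [\sigma_1\sigma_2\rho_{12},\ldots,\sigma_1\sigma_p\rho_{1p}].$$
Then
$$\Sigma = \begin{bmatrix}\sigma_1\sigma_1 & \sigma_1\sigma_2\rho_{12} & \cdots & \sigma_1\sigma_p\rho_{1p}\\
\sigma_2\sigma_1\rho_{21} & \sigma_2\sigma_2 & \cdots & \sigma_2\sigma_p\rho_{2p}\\
\vdots & \vdots & \ddots & \vdots\\
\sigma_p\sigma_1\rho_{p1} & \sigma_p\sigma_2\rho_{p2} & \cdots & \sigma_p\sigma_p\end{bmatrix},\quad \rho_{ij} = \rho_{ji},\ \rho_{jj} = 1,$$
for all $j$. Let $D = \mathrm{diag}(\sigma_1,\ldots,\sigma_p)$ be a diagonal matrix whose diagonal elements are
$\sigma_1,\ldots,\sigma_p$, the standard deviations of $y_1,\ldots,y_p$, respectively. Letting $R = (\rho_{ij})$ denote
the correlation matrix wherein $\rho_{ij}$ is the correlation between $y_i$ and $y_j$, we can express
$\Sigma$ as $DRD$, that is,

$$\Sigma = \begin{bmatrix}\sigma_{11} & \sigma_{12} & \cdots & \sigma_{1p}\\ \sigma_{21} & \sigma_{22} & \cdots & \sigma_{2p}\\ \vdots & \vdots & \ddots & \vdots\\ \sigma_{p1} & \sigma_{p2} & \cdots & \sigma_{pp}\end{bmatrix}
= \begin{bmatrix}\sigma_1 & 0 & \cdots & 0\\ 0 & \sigma_2 & \cdots & 0\\ \vdots & \vdots & \ddots & \vdots\\ 0 & 0 & \cdots & \sigma_p\end{bmatrix}
\begin{bmatrix}1 & \rho_{12} & \cdots & \rho_{1p}\\ \rho_{21} & 1 & \cdots & \rho_{2p}\\ \vdots & \vdots & \ddots & \vdots\\ \rho_{p1} & \rho_{p2} & \cdots & 1\end{bmatrix}
\begin{bmatrix}\sigma_1 & 0 & \cdots & 0\\ 0 & \sigma_2 & \cdots & 0\\ \vdots & \vdots & \ddots & \vdots\\ 0 & 0 & \cdots & \sigma_p\end{bmatrix}$$

so that

$$\Sigma^{-1} = D^{-1}R^{-1}D^{-1},\quad p = 2, 3, \ldots \tag{4.5.14}$$
We can then re-express (4.5.13) in terms of variances and correlations since
$$\Sigma_{12}\Sigma_{22}^{-1} = \sigma_1R_{12}D_{(2)}D_{(2)}^{-1}R_{22}^{-1}D_{(2)}^{-1} = \sigma_1R_{12}R_{22}^{-1}D_{(2)}^{-1}$$
where $D_{(2)} = \mathrm{diag}(\sigma_2,\ldots,\sigma_p)$ and $R$ is partitioned accordingly. Thus,
$$E[y_1|Y_{(2)}] = m_1 + \sigma_1R_{12}R_{22}^{-1}D_{(2)}^{-1}(Y_{(2)}-M_{(2)}^{(p_2)}). \tag{4.5.15}$$

An interesting particular case occurs when $p = 2$, as there are then only two real scalar
variables $y_1$ and $y_2$, and

$$E[y_1|y_2] = m_1 + \frac{\sigma_1}{\sigma_2}\rho_{12}(y_2-m_2), \tag{4.5.16}$$

which is the regression of $y_1$ on $y_2$ or the best predictor of $y_1$ at a given value of $y_2$.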
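
The three equivalent forms of the regression coefficient, $-A_{11}^{-1}A_{12}$, $\Sigma_{12}\Sigma_{22}^{-1}$ and $\sigma_1R_{12}R_{22}^{-1}D_{(2)}^{-1}$, can be compared numerically. The following sketch uses illustrative standard deviations and correlations (assumptions, not values from the text), and the helper `best_predictor` is a hypothetical name for the right-hand side of (4.5.13).

```python
import numpy as np

# Illustrative standard deviations and correlation matrix (assumptions).
sigma = np.array([2.0, 1.0, 3.0])
R = np.array([[1.0, 0.5, 0.2],
              [0.5, 1.0, 0.3],
              [0.2, 0.3, 1.0]])
D = np.diag(sigma)
Sigma = D @ R @ D                 # Sigma = D R D
A = np.linalg.inv(Sigma)          # A = Sigma^{-1}

# Three expressions for the coefficient of (Y(2) - M(2)) when p1 = 1.
coef_cov = Sigma[:1, 1:] @ np.linalg.inv(Sigma[1:, 1:])
coef_prec = -np.linalg.solve(A[:1, :1], A[:1, 1:])
coef_corr = sigma[0] * R[:1, 1:] @ np.linalg.inv(R[1:, 1:]) @ np.diag(1 / sigma[1:])

def best_predictor(y_rest, m):
    """E[y1 | y2, ..., yp] = m1 + Sigma12 Sigma22^{-1} (Y(2) - M(2))."""
    return m[0] + (coef_cov @ (y_rest - m[1:])).item()

# Bivariate special case (4.5.16): the slope is (sigma1 / sigma2) * rho12.
slope_p2 = Sigma[0, 1] / Sigma[1, 1]
```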
4.6. Sampling from a Real Matrix-variate Gaussian Density


Let the $p\times q$ matrix $X_\alpha = (x_{ij\alpha})$ have a $p\times q$ real matrix-variate Gaussian density
with parameter matrices $M$, $A > O$ and $B > O$. When $n$ independently and identically
distributed (iid) matrix random variables that are distributed as $X_\alpha$ are available, we say
that we have a simple random sample of size $n$ from $X_\alpha$ or from the population distributed
as $X_\alpha$. We will consider simple random samples from a $p\times q$ matrix-variate Gaussian
population in the real and complex domains. Since the procedures are parallel to those utilized
in the vector variable case, we will recall the particulars in connection with that particular
case. Some of the following materials are re-examinations of those already presented in
Chap. 3. For $q = 1$, we have a $p$-vector which will be denoted by $Y_1$. In our previous
notations, $Y_1$ is the same $Y_1$ for $q_1 = 1$, $q_2 = 0$ and $q = 1$. Consider a sample of size
$n$ from a population distributed as $Y_1$ and let the $p\times n$ sample matrix be denoted by $Y$.
Then,

$$Y = [Y_1,\ldots,Y_n] = \begin{bmatrix}y_{11} & y_{12} & \cdots & y_{1n}\\ y_{21} & y_{22} & \cdots & y_{2n}\\ \vdots & \vdots & \ddots & \vdots\\ y_{p1} & y_{p2} & \cdots & y_{pn}\end{bmatrix},\quad
Y_1 = \begin{bmatrix}y_{11}\\ \vdots\\ y_{p1}\end{bmatrix}.$$
In this case, the columns of $Y$, that is, $Y_j$, $j = 1,\ldots,n$, are iid variables, distributed as
$Y_1$. Let an $n\times 1$ column vector whose components are all equal to 1 be denoted by $J$ and
consider

$$\bar Y = \frac{1}{n}YJ = \frac{1}{n}\begin{bmatrix}y_{11}+\cdots+y_{1n}\\ \vdots\\ y_{p1}+\cdots+y_{pn}\end{bmatrix} = \begin{bmatrix}\bar y_1\\ \vdots\\ \bar y_p\end{bmatrix}$$
where $\bar y_j = \frac{1}{n}\sum_{k=1}^{n}y_{jk}$ denotes the average of the variables distributed as $y_j$. Let
$$S = (Y-\bar{\mathbf{Y}})(Y-\bar{\mathbf{Y}})'\ \text{where the bold-faced}\ \bar{\mathbf{Y}} = [\bar Y,\ldots,\bar Y] = \begin{bmatrix}\bar y_1 & \cdots & \bar y_1\\ \vdots & \ddots & \vdots\\ \bar y_p & \cdots & \bar y_p\end{bmatrix}.$$
Then,
$$S = (s_{ij}),\quad s_{ij} = \sum_{k=1}^{n}(y_{ik}-\bar y_i)(y_{jk}-\bar y_j)\ \text{for all}\ i\ \text{and}\ j. \tag{4.6.1}$$
This matrix $S$ is known as the sample sum of products matrix or corrected sample sum of
products matrix. Here "corrected" indicates that the deviations are taken from the respective
averages $\bar y_1,\ldots,\bar y_p$. Note that $\frac{1}{n}s_{ij}$ is equal to the sample covariance between $y_i$ and
$y_j$ and when $i = j$, it is the sample variance of $y_i$. Observing that
$$J = \begin{bmatrix}1\\ \vdots\\ 1\end{bmatrix}\ \Rightarrow\ JJ' = \begin{bmatrix}1 & \cdots & 1\\ \vdots & \ddots & \vdots\\ 1 & \cdots & 1\end{bmatrix}\ \text{and}\ J'J = n,$$
we have
$$Y\left(\tfrac{1}{n}JJ'\right) = \bar{\mathbf{Y}}\ \Rightarrow\ Y-\bar{\mathbf{Y}} = Y\left[I-\tfrac{1}{n}JJ'\right].$$
Hence
$$S = (Y-\bar{\mathbf{Y}})(Y-\bar{\mathbf{Y}})' = Y\left[I-\tfrac{1}{n}JJ'\right]\left[I-\tfrac{1}{n}JJ'\right]'Y'.$$
However,
$$\left[I-\tfrac{1}{n}JJ'\right]\left[I-\tfrac{1}{n}JJ'\right]' = I-\tfrac{1}{n}JJ'-\tfrac{1}{n}JJ'+\tfrac{1}{n^2}JJ'JJ' = I-\tfrac{1}{n}JJ'\ \text{since}\ J'J = n.$$
Thus,
$$S = Y\left[I-\tfrac{1}{n}JJ'\right]Y'. \tag{4.6.2}$$

Letting $C_1 = (I-\frac{1}{n}JJ')$, we note that $C_1^2 = C_1$ and that the rank of $C_1$ is $n-1$.
Accordingly, $C_1$ is an idempotent matrix having $n-1$ eigenvalues equal to 1, the remaining
one being equal to zero. Now, letting $C_2 = \frac{1}{n}JJ'$, it is easy to verify that $C_2^2 = C_2$ and
that the rank of $C_2$ is one; thus, $C_2$ is idempotent with $n-1$ eigenvalues equal to zero,
the remaining one being equal to 1. Further, since $C_1C_2 = O$, that is, $C_1$ and $C_2$ are
orthogonal to each other, $Y-\bar{\mathbf{Y}} = YC_1$ and $\bar{\mathbf{Y}} = YC_2$ are independently distributed, so
that $Y-\bar{\mathbf{Y}}$ and $\bar Y$ are independently distributed. Consequently, $S = (Y-\bar{\mathbf{Y}})(Y-\bar{\mathbf{Y}})'$ and
$\bar Y$ are independently distributed as well. This will be stated as the next result.
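
The algebra behind (4.6.2) is easy to confirm numerically. The sketch below uses a randomly generated $p\times n$ sample matrix (an assumption made for illustration) and checks that $C_1$ and $C_2$ are idempotent, mutually orthogonal, of ranks $n-1$ and 1, and that $S$ computed from deviations agrees with $Y C_1 Y'$.

```python
import numpy as np

rng = np.random.default_rng(0)
p, n = 3, 6
Y = rng.standard_normal((p, n))        # illustrative p x n sample matrix

J = np.ones((n, 1))
C2 = (J @ J.T) / n                     # (1/n) J J'
C1 = np.eye(n) - C2                    # I - (1/n) J J'

Y_bar = Y @ J / n                      # p x 1 vector of sample averages
S_direct = (Y - Y_bar) @ (Y - Y_bar).T # deviations from the averages (broadcast over columns)
S_matrix = Y @ C1 @ Y.T                # formula (4.6.2)

rank_c1 = np.linalg.matrix_rank(C1)
rank_c2 = np.linalg.matrix_rank(C2)
```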


Theorem 4.6.1, 4.6a.1. Let $Y_1,\ldots,Y_n$ be a simple random sample of size $n$ from a
$p$-variate real Gaussian population having a $N_p(\mu, \Sigma)$, $\Sigma > O$, distribution. Let $\bar Y$ be the
sample average and $S$ be the sample sum of products matrix; then, $\bar Y$ and $S$ are statistically
independently distributed. In the complex domain, let the $\tilde Y_j$'s be iid $\tilde N_p(\tilde\mu, \tilde\Sigma)$, $\tilde\Sigma =
\tilde\Sigma^* > O$, and $\bar{\tilde Y}$ and $\tilde S$ denote the sample average and sample sum of products matrix;
then, $\bar{\tilde Y}$ and $\tilde S$ are independently distributed.


4.6.1. The distribution of the sample sum of products matrix, real case 


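Reprising the notations of Sect. 4.6, let the $p \times n$ matrix $\mathbf{Y}$ denote a sample matrix
whose columns $Y_1, \ldots, Y_n$ are iid as $N_p(\mu, \Sigma)$, $\Sigma > O$, Gaussian vectors. Let the sample
mean be $\bar{Y} = \frac{1}{n}(Y_1 + \cdots + Y_n) = \frac{1}{n}\mathbf{Y}J$ where $J' = (1, \ldots, 1)$. Let the bold-faced
matrix $\bar{\mathbf{Y}} = [\bar{Y}, \ldots, \bar{Y}] = \mathbf{Y}C_2$, so that $\mathbf{Y} - \bar{\mathbf{Y}} = \mathbf{Y}C_1$, where $C_1 = I_n - \frac{1}{n}JJ'$. Note that $C_1 = I_n - C_2 = C_1^2$
and $C_2 = \frac{1}{n}JJ' = C_2^2$, that is, $C_1$ and $C_2$ are idempotent matrices whose respective
ranks are $n - 1$ and 1. Since $C_1 = C_1'$, there exists an $n \times n$ orthonormal matrix $P$,
$PP' = I_n$, $P'P = I_n$, such that $P'C_1P = D$ where

$$D = \begin{bmatrix} I_{n-1} & O \\ O & 0 \end{bmatrix}, \quad C_1 = PDP'.$$

Let $\mathbf{Y} = ZP'$ where $Z$ is $p \times n$. Then, $\mathbf{Y} = ZP' \Rightarrow \mathbf{Y}C_1 = ZP'C_1 = ZP'PDP' = ZDP'$, so that

$$S = (\mathbf{Y}C_1)(\mathbf{Y}C_1)' = \mathbf{Y}C_1C_1'\mathbf{Y}' = Z\begin{bmatrix} I_{n-1} & O \\ O & 0 \end{bmatrix}\begin{bmatrix} I_{n-1} & O \\ O & 0 \end{bmatrix}Z' = (Z_{n-1}, O)(Z_{n-1}, O)' = Z_{n-1}Z_{n-1}' \tag{4.6.3}$$

where $Z_{n-1}$ is a $p \times (n - 1)$ matrix obtained by deleting the last column of the $p \times n$ matrix
$Z$. Thus, $S = Z_{n-1}Z_{n-1}'$ where $Z_{n-1}$ contains $p(n - 1)$ distinct real variables. Accordingly,
Theorems 4.2.1, 4.2.2, 4.2.3, and the analogous results in the complex domain, are
applicable to $Z_{n-1}$ as well as to the corresponding quantity $\tilde{Z}_{n-1}$ in the complex case. Observe
that when $Y_j \sim N_p(\mu, \Sigma)$, $\mathbf{Y} - \bar{\mathbf{Y}}$ has expected value $M - M = O$, $M = (\mu, \ldots, \mu)$.
Hence, $\mathbf{Y} - \bar{\mathbf{Y}} = (\mathbf{Y} - M) - (\bar{\mathbf{Y}} - M)$ and therefore, without any loss of generality, we can
assume $Y_j$ to be coming from a $N_p(O, \Sigma)$, $\Sigma > O$, vector random variable whenever
$\mathbf{Y} - \bar{\mathbf{Y}}$ is involved.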
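The reduction $S = Z_{n-1}Z_{n-1}'$ in (4.6.3) can be illustrated numerically. The sketch below assumes a randomly generated $2 \times 5$ sample matrix (the seed, dimensions and variable names are illustrative choices) and uses `numpy.linalg.eigh` as one possible way of obtaining an orthonormal $P$ with $P'C_1P = D$.

```python
import numpy as np

rng = np.random.default_rng(0)   # fixed seed, illustrative data only
p, n = 2, 5
Y = rng.normal(size=(p, n))      # hypothetical p x n sample matrix

J = np.ones((n, 1))
C1 = np.eye(n) - (J @ J.T) / n

# Orthonormal P with P' C1 P = D = diag(1, ..., 1, 0)
eigvals, P = np.linalg.eigh(C1)
order = np.argsort(-eigvals)     # place the n - 1 unit eigenvalues first
P = P[:, order]
D = np.diag([1.0] * (n - 1) + [0.0])

Z = Y @ P                        # so that Y = Z P'
Z_reduced = Z[:, : n - 1]        # delete the last column of Z

S_direct = (Y @ C1) @ (Y @ C1).T
S_reduced = Z_reduced @ Z_reduced.T
```

Both `S_direct` and `S_reduced` agree up to rounding error, which reflects the fact that only $p(n-1)$ distinct real variables enter $S$.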


Theorem 4.6.2. Let $\mathbf{Y}$, $\bar{Y}$, $\bar{\mathbf{Y}}$, $J$, $C_1$ and $C_2$ be as defined in this section. Then, the
$p \times n$ matrix $\mathbf{Y} - \bar{\mathbf{Y}}$ is such that $(\mathbf{Y} - \bar{\mathbf{Y}})J = O$, which implies that there exist linear relationships among
the columns of $\mathbf{Y} - \bar{\mathbf{Y}}$. However, all the elements of $Z_{n-1}$ as defined in (4.6.3) are distinct real
variables. Thus, Theorems 4.2.1, 4.2.2 and 4.2.3 are applicable to $Z_{n-1}$.


Note that the corresponding result for the complex Gaussian case also holds.

4.6.2. Linear functions of sample vectors


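Let $Y_j \overset{iid}{\sim} N_p(\mu, \Sigma)$, $\Sigma > O$, $j = 1, \ldots, n$, or equivalently, let the $Y_j$'s constitute
a simple random sample of size $n$ from this $p$-variate real Gaussian population. Then, the
density of the $p \times n$ sample matrix $\mathbf{Y}$, denoted by $L(\mathbf{Y})$, is the following:

$$L(\mathbf{Y}) = \frac{1}{(2\pi)^{\frac{np}{2}}|\Sigma|^{\frac{n}{2}}}\, e^{-\frac{1}{2}\mathrm{tr}[\Sigma^{-1}(\mathbf{Y}-M)(\mathbf{Y}-M)']},$$

where $M = (\mu, \ldots, \mu)$ is $p \times n$ whose columns are all equal to the $p \times 1$ parameter vector
$\mu$. Consider a linear function of the sample values $Y_1, \ldots, Y_n$. Let the linear function be
$U = \mathbf{Y}A$ where $A$ is an $n \times q$ constant matrix of rank $q$, $q \le p \le n$, so that $U$ is $p \times q$.
Let us consider the mgf of $U$. Since $U$ is $p \times q$, we employ a $q \times p$ parameter matrix $T$ so
that $\mathrm{tr}(TU)$ will contain all the elements in $U$ multiplied by the corresponding parameters.
The mgf of $U$ is then

$$M_U(T) = E[e^{\mathrm{tr}(TU)}] = E[e^{\mathrm{tr}(T\mathbf{Y}A)}] = E[e^{\mathrm{tr}(AT\mathbf{Y})}] = e^{\mathrm{tr}(ATM)}E[e^{\mathrm{tr}(AT(\mathbf{Y}-M))}]$$

where $M = (\mu, \ldots, \mu)$. Letting $W = \Sigma^{-\frac{1}{2}}(\mathbf{Y} - M)$, $d\mathbf{Y} = |\Sigma|^{\frac{n}{2}}dW$ and

$$M_U(T) = e^{\mathrm{tr}(ATM)}E[e^{\mathrm{tr}(AT\Sigma^{\frac{1}{2}}W)}] = \frac{e^{\mathrm{tr}(ATM)}}{(2\pi)^{\frac{np}{2}}}\int_W e^{\mathrm{tr}(AT\Sigma^{\frac{1}{2}}W)-\frac{1}{2}\mathrm{tr}(WW')}\,dW.$$

Now, expanding

$$\mathrm{tr}[(W - C)(W - C)'] = \mathrm{tr}(WW') - 2\,\mathrm{tr}(WC') + \mathrm{tr}(CC'),$$

and comparing the resulting expression with the exponent in the integrand, which, excluding
the factor $-\frac{1}{2}$, is $\mathrm{tr}(WW') - 2\,\mathrm{tr}(AT\Sigma^{\frac{1}{2}}W)$, we may let $C' = AT\Sigma^{\frac{1}{2}}$ so that $\mathrm{tr}(CC') =
\mathrm{tr}(AT\Sigma T'A') = \mathrm{tr}(T\Sigma T'A'A)$. Since $\mathrm{tr}(ATM) = \mathrm{tr}(TMA)$ and

$$\frac{1}{(2\pi)^{\frac{np}{2}}}\int_W e^{-\frac{1}{2}\mathrm{tr}((W-C)(W-C)')}\,dW = 1,$$

we have

$$M_U(T) = M_{\mathbf{Y}A}(T) = e^{\mathrm{tr}(TMA)+\frac{1}{2}\mathrm{tr}(T\Sigma T'A'A)}$$

where $MA = E[\mathbf{Y}A]$, $\Sigma > O$, $A'A > O$, $A$ being a full rank matrix, and $T\Sigma T'A'A$ is
a $q \times q$ positive definite matrix. Hence, the $p \times q$ matrix $U = \mathbf{Y}A$ has a matrix-variate
real Gaussian density with the parameters $MA = E[\mathbf{Y}A]$ and $A'A > O$, $\Sigma > O$. Thus,
the following result: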


Theorem 4.6.3, 4.6a.2. Let $Y_j \overset{iid}{\sim} N_p(\mu, \Sigma)$, $\Sigma > O$, $j = 1, \ldots, n$, or equivalently,
let the $Y_j$'s constitute a simple random sample of size $n$ from this $p$-variate real
Gaussian population. Consider a set of linear functions of $Y_1, \ldots, Y_n$, $U = \mathbf{Y}A$ where
$\mathbf{Y} = (Y_1, \ldots, Y_n)$ is a $p \times n$ sample matrix and $A$ is an $n \times q$ constant matrix of rank $q$,
$q \le p \le n$. Then, $U$ has a nonsingular $p \times q$ matrix-variate real Gaussian distribution
with the parameters $MA = E[\mathbf{Y}A]$, $A'A > O$, and $\Sigma > O$. Analogously, in the complex
domain, $\tilde{U} = \tilde{\mathbf{Y}}A$ has a $p \times q$ matrix-variate complex Gaussian distribution with the corresponding
parameters $E[\tilde{\mathbf{Y}}A]$, $A^{*}A > O$, and $\tilde{\Sigma} > O$, $A^{*}$ denoting the conjugate transpose of
$A$. In the usual format of a $p \times q$ matrix-variate $N_{p,q}(M, A, B)$ real Gaussian density, $M$
is replaced by $MA$, $A$, by $A'A$ and $B$, by $\Sigma$, in the real case, with corresponding changes
for the complex case.
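The covariance structure implied by Theorem 4.6.3 can be checked with Kronecker products. The sketch below is a deterministic illustration under stated assumptions: the numerical values of $\Sigma$ and $A$ are hypothetical, and the column-stacking identity $\mathrm{vec}(\mathbf{Y}A) = (A' \otimes I_p)\,\mathrm{vec}(\mathbf{Y})$ together with $\mathrm{Cov}(\mathrm{vec}(\mathbf{Y})) = I_n \otimes \Sigma$ for iid columns is used.

```python
import numpy as np

p, n, q = 2, 4, 2
Sigma = np.array([[2.0, 0.5],
                  [0.5, 1.0]])            # hypothetical Sigma > O
A = np.array([[1.0, 0.0],
              [1.0, 1.0],
              [0.0, 2.0],
              [1.0, -1.0]])               # hypothetical n x q matrix of rank q

# vec(Y A) = (A' kron I_p) vec(Y), with Cov(vec(Y)) = I_n kron Sigma
K = np.kron(A.T, np.eye(p))
cov_vecY = np.kron(np.eye(n), Sigma)
cov_vecU = K @ cov_vecY @ K.T

# The covariance of vec(U) factors into the two parameter matrices A'A and Sigma
expected = np.kron(A.T @ A, Sigma)
```

Here `cov_vecU` coincides with `expected`, that is, with $(A'A) \otimes \Sigma$, which is the Kronecker form associated with the two covariance parameter matrices $A'A$ and $\Sigma$.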


A certain particular case turns out to be of interest. Observe that $MA = \mu(J'A)$, $J' =
(1, \ldots, 1)$, and that when $q = 1$, we are considering only one linear combination of
$Y_1, \ldots, Y_n$ in the form $U_1 = a_1Y_1 + \cdots + a_nY_n$ where $a_1, \ldots, a_n$ are real scalar constants.
Then $J'A = \sum_{j=1}^{n} a_j$, $A'A = \sum_{j=1}^{n} a_j^2$, and the $p \times 1$ vector $U_1$ has a $p$-variate real
nonsingular Gaussian distribution with the parameters $(\sum_{j=1}^{n} a_j)\mu$ and $(\sum_{j=1}^{n} a_j^2)\Sigma$. This
result was stated in Theorem 3.5.4.


Corollary 4.6.1, 4.6a.1. Let $A$ as defined in Theorem 4.6.3 be $n \times 1$, in which case
$A$ is a column vector whose components are $a_1, \ldots, a_n$, and the resulting single linear
function of $Y_1, \ldots, Y_n$ is $U_1 = a_1Y_1 + \cdots + a_nY_n$. Let the population be $p$-variate real
Gaussian with the parameters $\mu$ and $\Sigma > O$. Then $U_1$ has a $p$-variate nonsingular real
normal distribution with the parameters $(\sum_{j=1}^{n} a_j)\mu$ and $(\sum_{j=1}^{n} a_j^2)\Sigma$. Analogously, in
the complex Gaussian population case, $\tilde{U}_1 = a_1\tilde{Y}_1 + \cdots + a_n\tilde{Y}_n$ is distributed as a complex
Gaussian with mean value $(\sum_{j=1}^{n} a_j)\tilde{\mu}$ and covariance matrix $(\sum_{j=1}^{n} a_j^{*}a_j)\tilde{\Sigma}$. Taking
$a_1 = \cdots = a_n = \frac{1}{n}$, $U_1 = \frac{1}{n}(Y_1 + \cdots + Y_n) = \bar{Y}$, the sample average, which has a
$p$-variate real Gaussian density with the parameters $\mu$ and $\frac{1}{n}\Sigma$. Correspondingly, in the
complex Gaussian case, the sample average $\bar{\tilde{Y}}$ is a $p$-variate complex Gaussian vector
with the parameters $\tilde{\mu}$ and $\frac{1}{n}\tilde{\Sigma}$, $\tilde{\Sigma} = \tilde{\Sigma}^{*} > O$.


4.6.3. The general real matrix-variate case 


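In order to avoid a multiplicity of symbols, we will denote the $p \times q$ real matrix-variate
random variable by $X_\alpha = (x_{ij\alpha})$ and the corresponding complex matrix by $\tilde{X}_\alpha = (\tilde{x}_{ij\alpha})$.
Consider a simple random sample of size $n$ from the population represented by the real
$p \times q$ matrix $X_\alpha = (x_{ij\alpha})$. Let $X_\alpha = (x_{ij\alpha})$ be the $\alpha$-th sample value, so that the $X_\alpha$'s,
$\alpha = 1, \ldots, n$, are iid as $X_1$. Let the $p \times nq$ sample matrix be denoted by the bold-faced
$\mathbf{X} = [X_1, X_2, \ldots, X_n]$ where each $X_j$ is $p \times q$. Let the sample average be denoted by
$\bar{X} = (\bar{x}_{ij})$, $\bar{x}_{ij} = \frac{1}{n}\sum_{\alpha=1}^{n} x_{ij\alpha}$. Let $\mathbf{X}_d$ be the sample deviation matrix which is the
$p \times qn$ matrix

$$\mathbf{X}_d = [X_1 - \bar{X}, X_2 - \bar{X}, \ldots, X_n - \bar{X}], \quad X_\alpha - \bar{X} = (x_{ij\alpha} - \bar{x}_{ij}), \tag{4.6.4}$$

wherein the corresponding sample average is subtracted from each element. For example,

$$X_\alpha - \bar{X} = \begin{bmatrix} x_{11\alpha} - \bar{x}_{11} & x_{12\alpha} - \bar{x}_{12} & \cdots & x_{1q\alpha} - \bar{x}_{1q} \\ x_{21\alpha} - \bar{x}_{21} & x_{22\alpha} - \bar{x}_{22} & \cdots & x_{2q\alpha} - \bar{x}_{2q} \\ \vdots & \vdots & \ddots & \vdots \\ x_{p1\alpha} - \bar{x}_{p1} & x_{p2\alpha} - \bar{x}_{p2} & \cdots & x_{pq\alpha} - \bar{x}_{pq} \end{bmatrix} = [C_{1\alpha}, C_{2\alpha}, \ldots, C_{q\alpha}] \tag{i}$$

where $C_{j\alpha}$ is the $j$-th column in the $\alpha$-th sample deviation matrix $X_\alpha - \bar{X}$. In this notation,
the $p \times qn$ sample deviation matrix can be expressed as follows:

$$\mathbf{X}_d = [C_{11}, C_{21}, \ldots, C_{q1}, C_{12}, C_{22}, \ldots, C_{q2}, \ldots, C_{1n}, C_{2n}, \ldots, C_{qn}] \tag{ii}$$

where, for example, $C_{\gamma\alpha}$ denotes the $\gamma$-th column in the $\alpha$-th $p \times q$ matrix, $X_\alpha - \bar{X}$, that
is,

$$C_{\gamma\alpha} = \begin{bmatrix} x_{1\gamma\alpha} - \bar{x}_{1\gamma} \\ x_{2\gamma\alpha} - \bar{x}_{2\gamma} \\ \vdots \\ x_{p\gamma\alpha} - \bar{x}_{p\gamma} \end{bmatrix}. \tag{iii}$$

Then, the sample sum of products matrix, denoted by $S$, is given by

$$\begin{aligned} S = \mathbf{X}_d\mathbf{X}_d' &= C_{11}C_{11}' + C_{21}C_{21}' + \cdots + C_{q1}C_{q1}' \\ &\quad + C_{12}C_{12}' + C_{22}C_{22}' + \cdots + C_{q2}C_{q2}' \\ &\quad \ \vdots \\ &\quad + C_{1n}C_{1n}' + C_{2n}C_{2n}' + \cdots + C_{qn}C_{qn}'. \end{aligned} \tag{iv}$$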


Let us rearrange these matrices by collecting the terms relevant to each column of $X$.
Then, the terms relevant to these columns are the following:

$$\begin{aligned} S = \mathbf{X}_d\mathbf{X}_d' &= C_{11}C_{11}' + C_{12}C_{12}' + \cdots + C_{1n}C_{1n}' \\ &\quad + C_{21}C_{21}' + C_{22}C_{22}' + \cdots + C_{2n}C_{2n}' \\ &\quad \ \vdots \\ &\quad + C_{q1}C_{q1}' + C_{q2}C_{q2}' + \cdots + C_{qn}C_{qn}' \\ &= S_1 + S_2 + \cdots + S_q, \end{aligned} \tag{v}$$

where $S_1$ denotes the $p \times p$ sample sum of products matrix in the first column of $X$, $S_2$,
the $p \times p$ sample sum of products matrix corresponding to the second column of $X$, and
so on, $S_q$ being equal to the $p \times p$ sample sum of products matrix corresponding to the
$q$-th column of $X$.


Theorem 4.6.4. Let $X_\alpha = (x_{ij\alpha})$ be a real $p \times q$ matrix of distinct real scalar variables
$x_{ij\alpha}$'s. Letting $X_\alpha$, $\bar{X}$, $\mathbf{X}$, $\mathbf{X}_d$, $S$, and $S_1, \ldots, S_q$ be as previously defined, the sample
sum of products matrix in the $p \times nq$ sample matrix $\mathbf{X}$, denoted by $S$, is given by

$$S = S_1 + \cdots + S_q. \tag{4.6.5}$$


Example 4.6.1. Consider a 2 x 2 real matrix-variate N2,2(0, A, B) distribution with the 


parameters 
2 1 3 —1 
а= [| | ав = | 5 | 
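Let $X_\alpha$, $\alpha = 1, \ldots, 5$, be a simple random sample of size 5 from this real Gaussian
population. Suppose that the following observations on $X_\alpha$, $\alpha = 1, \ldots, 5$, were obtained:

$$X_1 = \begin{bmatrix} 1 & 1 \\ 1 & 2 \end{bmatrix},\ X_2 = \begin{bmatrix} -1 & 1 \\ -2 & 1 \end{bmatrix},\ X_3 = \begin{bmatrix} 0 & 1 \\ 1 & 2 \end{bmatrix},$$

$$X_4 = \begin{bmatrix} -1 & 1 \\ 1 & 2 \end{bmatrix},\ X_5 = \begin{bmatrix} -4 & 1 \\ -1 & -2 \end{bmatrix}.$$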


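Compute the sample matrix, the sample average, the sample deviation matrix and the
sample sum of products matrix.

Solution 4.6.1. The sample average is available as

$$\bar{X} = \frac{1}{5}[X_1 + \cdots + X_5] = \frac{1}{5}\begin{bmatrix} 1 + (-1) + 0 + (-1) + (-4) & 1 + 1 + 1 + 1 + 1 \\ 1 + (-2) + 1 + 1 + (-1) & 2 + 1 + 2 + 2 + (-2) \end{bmatrix} = \begin{bmatrix} -1 & 1 \\ 0 & 1 \end{bmatrix}.$$

The deviations are then

$$X_{1d} = X_1 - \bar{X} = \begin{bmatrix} 1 & 1 \\ 1 & 2 \end{bmatrix} - \begin{bmatrix} -1 & 1 \\ 0 & 1 \end{bmatrix} = \begin{bmatrix} 2 & 0 \\ 1 & 1 \end{bmatrix},\ X_{2d} = \begin{bmatrix} 0 & 0 \\ -2 & 0 \end{bmatrix},$$

$$X_{3d} = \begin{bmatrix} 1 & 0 \\ 1 & 1 \end{bmatrix},\ X_{4d} = \begin{bmatrix} 0 & 0 \\ 1 & 1 \end{bmatrix},\ X_{5d} = \begin{bmatrix} -3 & 0 \\ -1 & -3 \end{bmatrix}.$$

Thus, the sample matrix, the sample average matrix and the sample deviation matrix, denoted
by bold-faced letters, are the following:

$$\mathbf{X} = [X_1, X_2, X_3, X_4, X_5],\ \bar{\mathbf{X}} = [\bar{X}, \ldots, \bar{X}] \text{ and } \mathbf{X}_d = [X_{1d}, X_{2d}, X_{3d}, X_{4d}, X_{5d}].$$

The sample sum of products matrix is then

$$S = [\mathbf{X} - \bar{\mathbf{X}}][\mathbf{X} - \bar{\mathbf{X}}]' = [\mathbf{X}_d][\mathbf{X}_d]' = S_1 + S_2$$

where $S_1$ is obtained from the first columns of each of $X_{\alpha d}$, $\alpha = 1, \ldots, 5$, and $S_2$ is
evaluated from the second columns of $X_{\alpha d}$, $\alpha = 1, \ldots, 5$. That is,

$$\begin{aligned} S_1 &= \begin{bmatrix} 2 \\ 1 \end{bmatrix}[2\ \ 1] + \begin{bmatrix} 0 \\ -2 \end{bmatrix}[0\ \ {-2}] + \begin{bmatrix} 1 \\ 1 \end{bmatrix}[1\ \ 1] + \begin{bmatrix} 0 \\ 1 \end{bmatrix}[0\ \ 1] + \begin{bmatrix} -3 \\ -1 \end{bmatrix}[{-3}\ \ {-1}] \\ &= \begin{bmatrix} 4 & 2 \\ 2 & 1 \end{bmatrix} + \begin{bmatrix} 0 & 0 \\ 0 & 4 \end{bmatrix} + \begin{bmatrix} 1 & 1 \\ 1 & 1 \end{bmatrix} + \begin{bmatrix} 0 & 0 \\ 0 & 1 \end{bmatrix} + \begin{bmatrix} 9 & 3 \\ 3 & 1 \end{bmatrix} = \begin{bmatrix} 14 & 6 \\ 6 & 8 \end{bmatrix}; \end{aligned}$$

$$\begin{aligned} S_2 &= \begin{bmatrix} 0 \\ 1 \end{bmatrix}[0\ \ 1] + \begin{bmatrix} 0 \\ 0 \end{bmatrix}[0\ \ 0] + \begin{bmatrix} 0 \\ 1 \end{bmatrix}[0\ \ 1] + \begin{bmatrix} 0 \\ 1 \end{bmatrix}[0\ \ 1] + \begin{bmatrix} 0 \\ -3 \end{bmatrix}[0\ \ {-3}] \\ &= \begin{bmatrix} 0 & 0 \\ 0 & 1 \end{bmatrix} + \begin{bmatrix} 0 & 0 \\ 0 & 0 \end{bmatrix} + \begin{bmatrix} 0 & 0 \\ 0 & 1 \end{bmatrix} + \begin{bmatrix} 0 & 0 \\ 0 & 1 \end{bmatrix} + \begin{bmatrix} 0 & 0 \\ 0 & 9 \end{bmatrix} = \begin{bmatrix} 0 & 0 \\ 0 & 12 \end{bmatrix}; \end{aligned}$$

$$S = S_1 + S_2 = \begin{bmatrix} 14 & 6 \\ 6 & 20 \end{bmatrix}.$$

This $S$ can be directly verified by taking $[\mathbf{X} - \bar{\mathbf{X}}][\mathbf{X} - \bar{\mathbf{X}}]' = [\mathbf{X}_d][\mathbf{X}_d]'$ where

$$\mathbf{X} - \bar{\mathbf{X}} = \mathbf{X}_d = \begin{bmatrix} 2 & 0 & 0 & 0 & 1 & 0 & 0 & 0 & -3 & 0 \\ 1 & 1 & -2 & 0 & 1 & 1 & 1 & 1 & -1 & -3 \end{bmatrix}.$$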
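The computations of Example 4.6.1 can be reproduced with a few lines of NumPy. This is only a sketch: the array layout and variable names are illustrative assumptions, and the observation values are those consistent with the sample average and deviations appearing in Solution 4.6.1.

```python
import numpy as np

# Observations X_1, ..., X_5 stacked along the first axis: shape (n, p, q) = (5, 2, 2)
X = np.array([
    [[1, 1], [1, 2]],
    [[-1, 1], [-2, 1]],
    [[0, 1], [1, 2]],
    [[-1, 1], [1, 2]],
    [[-4, 1], [-1, -2]],
], dtype=float)

X_bar = X.mean(axis=0)               # sample average, p x q
dev = X - X_bar                      # deviations X_alpha - X_bar
X_d = np.hstack(list(dev))           # p x nq sample deviation matrix
S = X_d @ X_d.T                      # sample sum of products matrix

# S_j collects the j-th columns of all the deviation matrices
S_cols = [dev[:, :, j].T @ dev[:, :, j] for j in range(X.shape[2])]
```

The list `S_cols` holds $S_1$ and $S_2$, and their sum agrees with `S`, as asserted in Theorem 4.6.4.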
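4.6a. The General Complex Matrix-variate Case


The preceding analysis has its counterpart for the complex case. Let $\tilde{X}_\alpha = (\tilde{x}_{ij\alpha})$
be a $p \times q$ matrix in the complex domain with the $\tilde{x}_{ij\alpha}$'s being distinct complex scalar
variables. Consider a simple random sample of size $n$ from this population designated by
$\tilde{X}_1$. Let the $\alpha$-th sample matrix be $\tilde{X}_\alpha$, $\alpha = 1, \ldots, n$, the $\tilde{X}_\alpha$'s being iid as $\tilde{X}_1$, and the
$p \times nq$ sample matrix be denoted by the bold-faced $\tilde{\mathbf{X}} = [\tilde{X}_1, \ldots, \tilde{X}_n]$. Let the sample
average be denoted by $\bar{\tilde{X}} = (\bar{\tilde{x}}_{ij})$, $\bar{\tilde{x}}_{ij} = \frac{1}{n}\sum_{\alpha=1}^{n}\tilde{x}_{ij\alpha}$, and $\tilde{\mathbf{X}}_d$ be the sample deviation
matrix:

$$\tilde{\mathbf{X}}_d = [\tilde{X}_1 - \bar{\tilde{X}}, \ldots, \tilde{X}_n - \bar{\tilde{X}}].$$

Let $\tilde{S}$ be the sample sum of products matrix, namely, $\tilde{S} = \tilde{\mathbf{X}}_d\tilde{\mathbf{X}}_d^{*}$ where an asterisk
denotes the complex conjugate transpose, and let $\tilde{S}_j$ be the sample sum of products matrix
corresponding to the $j$-th column of $\tilde{X}$. Then we have the following result:


Theorem 4.6a.3. Let $\tilde{\mathbf{X}}$, $\bar{\tilde{X}}$, $\tilde{\mathbf{X}}_d$, $\tilde{S}$ and $\tilde{S}_j$ be as previously defined. Then,

$$\tilde{S} = \tilde{S}_1 + \cdots + \tilde{S}_q = \tilde{\mathbf{X}}_d\tilde{\mathbf{X}}_d^{*}. \tag{4.6a.1}$$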


Example 4.6a.1. Consider a $2 \times 2$ complex matrix-variate $\tilde{N}_{2,2}(O, A, B)$ distribution
where
2 2 1+1 ED 
же, 3 | апа ee | 
A simple random sample of size 4 from this population is available, that is, $\tilde{X}_\alpha \overset{iid}{\sim}
\tilde{N}_{2,2}(O, A, B)$, $\alpha = 1, 2, 3, 4$. The following is one set of observations on these sample
values:


NI Р а —i Eo 1 Е  _| 2 3+i 
= [3 | =|; qe "dc hat || 
Determine the observed sample average, the sample matrix, the sample deviation matrix 

and the sample sum of products matrix. 


Solution 4.6a.1. The sample average is

$$\bar{\tilde{X}} = \frac{1}{4}[\tilde{X}_1 + \tilde{X}_2 + \tilde{X}_3 + \tilde{X}_4]$$


zs asp eb Peas ons 
and the deviations are as follows:


AM ЧЕТ WESS ET 
== P, = | =| | 


ME Ее КЕ NO 
Ям = |, д =; | 


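The sample deviation matrix is then $\tilde{\mathbf{X}}_d = [\tilde{X}_{1d}, \tilde{X}_{2d}, \tilde{X}_{3d}, \tilde{X}_{4d}]$. If $V_{\alpha 1}$ denotes the first
column of $\tilde{X}_{\alpha d}$, then with our usual notation, $\tilde{S}_1 = \sum_{\alpha=1}^{4} V_{\alpha 1}V_{\alpha 1}^{*}$ and similarly, if $V_{\alpha 2}$ is
the second column of $\tilde{X}_{\alpha d}$, then $\tilde{S}_2 = \sum_{\alpha=1}^{4} V_{\alpha 2}V_{\alpha 2}^{*}$, the sample sum of products matrix
being $\tilde{S} = \tilde{S}_1 + \tilde{S}_2$. Let us evaluate these quantities: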


&-[ | [0 -rene[ usu -i-ge[ |i -ael [0 2 4- i] 


= 5+0 9 deb ule sib e] 


NALE | ed ed OPENED 
ер 2 2420] [1 0 5 844i 
= ж 4 e 4 Е БЁР Ful 
_[ 10 12444 
йз 24 | 
and then, 
"m E, 10 — 1244i 12. 114i 
#=й +® 3 Жш 24 ibe 34 | 


This can also be verified directly as $\tilde{S} = [\tilde{\mathbf{X}}_d][\tilde{\mathbf{X}}_d]^{*}$ where the deviation matrix is


tO oie, ee, Gee с 
Shad vals debi- ш ЧИО р 


As expected, 


This completes the calculations.

Exercises 4.6


4.6.1. Let $A$ be a $2 \times 2$ matrix whose first row is $(1, 1)$ and $B$ be a $3 \times 3$ matrix whose first
row is $(1, -1, 1)$. Select your own real numbers to complete the matrices $A$ and $B$ so that
$A > O$ and $B > O$. Then consider a $2 \times 3$ matrix $X$ having a real matrix-variate Gaussian
density with the location parameter $M = O$ and the foregoing parameter matrices $A$ and
$B$. Let the first row of $X$ be $X_1$ and its second row be $X_2$. Determine the marginal densities
of $X_1$ and $X_2$, the conditional density of $X_1$ given $X_2$, the conditional density of $X_2$ given
$X_1$, the conditional expectation of $X_1$ given $X_2 = (1, 0, 1)$ and the conditional expectation
of $X_2$ given $X_1 = (1, 2, 3)$.


4.6.2. Consider the matrix $X$ utilized in Exercise 4.6.1. Let its first two columns be $Y_1$
and its last one be $Y_2$. Then, obtain the marginal densities of $Y_1$ and $Y_2$, and the conditional
densities of $Y_1$ given $Y_2$ and $Y_2$ given $Y_1$, and evaluate the conditional expectation of $Y_1$
1 

1 2[ 
4.6.3. Let $A > O$ and $B > O$ be $2 \times 2$ and $3 \times 3$ matrices whose first rows are $(1, 1 - i)$
and $(2, i, 1 + i)$, respectively. Select your own complex numbers to complete the matrices
$A = A^{*} > O$ and $B = B^{*} > O$. Now, consider a $2 \times 3$ matrix $\tilde{X}$ having a complex
matrix-variate Gaussian density with the aforementioned matrices $A$ and $B$ as parameter
matrices. Assume that the location parameter is a null matrix. Letting the row partitioning
of $\tilde{X}$, denoted by $\tilde{X}_1$, $\tilde{X}_2$, be as specified in Exercise 4.6.1, answer all the questions posed
in that exercise.


given Y, = (1, —1) as well as the conditional expectation of Y given Y; = 


4.6.4. Let A, B and X be as given in Exercise 4.6.3. Consider the column partitioning 
specified in Exercise 4.6.2. Then answer all the questions posed in Exercise 4.6.2. 


4.6.5. Repeat Exercise 4.6.4 with the non-null location parameter 
- 2 1-i i 
= fee 2-1 Е 
4.7. The Singular Matrix-variate Gaussian Distribution 
Consider the moment generating function specified in (4.3.3) for the real case, namely,

$$M_X(T) = M_f(T) = e^{\mathrm{tr}(TM') + \frac{1}{2}\mathrm{tr}(\Sigma_1 T \Sigma_2 T')} \tag{4.7.1}$$


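where $\Sigma_1 = A^{-1} > O$ and $\Sigma_2 = B^{-1} > O$. In the complex case, the moment generating
function is of the form

$$\tilde{M}_{\tilde{X}}(\tilde{T}) = e^{\Re[\mathrm{tr}(\tilde{T}\tilde{M}^{*})] + \frac{1}{4}\mathrm{tr}(\Sigma_1\tilde{T}\Sigma_2\tilde{T}^{*})}. \tag{4.7a.1}$$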
The properties of the singular matrix-variate Gaussian distribution can be studied by making
use of (4.7.1) and (4.7a.1). Suppose that we restrict $\Sigma_1$ and $\Sigma_2$ to be positive semi-definite
matrices, that is, $\Sigma_1 \ge O$ and $\Sigma_2 \ge O$. In this case, one can also study many
properties of the distributions represented by the mgf's given in (4.7.1) and (4.7a.1); however,
the corresponding densities will not exist unless the matrices $\Sigma_1$ and $\Sigma_2$ are both
strictly positive definite. The $p \times q$ real or complex matrix-variate density does not exist
if at least one of $A$ or $B$ is singular. When either or both $\Sigma_1$ and $\Sigma_2$ are only positive
semi-definite, the distributions corresponding to the mgf's specified by (4.7.1) and (4.7a.1)
are respectively referred to as real matrix-variate singular Gaussian and complex matrix-variate
singular Gaussian.


For instance, let 


3 = 0 
zel, | and X2 = —1 2 1 
0 1 1 


in the mgf of a $2 \times 3$ real matrix-variate Gaussian distribution. Note that $\Sigma_1 = \Sigma_1'$ and
$\Sigma_2 = \Sigma_2'$. Since the leading minors of $\Sigma_1$ are $|(4)| = 4 > 0$ and $|\Sigma_1| = 0$ and those
of $\Sigma_2$ are $|(3)| = 3 > 0$, $\begin{vmatrix} 3 & -1 \\ -1 & 2 \end{vmatrix} = 5 > 0$ and $|\Sigma_2| = 2 > 0$, $\Sigma_1$ is positive
semi-definite and $\Sigma_2$ is positive definite. Accordingly, the resulting Gaussian distribution
does not possess a density. Fortunately, its distributional properties can nevertheless be
investigated via its associated moment generating function.
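The definiteness checks in this illustration can be carried out numerically. In the sketch below, $\Sigma_2$ is taken to be the $3 \times 3$ matrix whose leading minors are 3, 5 and 2 as stated; the entries of `Sigma1` are an assumption made purely for illustration (a symmetric matrix with leading element 4 and zero determinant), and the helper name `leading_principal_minors` is hypothetical.

```python
import numpy as np

def leading_principal_minors(M):
    """Determinants of the leading k x k sub-matrices, k = 1, ..., order of M."""
    M = np.asarray(M, dtype=float)
    return [float(np.linalg.det(M[:k, :k])) for k in range(1, M.shape[0] + 1)]

Sigma2 = np.array([[3.0, -1.0, 0.0],
                   [-1.0, 2.0, 1.0],
                   [0.0, 1.0, 1.0]])
# Assumed entries: any symmetric matrix with leading element 4 and |Sigma1| = 0 would do
Sigma1 = np.array([[4.0, 2.0],
                   [2.0, 1.0]])

minors1 = leading_principal_minors(Sigma1)
minors2 = leading_principal_minors(Sigma2)

# Eigenvalues give a direct check: all positive means positive definite,
# nonnegative with at least one zero means positive semi-definite and singular
eig1 = np.linalg.eigvalsh(Sigma1)
eig2 = np.linalg.eigvalsh(Sigma2)
```

With these inputs, `minors2` is approximately `[3, 5, 2]` and the smallest eigenvalue of `Sigma1` is zero up to rounding, so that a density does not exist for the corresponding Gaussian distribution.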
Chapter 5
Matrix-Variate Gamma and Beta Distributions


5.1. Introduction 


The notations introduced in the preceding chapters will still be followed in this one. 
Lower-case letters such as x, y will be utilized to represent real scalar variables, whether 
mathematical or random. Capital letters such as X, Y will be used to denote vector/matrix 
random or mathematical variables. A tilde placed on top of a letter will indicate that the 
variables are in the complex domain. However, the tilde will be omitted in the case of con- 
stant matrices such as A, B. The determinant of a square matrix A will be denoted as |A| 
or det(A) and, in the complex domain, the absolute value or modulus of the determinant of 
B will be denoted as |det(B)|. Square matrices appearing in this chapter will be assumed 
to be of dimension p x p unless otherwise specified. 


We will first define the real matrix-variate gamma function, gamma integral and
gamma density, wherefrom their counterparts in the complex domain will be developed. A
particular case of the real matrix-variate gamma density known as the Wishart density is
widely utilized in multivariate statistical analysis. Actually, the formulation of this
distribution in 1928 constituted a significant advance in the early days of the discipline. A real
matrix-variate gamma function, denoted by $\Gamma_p(\alpha)$, will be defined in terms of a matrix-variate
integral over a real positive definite matrix $X > O$. This integral representation
of $\Gamma_p(\alpha)$ will be explicitly evaluated with the help of the transformation of a real positive
definite matrix in terms of a lower triangular matrix having positive diagonal elements in
the form $X = TT'$ where $T = (t_{ij})$ is a lower triangular matrix with positive diagonal
elements, that is, $t_{ij} = 0$, $i < j$ and $t_{jj} > 0$, $j = 1, \ldots, p$. When the diagonal elements
are positive, it can be shown that the transformation $X = TT'$ is unique. Its associated
Jacobian is provided in Theorem 1.6.7. This result is now restated for ready reference: For
a $p \times p$ real positive definite matrix $X = (x_{ij}) > O$,

$$X = TT' \Rightarrow dX = 2^p\Big\{\prod_{j=1}^{p} t_{jj}^{\,p+1-j}\Big\}dT \tag{5.1.1}$$


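where $T = (t_{ij})$, $t_{ij} = 0$, $i < j$ and $t_{jj} > 0$, $j = 1, \ldots, p$. Consider the following
integral representation of $\Gamma_p(\alpha)$ where the integral is over a real positive definite matrix $X$
and the integrand is a real-valued scalar function of $X$:

$$\Gamma_p(\alpha) = \int_{X>O} |X|^{\alpha-\frac{p+1}{2}}e^{-\mathrm{tr}(X)}\,dX. \tag{5.1.2}$$

Under the transformation in (5.1.1),

$$|X|^{\alpha-\frac{p+1}{2}}dX = \Big\{\prod_{j=1}^{p}(t_{jj}^2)^{\alpha-\frac{p+1}{2}}\Big\}\,2^p\Big\{\prod_{j=1}^{p}t_{jj}^{\,p+1-j}\Big\}dT = 2^p\Big\{\prod_{j=1}^{p}(t_{jj}^2)^{\alpha-\frac{j}{2}}\Big\}dT.$$

Observe that $\mathrm{tr}(X) = \mathrm{tr}(TT') =$ the sum of the squares of all the elements in $T$, which is
$\sum_{j=1}^{p}t_{jj}^2 + \sum_{i>j}t_{ij}^2$. By letting $t_{jj}^2 = y_j \Rightarrow dt_{jj} = \frac{1}{2}y_j^{-\frac{1}{2}}dy_j$, noting that $t_{jj} > 0$, the
integral over $t_{jj}$ gives

$$\int_0^{\infty} 2\,(t_{jj}^2)^{\alpha-\frac{j}{2}}e^{-t_{jj}^2}\,dt_{jj} = \int_0^{\infty} y_j^{\alpha-\frac{j-1}{2}-1}e^{-y_j}\,dy_j = \Gamma\Big(\alpha - \frac{j-1}{2}\Big),\ \Re\Big(\alpha - \frac{j-1}{2}\Big) > 0,\ j = 1, \ldots, p,$$

the final condition being $\Re(\alpha) > \frac{p-1}{2}$. Thus, we have the gamma product
$\Gamma(\alpha)\Gamma(\alpha - \frac{1}{2})\cdots\Gamma(\alpha - \frac{p-1}{2})$. Now for $i > j$, the integral over $t_{ij}$ gives

$$\prod_{i>j}\int_{-\infty}^{\infty} e^{-t_{ij}^2}\,dt_{ij} = \prod_{i>j}\sqrt{\pi} = \pi^{\frac{p(p-1)}{4}}.$$

Therefore

$$\begin{aligned} \Gamma_p(\alpha) &= \pi^{\frac{p(p-1)}{4}}\,\Gamma(\alpha)\Gamma\Big(\alpha - \frac{1}{2}\Big)\cdots\Gamma\Big(\alpha - \frac{p-1}{2}\Big),\ \Re(\alpha) > \frac{p-1}{2}, \\ &= \int_{X>O}|X|^{\alpha-\frac{p+1}{2}}e^{-\mathrm{tr}(X)}\,dX,\ \Re(\alpha) > \frac{p-1}{2}. \end{aligned} \tag{5.1.3}$$

For example,

$$\Gamma_2(\alpha) = \pi^{\frac{(2)(1)}{4}}\,\Gamma(\alpha)\Gamma\Big(\alpha - \frac{1}{2}\Big) = \pi^{\frac{1}{2}}\,\Gamma(\alpha)\Gamma\Big(\alpha - \frac{1}{2}\Big),\ \Re(\alpha) > \frac{1}{2}.$$

This $\Gamma_p(\alpha)$ is known by different names in the literature. The first author calls it the real
matrix-variate gamma function because of its association with a real matrix-variate gamma
integral.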
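The product representation in (5.1.3) is straightforward to evaluate for real $\alpha$. The following sketch uses only the Python standard library; the function name `real_matrix_gamma` is an illustrative choice and the argument is restricted to real values of $\alpha$.

```python
import math

def real_matrix_gamma(alpha, p):
    """Real matrix-variate gamma function for real alpha > (p - 1)/2,
    computed as pi^{p(p-1)/4} * Gamma(alpha) * Gamma(alpha - 1/2) * ... * Gamma(alpha - (p-1)/2)."""
    if alpha <= (p - 1) / 2:
        raise ValueError("alpha must exceed (p - 1)/2")
    value = math.pi ** (p * (p - 1) / 4)
    for j in range(1, p + 1):
        value *= math.gamma(alpha - (j - 1) / 2)
    return value

gamma_2_at_3_half = real_matrix_gamma(1.5, 2)   # equals pi/2
gamma_2_at_5_half = real_matrix_gamma(2.5, 2)   # equals 3*pi/4
```

For $p = 1$ the function reduces to the ordinary gamma function, and for $p = 2$ it agrees with the expression $\pi^{1/2}\Gamma(\alpha)\Gamma(\alpha - \frac{1}{2})$ given in the text.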


5.1a. The Complex Matrix-variate Gamma 


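In the complex case, consider a $p \times p$ Hermitian positive definite matrix $\tilde{X} = \tilde{X}^{*} > O$,
where $\tilde{X}^{*}$ denotes the conjugate transpose of $\tilde{X}$. Let $\tilde{T} = (\tilde{t}_{ij})$ be a lower triangular matrix
with the diagonal elements being real and positive. In this case, it can be shown that the
transformation $\tilde{X} = \tilde{T}\tilde{T}^{*}$ is one-to-one. Then, as stated in Theorem 1.6a.7, the Jacobian is

$$d\tilde{X} = 2^p\Big\{\prod_{j=1}^{p}t_{jj}^{\,2(p-j)+1}\Big\}d\tilde{T}. \tag{5.1a.1}$$

With the help of (5.1a.1), we can evaluate the following integral over $p \times p$ Hermitian
positive definite matrices where the integrand is a real-valued scalar function of $\tilde{X}$. We
will denote the integral by $\tilde{\Gamma}_p(\alpha)$, that is,

$$\tilde{\Gamma}_p(\alpha) = \int_{\tilde{X}>O}|\det(\tilde{X})|^{\alpha-p}e^{-\mathrm{tr}(\tilde{X})}\,d\tilde{X}. \tag{5.1a.2}$$

Let us evaluate the integral in (5.1a.2) by making use of (5.1a.1). Parallel to the real case,
we have

$$|\det(\tilde{X})|^{\alpha-p}\,d\tilde{X} = \Big\{\prod_{j=1}^{p}(t_{jj}^2)^{\alpha-p}\Big\}\,2^p\Big\{\prod_{j=1}^{p}t_{jj}^{\,2(p-j)+1}\Big\}d\tilde{T} = \Big\{\prod_{j=1}^{p}2\,(t_{jj}^2)^{\alpha-j+\frac{1}{2}}\Big\}d\tilde{T}.$$

As well,

$$e^{-\mathrm{tr}(\tilde{X})} = e^{-\sum_{j=1}^{p}t_{jj}^2 - \sum_{i>j}|\tilde{t}_{ij}|^2}.$$

Since $t_{jj}$ is real and positive, the integral over $t_{jj}$ gives the following:

$$2\int_0^{\infty}(t_{jj}^2)^{\alpha-j+\frac{1}{2}}e^{-t_{jj}^2}\,dt_{jj} = \Gamma(\alpha - (j-1)),\ \Re(\alpha - (j-1)) > 0,\ j = 1, \ldots, p,$$

the final condition being $\Re(\alpha) > p - 1$. Note that the absolute value of $\tilde{t}_{ij}$, namely, $|\tilde{t}_{ij}|$
is such that $|\tilde{t}_{ij}|^2 = t_{ij1}^2 + t_{ij2}^2$ where $\tilde{t}_{ij} = t_{ij1} + it_{ij2}$ with $t_{ij1}$, $t_{ij2}$ real and $i = \sqrt{(-1)}$.
Thus,

$$\prod_{i>j}\int_{-\infty}^{\infty}\int_{-\infty}^{\infty}e^{-(t_{ij1}^2+t_{ij2}^2)}\,dt_{ij1}\wedge dt_{ij2} = \prod_{i>j}\pi = \pi^{\frac{p(p-1)}{2}}.$$

Then

$$\begin{aligned} \tilde{\Gamma}_p(\alpha) &= \int_{\tilde{X}>O}|\det(\tilde{X})|^{\alpha-p}e^{-\mathrm{tr}(\tilde{X})}\,d\tilde{X},\ \Re(\alpha) > p - 1, \\ &= \pi^{\frac{p(p-1)}{2}}\,\Gamma(\alpha)\Gamma(\alpha - 1)\cdots\Gamma(\alpha - (p-1)),\ \Re(\alpha) > p - 1. \end{aligned} \tag{5.1a.3}$$

We will refer to $\tilde{\Gamma}_p(\alpha)$ as the complex matrix-variate gamma because of its association
with a complex matrix-variate gamma integral. As an example, consider

$$\tilde{\Gamma}_2(\alpha) = \pi^{\frac{(2)(1)}{2}}\,\Gamma(\alpha)\Gamma(\alpha - 1) = \pi\,\Gamma(\alpha)\Gamma(\alpha - 1),\ \Re(\alpha) > 1.$$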
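The complex matrix-variate gamma in (5.1a.3) can be evaluated in the same manner as its real counterpart. The sketch below is restricted to real $\alpha$ and relies only on the standard library; the function name `complex_matrix_gamma` is an illustrative choice.

```python
import math

def complex_matrix_gamma(alpha, p):
    """Complex matrix-variate gamma for real alpha > p - 1,
    computed as pi^{p(p-1)/2} * Gamma(alpha) * Gamma(alpha - 1) * ... * Gamma(alpha - (p-1))."""
    if alpha <= p - 1:
        raise ValueError("alpha must exceed p - 1")
    value = math.pi ** (p * (p - 1) / 2)
    for j in range(1, p + 1):
        value *= math.gamma(alpha - (j - 1))
    return value

value_p2_alpha2 = complex_matrix_gamma(2.0, 2)   # pi * Gamma(2) * Gamma(1) = pi
value_p2_alpha4 = complex_matrix_gamma(4.0, 2)   # pi * Gamma(4) * Gamma(3) = 12 * pi
```

The two evaluations correspond to $\tilde{\Gamma}_2(2) = \pi$ and $\tilde{\Gamma}_2(4) = 12\pi$.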


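5.2. The Real Matrix-variate Gamma Density


In view of (5.1.3), we can define a real matrix-variate gamma density with shape
parameter $\alpha$ as follows, where $X$ is a $p \times p$ real positive definite matrix:

$$f_1(X) = \begin{cases} \dfrac{1}{\Gamma_p(\alpha)}|X|^{\alpha-\frac{p+1}{2}}e^{-\mathrm{tr}(X)}, & X > O,\ \Re(\alpha) > \frac{p-1}{2} \\ 0, & \text{elsewhere.} \end{cases} \tag{5.2.1}$$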


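Example 5.2.1. Let

$$X = \begin{bmatrix} x_{11} & x_{12} \\ x_{12} & x_{22} \end{bmatrix},\quad \tilde{X} = \begin{bmatrix} x_1 & x_2 + iy_2 \\ x_2 - iy_2 & x_3 \end{bmatrix}$$

where $x_{11}, x_{12}, x_{22}, x_1, x_2, y_2, x_3$ are all real scalar variables, $i = \sqrt{(-1)}$, $x_{22} >
0$, $x_{11}x_{22} - x_{12}^2 > 0$. While these are the conditions for the positive definiteness of the real
matrix $X$, $x_1 > 0$, $x_3 > 0$, $x_1x_3 - (x_2^2 + y_2^2) > 0$ are the conditions for the Hermitian
positive definiteness of $\tilde{X}$. Let us evaluate the following integrals, subject to the previously
specified conditions on the elements of the matrix:

$$(1):\ \delta_1 = \int_{X>O}e^{-(x_{11}+x_{22})}\,dx_{11}\wedge dx_{12}\wedge dx_{22}$$

$$(2):\ \delta_2 = \int_{\tilde{X}>O}e^{-(x_1+x_3)}\,dx_1\wedge d(x_2 + iy_2)\wedge dx_3$$

$$(3):\ \delta_3 = \int_{X>O}|X|\,e^{-(x_{11}+x_{22})}\,dx_{11}\wedge dx_{12}\wedge dx_{22}$$

$$(4):\ \delta_4 = \int_{\tilde{X}>O}|\det(\tilde{X})|^2e^{-(x_1+x_3)}\,dx_1\wedge d(x_2 + iy_2)\wedge dx_3.$$

Solution 5.2.1. (1): Observe that $\delta_1$ can be evaluated by treating the integral as a real
matrix-variate integral, namely,

$$\delta_1 = \int_{X>O}|X|^{\alpha-\frac{p+1}{2}}e^{-\mathrm{tr}(X)}\,dX \ \text{ with } p = 2,\ \frac{p+1}{2} = \frac{3}{2},\ \alpha = \frac{3}{2},$$

and hence the integral is

$$\Gamma_2(3/2) = \pi^{\frac{2(1)}{4}}\,\Gamma(3/2)\Gamma(1) = \pi^{\frac{1}{2}}(1/2)\Gamma(1/2) = \frac{\pi}{2}.$$

This result can also be obtained by direct integration as a multiple integral. In this case,
the integration has to be done under the conditions $x_{11} > 0$, $x_{22} > 0$, $x_{11}x_{22} - x_{12}^2 >
0$, that is, $x_{12}^2 < x_{11}x_{22}$ or $-\sqrt{x_{11}x_{22}} < x_{12} < \sqrt{x_{11}x_{22}}$. The integral over $x_{12}$ yields
$\int_{-\sqrt{x_{11}x_{22}}}^{\sqrt{x_{11}x_{22}}}dx_{12} = 2\sqrt{x_{11}x_{22}}$, that over $x_{11}$ then gives

$$2\int_{x_{11}=0}^{\infty}\sqrt{x_{11}}\,e^{-x_{11}}\,dx_{11} = 2\int_0^{\infty}x_{11}^{\frac{3}{2}-1}e^{-x_{11}}\,dx_{11} = 2\,\Gamma(3/2) = \pi^{\frac{1}{2}},$$

and on integrating with respect to $x_{22}$, we have

$$\int_0^{\infty}\sqrt{x_{22}}\,e^{-x_{22}}\,dx_{22} = \Gamma(3/2) = \frac{1}{2}\sqrt{\pi},$$

so that $\delta_1 = \frac{1}{2}\sqrt{\pi}\sqrt{\pi} = \frac{\pi}{2}$.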

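(2): On observing that $\delta_2$ can be viewed as a complex matrix-variate integral, it is seen that

$$\delta_2 = \int_{\tilde{X}>O}|\det(\tilde{X})|^{2-2}e^{-\mathrm{tr}(\tilde{X})}\,d\tilde{X} = \tilde{\Gamma}_2(2) = \pi^{\frac{2(1)}{2}}\,\Gamma(2)\Gamma(1) = \pi.$$

This answer can also be obtained by evaluating the multiple integral. Since $\tilde{X} > O$, we
have $x_1 > 0$, $x_3 > 0$, $x_1x_3 - (x_2^2 + y_2^2) > 0$, that is, $x_1 > \frac{x_2^2 + y_2^2}{x_3}$. Integrating first with
respect to $x_1$ and letting $y = x_1 - \frac{x_2^2 + y_2^2}{x_3}$, we have

$$\int_{x_1 > \frac{x_2^2+y_2^2}{x_3}}e^{-x_1}\,dx_1 = \int_{y=0}^{\infty}e^{-y-\frac{x_2^2+y_2^2}{x_3}}\,dy = e^{-\frac{x_2^2+y_2^2}{x_3}}.$$

Now, the integrals over $x_2$ and $y_2$ give

$$\int_{-\infty}^{\infty}e^{-\frac{x_2^2}{x_3}}\,dx_2 = \sqrt{x_3}\int_{-\infty}^{\infty}e^{-u^2}\,du = \sqrt{x_3}\sqrt{\pi} \ \text{ and } \ \int_{-\infty}^{\infty}e^{-\frac{y_2^2}{x_3}}\,dy_2 = \sqrt{x_3}\sqrt{\pi},$$

that with respect to $x_3$ then yielding

$$\int_0^{\infty}x_3\,e^{-x_3}\,dx_3 = \Gamma(2) = 1,$$

so that $\delta_2 = (1)\sqrt{\pi}\sqrt{\pi} = \pi$.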

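(3): Observe that $\delta_3$ can be evaluated as a real matrix-variate integral. Then

$$\begin{aligned} \delta_3 &= \int_{X>O}|X|\,e^{-\mathrm{tr}(X)}\,dX = \int_{X>O}|X|^{\frac{5}{2}-\frac{3}{2}}e^{-\mathrm{tr}(X)}\,dX, \ \text{ with } \frac{p+1}{2} = \frac{3}{2},\ p = 2,\ \alpha = \frac{5}{2}, \\ &= \Gamma_2(5/2) = \pi^{\frac{2(1)}{4}}\,\Gamma(5/2)\Gamma(4/2) = \pi^{\frac{1}{2}}(3/2)(1/2)\pi^{\frac{1}{2}}(1) \\ &= \frac{3}{4}\pi. \end{aligned}$$

Let us proceed by direct integration:

$$\begin{aligned} \delta_3 &= \int_{X>O}[x_{11}x_{22} - x_{12}^2]\,e^{-(x_{11}+x_{22})}\,dx_{11}\wedge dx_{12}\wedge dx_{22} \\ &= \int_{X>O}x_{22}\Big[x_{11} - \frac{x_{12}^2}{x_{22}}\Big]e^{-(x_{11}+x_{22})}\,dx_{11}\wedge dx_{12}\wedge dx_{22}; \end{aligned}$$

letting $y = x_{11} - \frac{x_{12}^2}{x_{22}}$, the integral over $x_{11}$ yields

$$\int_{x_{11} > \frac{x_{12}^2}{x_{22}}}\Big[x_{11} - \frac{x_{12}^2}{x_{22}}\Big]e^{-x_{11}}\,dx_{11} = \int_{y=0}^{\infty}y\,e^{-y-\frac{x_{12}^2}{x_{22}}}\,dy = e^{-\frac{x_{12}^2}{x_{22}}}.$$

Now, the integral over $x_{12}$ gives $\sqrt{x_{22}}\sqrt{\pi}$ and finally, that over $x_{22}$ yields

$$\int_{x_{22}>0}x_{22}^{\frac{3}{2}}e^{-x_{22}}\,dx_{22} = \Gamma(5/2) = (3/2)(1/2)\sqrt{\pi} = \frac{3}{4}\sqrt{\pi},$$

so that $\delta_3 = \sqrt{\pi}\,\frac{3}{4}\sqrt{\pi} = \frac{3}{4}\pi$.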
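(4): Noting that we can treat $\delta_4$ as a complex matrix-variate integral, we have

$$\begin{aligned} \delta_4 &= \int_{\tilde{X}>O}|\det(\tilde{X})|^2e^{-\mathrm{tr}(\tilde{X})}\,d\tilde{X} = \int_{\tilde{X}>O}|\det(\tilde{X})|^{4-2}e^{-\mathrm{tr}(\tilde{X})}\,d\tilde{X} = \tilde{\Gamma}_2(4),\ \alpha = 4,\ p = 2, \\ &= \pi^{\frac{2(1)}{2}}\,\Gamma(4)\Gamma(3) = \pi(3!)(2!) = 12\pi. \end{aligned}$$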


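Direct evaluation will be challenging in this case as the integrand involves $|\det(\tilde{X})|^2$.

If a scale parameter matrix $B > O$ is to be introduced in (5.2.1), then consider
$\mathrm{tr}(BX) = \mathrm{tr}(B^{\frac{1}{2}}XB^{\frac{1}{2}})$ where $B^{\frac{1}{2}}$ is the positive definite square root of the real positive
definite constant matrix $B$. On applying the transformation $Y = B^{\frac{1}{2}}XB^{\frac{1}{2}} \Rightarrow dX =
|B|^{-\frac{p+1}{2}}dY$, as stated in Theorem 1.6.5, we have

$$\begin{aligned} \int_{X>O}|X|^{\alpha-\frac{p+1}{2}}e^{-\mathrm{tr}(BX)}\,dX &= \int_{X>O}|X|^{\alpha-\frac{p+1}{2}}e^{-\mathrm{tr}(B^{\frac{1}{2}}XB^{\frac{1}{2}})}\,dX \\ &= |B|^{-\alpha}\int_{Y>O}|Y|^{\alpha-\frac{p+1}{2}}e^{-\mathrm{tr}(Y)}\,dY \\ &= |B|^{-\alpha}\,\Gamma_p(\alpha). \end{aligned} \tag{5.2.2}$$

This equality brings about two results. First, the following identity which will turn out to
be very handy in many of the computations:

$$|B|^{-\alpha} \equiv \frac{1}{\Gamma_p(\alpha)}\int_{X>O}|X|^{\alpha-\frac{p+1}{2}}e^{-\mathrm{tr}(BX)}\,dX,\ B > O,\ \Re(\alpha) > \frac{p-1}{2}. \tag{5.2.3}$$

As well, the following two-parameter real matrix-variate gamma density with shape
parameter $\alpha$ and scale parameter matrix $B > O$ can be constructed from (5.2.2):

$$f(X) = \begin{cases} \dfrac{|B|^{\alpha}}{\Gamma_p(\alpha)}|X|^{\alpha-\frac{p+1}{2}}e^{-\mathrm{tr}(BX)}, & X > O,\ B > O,\ \Re(\alpha) > \frac{p-1}{2} \\ 0, & \text{elsewhere.} \end{cases} \tag{5.2.4}$$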
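The two-parameter density in (5.2.4) can be evaluated on the log scale for numerical stability. The sketch below assumes real $\alpha$ and symmetric positive definite inputs, which it does not verify; the function names are illustrative.

```python
import math
import numpy as np

def log_real_matrix_gamma(alpha, p):
    """Logarithm of the real matrix-variate gamma function for real alpha > (p - 1)/2."""
    return (p * (p - 1) / 4) * math.log(math.pi) + sum(
        math.lgamma(alpha - (j - 1) / 2) for j in range(1, p + 1)
    )

def matrix_gamma_logpdf(X, alpha, B):
    """Log of |B|^alpha / Gamma_p(alpha) * |X|^{alpha - (p+1)/2} * exp(-tr(BX)).
    X and B are assumed to be symmetric positive definite (not checked here)."""
    X = np.asarray(X, dtype=float)
    B = np.asarray(B, dtype=float)
    p = X.shape[0]
    _, logdet_x = np.linalg.slogdet(X)
    _, logdet_b = np.linalg.slogdet(B)
    return (alpha * logdet_b
            - log_real_matrix_gamma(alpha, p)
            + (alpha - (p + 1) / 2) * logdet_x
            - float(np.trace(B @ X)))

# p = 1 reduces to the scalar gamma density with shape alpha and rate b
scalar_case = matrix_gamma_logpdf([[0.5]], 2.0, [[3.0]])
# p = 2, alpha = 3/2, B = I, X = I: the log density is -log(Gamma_2(3/2)) - 2
identity_case = matrix_gamma_logpdf(np.eye(2), 1.5, np.eye(2))
```

In the scalar case, the value agrees with $\ln(3^2 \times 0.5\, e^{-1.5})$, the logarithm of the ordinary gamma density with shape 2 and rate 3 evaluated at 0.5.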


5.2.1. The mgf of the real matrix-variate gamma distribution 


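Let us determine the mgf associated with the density given in (5.2.4), that is, the
two-parameter real matrix-variate gamma density. Observing that $X = X'$, let $T$ be a symmetric
$p \times p$ real positive definite parameter matrix. Then, noting that

$$\mathrm{tr}(TX) = \sum_{j=1}^{p}t_{jj}x_{jj} + 2\sum_{i>j}t_{ij}x_{ij}, \tag{i}$$

it is seen that the non-diagonal elements in $X$ multiplied by the corresponding parameters
will have twice the weight of the diagonal elements multiplied by the corresponding
parameters. For instance, consider the $2 \times 2$ case:

$$\begin{aligned} \mathrm{tr}\left\{\begin{bmatrix} t_{11} & t_{12} \\ t_{12} & t_{22} \end{bmatrix}\begin{bmatrix} x_{11} & x_{12} \\ x_{12} & x_{22} \end{bmatrix}\right\} &= \mathrm{tr}\begin{bmatrix} t_{11}x_{11} + t_{12}x_{12} & \alpha_1 \\ \alpha_2 & t_{12}x_{12} + t_{22}x_{22} \end{bmatrix} \\ &= t_{11}x_{11} + 2t_{12}x_{12} + t_{22}x_{22} \end{aligned} \tag{ii}$$

where $\alpha_1$ and $\alpha_2$ represent elements that are not involved in the evaluation of the trace.
Note that due to the symmetry of $T$ and $X$, $t_{21} = t_{12}$ and $x_{21} = x_{12}$, so that the cross
product term $t_{12}x_{12}$ in (ii) appears twice whereas each of the terms $t_{11}x_{11}$ and $t_{22}x_{22}$ appears
only once.

However, in order to be consistent with the mgf in a real multivariate case, each
variable need only be multiplied once by the corresponding parameter, the mgf being
then obtained by taking the expected value of the resulting exponential sum. Accordingly,
the parameter matrix has to be modified as follows: let ${}_{*}T = ({}_{*}t_{ij})$ where
${}_{*}t_{jj} = t_{jj}$, ${}_{*}t_{ij} = \frac{1}{2}t_{ij}$, $i \ne j$, and $t_{ij} = t_{ji}$ for all $i$ and $j$ or, in other words, the
non-diagonal elements of the symmetric matrix $T$ are weighted by $\frac{1}{2}$, such a matrix being
denoted as ${}_{*}T$. Then,

$$\mathrm{tr}({}_{*}TX) = \sum_{i\ge j}t_{ij}x_{ij},$$

and the mgf in the real matrix-variate two-parameter gamma density, denoted by $M_X({}_{*}T)$,
is the following:

$$M_X({}_{*}T) = E[e^{\mathrm{tr}({}_{*}TX)}] = \frac{|B|^{\alpha}}{\Gamma_p(\alpha)}\int_{X>O}|X|^{\alpha-\frac{p+1}{2}}e^{\mathrm{tr}({}_{*}TX - BX)}\,dX.$$

Now, since

$$\mathrm{tr}(BX - {}_{*}TX) = \mathrm{tr}((B - {}_{*}T)^{\frac{1}{2}}X(B - {}_{*}T)^{\frac{1}{2}})$$

for $(B - {}_{*}T) > O$, that is, $(B - {}_{*}T)^{\frac{1}{2}} > O$, which means that $Y = (B - {}_{*}T)^{\frac{1}{2}}X(B -
{}_{*}T)^{\frac{1}{2}} \Rightarrow dX = |B - {}_{*}T|^{-\frac{p+1}{2}}dY$, we have

$$\begin{aligned} M_X({}_{*}T) &= \frac{|B|^{\alpha}}{\Gamma_p(\alpha)}\int_{X>O}|X|^{\alpha-\frac{p+1}{2}}e^{-\mathrm{tr}((B - {}_{*}T)^{\frac{1}{2}}X(B - {}_{*}T)^{\frac{1}{2}})}\,dX \\ &= \frac{|B|^{\alpha}}{\Gamma_p(\alpha)}|B - {}_{*}T|^{-\alpha}\int_{Y>O}|Y|^{\alpha-\frac{p+1}{2}}e^{-\mathrm{tr}(Y)}\,dY \\ &= |B|^{\alpha}|B - {}_{*}T|^{-\alpha} \\ &= |I - B^{-1}{}_{*}T|^{-\alpha} \ \text{ for } I - B^{-1}{}_{*}T > O. \end{aligned} \tag{5.2.5}$$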

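When ${}_{*}T$ is replaced by $-{}_{*}T$, (5.2.5) gives the Laplace transform of the two-parameter
gamma density in the real matrix-variate case as specified by (5.2.4), which is denoted by
$L_f({}_{*}T)$, that is,

$$L_f({}_{*}T) = M_X(-{}_{*}T) = |I + B^{-1}{}_{*}T|^{-\alpha} \ \text{ for } I + B^{-1}{}_{*}T > O. \tag{5.2.6}$$

For example, if

$$X = \begin{bmatrix} x_{11} & x_{12} \\ x_{12} & x_{22} \end{bmatrix},\ B = \begin{bmatrix} 2 & -1 \\ -1 & 3 \end{bmatrix} \ \text{ and } \ {}_{*}T = \begin{bmatrix} {}_{*}t_{11} & {}_{*}t_{12} \\ {}_{*}t_{12} & {}_{*}t_{22} \end{bmatrix},$$

then $|B| = 5$ and

$$\begin{aligned} M_X({}_{*}T) = |B|^{\alpha}|B - {}_{*}T|^{-\alpha} &= 5^{\alpha}\begin{vmatrix} 2 - {}_{*}t_{11} & -1 - {}_{*}t_{12} \\ -1 - {}_{*}t_{12} & 3 - {}_{*}t_{22} \end{vmatrix}^{-\alpha} \\ &= 5^{\alpha}\{(2 - {}_{*}t_{11})(3 - {}_{*}t_{22}) - (1 + {}_{*}t_{12})^2\}^{-\alpha}. \end{aligned}$$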
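The role of the modified parameter matrix ${}_{*}T$ and the two equivalent forms of the mgf in (5.2.5) can be illustrated numerically. In the sketch below, $B$ is the matrix of the preceding example, whereas the entries of `T`, the point `X` and the value of `alpha` are hypothetical; the helper names `star` and `gamma_mgf` are illustrative.

```python
import numpy as np

def star(T):
    """Return the matrix obtained by halving the off-diagonal elements of a symmetric T."""
    T = np.asarray(T, dtype=float)
    return 0.5 * (T + np.diag(np.diag(T)))

def gamma_mgf(T, alpha, B):
    """Evaluate |B|^alpha |B - *T|^(-alpha), which requires B - *T > O."""
    B = np.asarray(B, dtype=float)
    B_minus = B - star(T)
    if np.any(np.linalg.eigvalsh(B_minus) <= 0):
        raise ValueError("B - *T must be positive definite")
    return (np.linalg.det(B) / np.linalg.det(B_minus)) ** alpha

B = np.array([[2.0, -1.0], [-1.0, 3.0]])
T = np.array([[0.5, 0.4], [0.4, 1.0]])      # hypothetical symmetric parameter matrix
X = np.array([[1.0, 2.0], [2.0, 5.0]])      # hypothetical symmetric point
alpha = 2.0                                 # hypothetical shape parameter

# tr(*T X) picks up each distinct element exactly once
trace_star = float(np.trace(star(T) @ X))
sum_once = sum(T[i, j] * X[i, j] for i in range(2) for j in range(i + 1))

mgf_value = gamma_mgf(T, alpha, B)
alt_value = np.linalg.det(np.eye(2) - np.linalg.solve(B, star(T))) ** (-alpha)
s = star(T)
closed_form = 5.0 ** alpha * ((2 - s[0, 0]) * (3 - s[1, 1]) - (1 + s[0, 1]) ** 2) ** (-alpha)
```

The quantities `mgf_value`, `alt_value` and `closed_form` coincide, and `trace_star` equals the sum over $i \ge j$ of $t_{ij}x_{ij}$.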


If ${}_{*}T$ is partitioned into sub-matrices and $X$ is partitioned accordingly as

$${}_{*}T = \begin{bmatrix} {}_{*}T_{11} & {}_{*}T_{12} \\ {}_{*}T_{21} & {}_{*}T_{22} \end{bmatrix} \ \text{ and } \ X = \begin{bmatrix} X_{11} & X_{12} \\ X_{21} & X_{22} \end{bmatrix} \tag{iii}$$

where ${}_{*}T_{11}$ and $X_{11}$ are $r \times r$, $r \le p$, then what can be said about the densities of the
diagonal blocks $X_{11}$ and $X_{22}$? The mgf of $X_{11}$ is available from the definition by letting
${}_{*}T_{12} = O$, ${}_{*}T_{21} = O$ and ${}_{*}T_{22} = O$, as then $E[e^{\mathrm{tr}({}_{*}TX)}] = E[e^{\mathrm{tr}({}_{*}T_{11}X_{11})}]$. However,
$B^{-1}{}_{*}T$ is not positive definite since $B^{-1}{}_{*}T$ is not symmetric, and thereby $I - B^{-1}{}_{*}T$
cannot be positive definite when ${}_{*}T_{12} = O$, ${}_{*}T_{21} = O$, ${}_{*}T_{22} = O$. Consequently, the
mgf of $X_{11}$ cannot be determined from (5.2.6). As an alternative, we could rewrite (5.2.6)
in the symmetric format and then try to evaluate the density of $X_{11}$. As it turns out, the
densities of $X_{11}$ and $X_{22}$ can be readily obtained from the mgf in two situations: either
when $B = I$ or $B$ is a block diagonal matrix, that is,

$$B = \begin{bmatrix} B_{11} & O \\ O & B_{22} \end{bmatrix}. \tag{iv}$$

Hence we have the following results: 


Theorem 5.2.1. Let the $p \times p$ matrices $X > O$ and ${}_{*}T > O$ be partitioned as in (iii).
Let $X$ have a $p \times p$ real matrix-variate gamma density with shape parameter $\alpha$ and scale
parameter matrix $I_p$. Then $X_{11}$ has an $r \times r$ real matrix-variate gamma density and $X_{22}$
has a $(p - r) \times (p - r)$ real matrix-variate gamma density with shape parameter $\alpha$ and
scale parameters $I_r$ and $I_{p-r}$, respectively.


Theorem 5.2.2. Let $X$ be partitioned as in (iii). Let the $p \times p$ real positive definite
parameter matrix $B > O$ be partitioned as in (iv). Then $X_{11}$ has an $r \times r$ real matrix-variate
gamma density with the parameters ($\alpha$ and $B_{11} > O$) and $X_{22}$ has a $(p - r) \times
(p - r)$ real matrix-variate gamma density with the parameters ($\alpha$ and $B_{22} > O$).

Theorem 5.2.3. Let $X$ be partitioned as in (iii). Then $X_{11}$ and $X_{22}$ are statistically
independently distributed under the restrictions specified in Theorems 5.2.1 and 5.2.2.


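In the general case of $B$, write the mgf as $M_X({}_{*}T) = |B|^{\alpha}|B - {}_{*}T|^{-\alpha}$, which corresponds
to a symmetric format. Then, when ${}_{*}T = \begin{bmatrix} {}_{*}T_{11} & O \\ O & O \end{bmatrix}$,

$$\begin{aligned} M_X({}_{*}T) &= |B|^{\alpha}\begin{vmatrix} B_{11} - {}_{*}T_{11} & B_{12} \\ B_{21} & B_{22} \end{vmatrix}^{-\alpha} \\ &= |B|^{\alpha}|B_{22}|^{-\alpha}|B_{11} - {}_{*}T_{11} - B_{12}B_{22}^{-1}B_{21}|^{-\alpha} \\ &= |B_{22}|^{\alpha}|B_{11} - B_{12}B_{22}^{-1}B_{21}|^{\alpha}\,|B_{22}|^{-\alpha}|(B_{11} - B_{12}B_{22}^{-1}B_{21}) - {}_{*}T_{11}|^{-\alpha} \\ &= |B_{11} - B_{12}B_{22}^{-1}B_{21}|^{\alpha}\,|(B_{11} - B_{12}B_{22}^{-1}B_{21}) - {}_{*}T_{11}|^{-\alpha}, \end{aligned}$$

which is obtained by making use of the representations of the determinant of a partitioned
matrix, which are available from Sect. 1.3. Now, on comparing the last line with the first
one, it is seen that $X_{11}$ has a real matrix-variate gamma distribution with shape parameter
$\alpha$ and scale parameter matrix $B_{11} - B_{12}B_{22}^{-1}B_{21}$. Hence, the following result: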


Theorem 5.2.4. If the $p \times p$ real positive definite matrix $X$ has a real matrix-variate gamma
density with the shape parameter $\alpha$ and scale parameter matrix $B$ and if $X$ and $B$ are
partitioned as in (iii), then $X_{11}$ has a real matrix-variate gamma density with shape
parameter $\alpha$ and scale parameter matrix $B_{11} - B_{12}B_{22}^{-1}B_{21}$, and the sub-matrix $X_{22}$ has a
real matrix-variate gamma density with shape parameter $\alpha$ and scale parameter matrix
$B_{22} - B_{21}B_{11}^{-1}B_{12}$.
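The determinant manipulation leading to Theorem 5.2.4 can be verified numerically. The following sketch uses hypothetical numerical values for $B$, ${}_{*}T_{11}$ and $\alpha$, with $r = 2$ and $p = 3$; the helper name `marginal_scale` is illustrative.

```python
import numpy as np

def marginal_scale(B, r):
    """Scale parameter matrix of X11 for a leading r x r block: B11 - B12 B22^{-1} B21."""
    B11, B12 = B[:r, :r], B[:r, r:]
    B21, B22 = B[r:, :r], B[r:, r:]
    return B11 - B12 @ np.linalg.solve(B22, B21)

B = np.array([[4.0, 1.0, 0.5],
              [1.0, 3.0, 0.2],
              [0.5, 0.2, 2.0]])             # hypothetical B > O
T11 = np.array([[0.3, 0.1],
                [0.1, 0.2]])                # hypothetical *T11 (r x r)
alpha, r = 1.5, 2                           # hypothetical shape parameter and block size

# Full parameter matrix with *T12, *T21 and *T22 null
T_full = np.zeros_like(B)
T_full[:r, :r] = T11

lhs = (np.linalg.det(B) / np.linalg.det(B - T_full)) ** alpha
schur = marginal_scale(B, r)
rhs = (np.linalg.det(schur) / np.linalg.det(schur - T11)) ** alpha
```

The values `lhs` and `rhs` agree, which is the numerical counterpart of comparing the first and last lines of the derivation.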


5.2a. The Matrix-variate Gamma Function and Density, Complex Case 


Let $\tilde{X} = \tilde{X}^{*} > O$ be a $p \times p$ Hermitian positive definite matrix. When $\tilde{X}$ is Hermitian,
all its diagonal elements are real and hence $\mathrm{tr}(\tilde{X})$ is real. Let $\det(\tilde{X})$ denote the
determinant and $|\det(\tilde{X})|$ denote the absolute value of the determinant of $\tilde{X}$. As a result,
$|\det(\tilde{X})|^{\alpha-p}e^{-\mathrm{tr}(\tilde{X})}$ is a real-valued scalar function of $\tilde{X}$. Let us consider the following
integral, denoted by $\tilde{\Gamma}_p(\alpha)$:

$$\tilde{\Gamma}_p(\alpha) = \int_{\tilde{X}>O}|\det(\tilde{X})|^{\alpha-p}e^{-\mathrm{tr}(\tilde{X})}\,d\tilde{X}, \tag{5.2a.1}$$

which was evaluated in Sect. 5.1a. In fact, (5.1a.3) provides two representations of the
complex matrix-variate gamma function $\tilde{\Gamma}_p(\alpha)$. With the help of (5.1a.3), we can define
the complex $p \times p$ matrix-variate gamma density as follows:

$$\tilde{f}_1(\tilde{X}) = \begin{cases} \dfrac{1}{\tilde{\Gamma}_p(\alpha)}|\det(\tilde{X})|^{\alpha-p}e^{-\mathrm{tr}(\tilde{X})}, & \tilde{X} > O,\ \Re(\alpha) > p - 1 \\ 0, & \text{elsewhere.} \end{cases} \tag{5.2a.2}$$


For example, let us examine the 2 x 2 complex matrix-variate case. Let X be a matrix in 


the complex domain, X denoting its complex conjugate and X*, its conjugate transpose. 
When X — X*, the matrix is Hermitian and its diagonal elements are real. In the 2 x 2 
Hermitian case, let 


&-[ m dE и ЕЕ 
X2 — 1у2 X3 x2 + 1y2 X3 


Then, the determinants are 


det(X) = xix3 — (x2 — iyz) Ga + iy2) = xix — (02 + уз) 
= det(X*), ху > 0, хз > 0, x1x3 — (x2 + ys) > 0, 


due to Hermitian positive definiteness of X. As well, 
|det(X)| = +[(det(X) (det(X*))]2 = xixs — (х2 + y2 > 0. 


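Note that $\mathrm{tr}(\tilde{X})=x_1+x_3$ and $\tilde{\Gamma}_2(\alpha)=\pi^{\frac{2(1)}{2}}\Gamma(\alpha)\Gamma(\alpha-1)$, $\Re(\alpha)>1$, $p=2$. The density is then of the following form:

$$\tilde{f}(\tilde{X})=\frac{1}{\tilde{\Gamma}_2(\alpha)}\,|\det(\tilde{X})|^{\alpha-2}\,e^{-\mathrm{tr}(\tilde{X})}=\frac{1}{\pi\,\Gamma(\alpha)\Gamma(\alpha-1)}\,[x_1x_3-(x_2^2+y_2^2)]^{\alpha-2}\,e^{-(x_1+x_3)}$$

for $x_1>0$, $x_3>0$, $x_1x_3-(x_2^2+y_2^2)>0$, $\Re(\alpha)>1$, and $\tilde{f}(\tilde{X})=0$ elsewhere.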
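The $2\times 2$ density above can be evaluated numerically with a few lines of Python. The sketch below is illustrative only: the function names are hypothetical, $\alpha$ is taken to be real, and the closed form $\tilde{\Gamma}_p(\alpha)=\pi^{p(p-1)/2}\Gamma(\alpha)\Gamma(\alpha-1)\cdots\Gamma(\alpha-p+1)$ is assumed to be the representation referred to in (5.1a.3); for $p=2$ it reduces to the expression $\pi\,\Gamma(\alpha)\Gamma(\alpha-1)$ quoted in the text.

```python
import math


def complex_matrix_gamma(p, alpha):
    """Complex matrix-variate gamma function for real alpha > p - 1.

    Assumed closed form: pi^{p(p-1)/2} * Gamma(alpha) * ... * Gamma(alpha - p + 1).
    """
    if alpha <= p - 1:
        raise ValueError("requires alpha > p - 1")
    value = math.pi ** (p * (p - 1) / 2)
    for j in range(p):
        value *= math.gamma(alpha - j)
    return value


def complex_gamma_density_2x2(x1, x2, y2, x3, alpha):
    """Density of the Hermitian matrix [[x1, x2 + i*y2], [x2 - i*y2, x3]].

    Returns 0 outside the region where the matrix is positive definite.
    """
    det = x1 * x3 - (x2 ** 2 + y2 ** 2)
    if x1 <= 0 or x3 <= 0 or det <= 0:
        return 0.0
    return det ** (alpha - 2) * math.exp(-(x1 + x3)) / complex_matrix_gamma(2, alpha)
```

For instance, at the identity matrix with $\alpha=2$ the density equals $e^{-2}/\pi$, since the determinant factor is raised to the power zero.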


Now, consider a $p\times p$ parameter matrix $\tilde{B}>O$. We can obtain the following identity corresponding to the identity in the real case:

$$|\det(\tilde{B})|^{-\alpha}=\frac{1}{\tilde{\Gamma}_p(\alpha)}\int_{\tilde{X}>O}|\det(\tilde{X})|^{\alpha-p}\,e^{-\mathrm{tr}(\tilde{B}\tilde{X})}\,d\tilde{X},\quad \Re(\alpha)>p-1. \tag{5.2a.3}$$

A two-parameter gamma density in the complex domain can then be derived by proceeding as in the real case; it is given by

$$\tilde{f}_1(\tilde{X})=\begin{cases}\dfrac{|\det(\tilde{B})|^{\alpha}}{\tilde{\Gamma}_p(\alpha)}\,|\det(\tilde{X})|^{\alpha-p}\,e^{-\mathrm{tr}(\tilde{B}\tilde{X})}, & \tilde{B}>O,\ \tilde{X}>O,\ \Re(\alpha)>p-1\\[4pt] 0, & \text{elsewhere.}\end{cases} \tag{5.2a.4}$$


5.2a.1. The mgf of the complex matrix-variate gamma distribution


The moment generating function in the complex domain is slightly different from that in the real case. Let $\tilde{T}>O$ be a $p\times p$ parameter matrix and let $\tilde{X}$ be $p\times p$ two-parameter gamma distributed as in (5.2a.4). Then $\tilde{T}=T_1+iT_2$ and $\tilde{X}=X_1+iX_2$, with $T_1,T_2,X_1,X_2$ real and $i=\sqrt{-1}$. When $\tilde{T}$ and $\tilde{X}$ are Hermitian positive definite, $T_1$ and $X_1$ are real symmetric and $T_2$ and $X_2$ are real skew symmetric. Then, since $\tilde{T}^{*}=T_1'-iT_2'$, consider

$$\mathrm{tr}(\tilde{T}^{*}\tilde{X})=\mathrm{tr}(T_1'X_1)+\mathrm{tr}(T_2'X_2)+i\,[\mathrm{tr}(T_1'X_2)-\mathrm{tr}(T_2'X_1)].$$

Note that $\mathrm{tr}(T_1'X_1)+\mathrm{tr}(T_2'X_2)$ contains all the real variables involved multiplied by the corresponding parameters, where the diagonal elements appear once and the off-diagonal elements each appear twice. Thus, as in the real case, $\tilde{T}$ has to be replaced by ${}_{*}\tilde{T}={}_{*}T_1+i\,{}_{*}T_2$. A term containing $i$ still remains; however, as a result of the following properties, this term will disappear.


Lemma 5.2a.1. Let $\tilde{T},\tilde{X},T_1,T_2,X_1,X_2$ be as defined above. Then, $\mathrm{tr}(T_1X_2)=0$, $\mathrm{tr}(T_2X_1)=0$, $\mathrm{tr}({}_{*}T_1X_2)=0$, $\mathrm{tr}({}_{*}T_2X_1)=0$.


Proof: For any real square matrix $A$, $\mathrm{tr}(A)=\mathrm{tr}(A')$ and for any two matrices $A$ and $B$ where $AB$ and $BA$ are defined, $\mathrm{tr}(AB)=\mathrm{tr}(BA)$. With the help of these two results, we have the following:

$$\mathrm{tr}(T_1X_2)=\mathrm{tr}\big((T_1X_2)'\big)=\mathrm{tr}(X_2'T_1')=-\mathrm{tr}(X_2T_1)=-\mathrm{tr}(T_1X_2)$$

as $T_1$ is symmetric and $X_2$ is skew symmetric. Now, $\mathrm{tr}(T_1X_2)=-\mathrm{tr}(T_1X_2)\Rightarrow \mathrm{tr}(T_1X_2)=0$ since it is a real quantity. It can be similarly established that the other results stated in the lemma hold.
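The key fact in Lemma 5.2a.1, namely that the trace of the product of a real symmetric matrix and a real skew symmetric matrix vanishes, is easy to confirm on a small case. The following is a minimal sketch in plain Python; the matrices `T1` and `X2` are arbitrary illustrative values, not taken from the text.

```python
def matmul(a, b):
    """Product of two square matrices stored as lists of rows."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def trace(a):
    """Sum of the diagonal elements."""
    return sum(a[i][i] for i in range(len(a)))


# T1 is real symmetric, X2 is real skew symmetric (illustrative values).
T1 = [[2.0, 1.0, -3.0],
      [1.0, 0.5, 4.0],
      [-3.0, 4.0, 1.0]]
X2 = [[0.0, 2.0, -1.0],
      [-2.0, 0.0, 5.0],
      [1.0, -5.0, 0.0]]

trace_T1_X2 = trace(matmul(T1, X2))
```

Each off-diagonal pair contributes $t_{jk}x_{kj}+t_{kj}x_{jk}=t_{jk}(x_{kj}+x_{jk})=0$, which is why `trace_T1_X2` is zero here.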


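We may now define the mgf in the complex case, denoted by $M_{\tilde{X}}({}_{*}\tilde{T})$, as follows:

$$M_{\tilde{X}}({}_{*}\tilde{T})=E\big[e^{\mathrm{tr}({}_{*}\tilde{T}^{*}\tilde{X})}\big]=\int_{\tilde{X}>O}e^{\mathrm{tr}({}_{*}\tilde{T}^{*}\tilde{X})}\,\tilde{f}_1(\tilde{X})\,d\tilde{X}=\frac{|\det(\tilde{B})|^{\alpha}}{\tilde{\Gamma}_p(\alpha)}\int_{\tilde{X}>O}|\det(\tilde{X})|^{\alpha-p}\,e^{-\mathrm{tr}((\tilde{B}-{}_{*}\tilde{T}^{*})\tilde{X})}\,d\tilde{X}.$$

Since $\mathrm{tr}(\tilde{X}(\tilde{B}-{}_{*}\tilde{T}^{*}))=\mathrm{tr}(C\tilde{X}C^{*})$ for $C=(\tilde{B}-{}_{*}\tilde{T}^{*})^{\frac{1}{2}}$ and $C>O$, it follows from Theorem 1.6a.5 that $\tilde{Y}=C\tilde{X}C^{*}\Rightarrow d\tilde{Y}=|\det(CC^{*})|^{p}\,d\tilde{X}$, that is, $d\tilde{X}=|\det(\tilde{B}-{}_{*}\tilde{T}^{*})|^{-p}\,d\tilde{Y}$ for $\tilde{B}-{}_{*}\tilde{T}^{*}>O$. Then,

$$\begin{aligned}M_{\tilde{X}}({}_{*}\tilde{T})&=\frac{|\det(\tilde{B})|^{\alpha}}{\tilde{\Gamma}_p(\alpha)}\,|\det(\tilde{B}-{}_{*}\tilde{T}^{*})|^{-\alpha}\int_{\tilde{Y}>O}|\det(\tilde{Y})|^{\alpha-p}\,e^{-\mathrm{tr}(\tilde{Y})}\,d\tilde{Y}\\ &=|\det(\tilde{B})|^{\alpha}\,|\det(\tilde{B}-{}_{*}\tilde{T}^{*})|^{-\alpha}\ \ \text{for } \tilde{B}-{}_{*}\tilde{T}^{*}>O\\ &=|\det(I-\tilde{B}^{-1}{}_{*}\tilde{T}^{*})|^{-\alpha},\quad I-\tilde{B}^{-1}{}_{*}\tilde{T}^{*}>O.\end{aligned} \tag{5.2a.5}$$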


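For example, let $p=2$ and

$$\tilde{X}=\begin{bmatrix}\tilde{x}_{11} & \tilde{x}_{12}\\ \tilde{x}_{12}^{*} & \tilde{x}_{22}\end{bmatrix},\quad \tilde{B}=\begin{bmatrix}3 & i\\ -i & 2\end{bmatrix}\quad\text{and}\quad {}_{*}\tilde{T}=\begin{bmatrix}{}_{*}\tilde{t}_{11} & {}_{*}\tilde{t}_{12}\\ {}_{*}\tilde{t}_{12}^{*} & {}_{*}\tilde{t}_{22}\end{bmatrix},$$

with $\tilde{x}_{21}=\tilde{x}_{12}^{*}$ and ${}_{*}\tilde{t}_{21}={}_{*}\tilde{t}_{12}^{*}$. In this case, the conjugate transpose is only the conjugate since the quantities are scalar. Note that $\tilde{B}=\tilde{B}^{*}$ and hence $\tilde{B}$ is Hermitian. The leading minors of $\tilde{B}$ being $|(3)|=3>0$ and $|\tilde{B}|=(3)(2)-(-i)(i)=5>0$, $\tilde{B}$ is Hermitian positive definite. Accordingly,

$$M_{\tilde{X}}({}_{*}\tilde{T})=|\det(\tilde{B})|^{\alpha}\,|\det(\tilde{B}-{}_{*}\tilde{T}^{*})|^{-\alpha}=5^{\alpha}\,\big[(3-{}_{*}\tilde{t}_{11})(2-{}_{*}\tilde{t}_{22})+(i+{}_{*}\tilde{t}_{12}^{*})(i-{}_{*}\tilde{t}_{12})\big]^{-\alpha}.$$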
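The $p=2$ illustration can be checked with Python's built-in complex numbers. The sketch below assumes $\tilde{B}$ has rows $(3,\ i)$ and $(-i,\ 2)$, in line with the determinant computation $(3)(2)-(-i)(i)=5$ in the text, and takes the argument `T` to be the Hermitian matrix ${}_{*}\tilde{T}$ (so that ${}_{*}\tilde{T}^{*}={}_{*}\tilde{T}$). The function names are hypothetical.

```python
def det2(m):
    """Determinant of a 2x2 matrix with (possibly complex) entries."""
    return m[0][0] * m[1][1] - m[0][1] * m[1][0]


def is_hermitian_pd_2x2(m, tol=1e-12):
    """Check Hermitian positive definiteness through the leading minors."""
    a, b, c, d = (complex(m[0][0]), complex(m[0][1]),
                  complex(m[1][0]), complex(m[1][1]))
    hermitian = (abs(a.imag) < tol and abs(d.imag) < tol
                 and abs(b - c.conjugate()) < tol)
    return hermitian and a.real > 0 and det2(m).real > 0


def mgf_complex_gamma_2x2(T, B, alpha):
    """|det(B)|^alpha * |det(B - T)|^(-alpha), valid when B - T > O."""
    diff = [[B[i][j] - T[i][j] for j in range(2)] for i in range(2)]
    if not is_hermitian_pd_2x2(diff):
        raise ValueError("B - T must be Hermitian positive definite")
    return abs(det2(B)) ** alpha * abs(det2(diff)) ** (-alpha)


B = [[3, 1j], [-1j, 2]]
```

With `T` equal to the null matrix the mgf is 1, as it must be, and with `T` equal to the identity matrix and $\alpha=2$ it evaluates to $5^{2}\cdot 1^{-2}=25$.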


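Now, consider the partitioning of the following $p\times p$ matrices:

$$\tilde{X}=\begin{bmatrix}\tilde{X}_{11} & \tilde{X}_{12}\\ \tilde{X}_{21} & \tilde{X}_{22}\end{bmatrix},\quad {}_{*}\tilde{T}=\begin{bmatrix}{}_{*}\tilde{T}_{11} & {}_{*}\tilde{T}_{12}\\ {}_{*}\tilde{T}_{21} & {}_{*}\tilde{T}_{22}\end{bmatrix}\quad\text{and}\quad \tilde{B}=\begin{bmatrix}\tilde{B}_{11} & \tilde{B}_{12}\\ \tilde{B}_{21} & \tilde{B}_{22}\end{bmatrix} \tag{i}$$

where $\tilde{X}_{11}$ and ${}_{*}\tilde{T}_{11}$ are $r\times r$, $r<p$. Then, proceeding as in the real case, we have the following results: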


Theorem 5.2a.1. Let $\tilde{X}$ have a $p\times p$ complex matrix-variate gamma density with shape parameter $\alpha$ and scale parameter $I_p$, and $\tilde{X}$ be partitioned as in (i). Then, $\tilde{X}_{11}$ has an $r\times r$ complex matrix-variate gamma density with shape parameter $\alpha$ and scale parameter $I_r$ and $\tilde{X}_{22}$ has a $(p-r)\times(p-r)$ complex matrix-variate gamma density with shape parameter $\alpha$ and scale parameter $I_{p-r}$.


Theorem 5.2a.2. Let the $p\times p$ complex matrix $\tilde{X}$ have a $p\times p$ complex matrix-variate gamma density with the parameters $(\alpha,\ \tilde{B}>O)$ and let $\tilde{X}$ and $\tilde{B}$ be partitioned as in (i) and $\tilde{B}_{12}=O$, $\tilde{B}_{21}=O$. Then $\tilde{X}_{11}$ and $\tilde{X}_{22}$ have $r\times r$ and $(p-r)\times(p-r)$ complex matrix-variate gamma densities with shape parameter $\alpha$ and scale parameters $\tilde{B}_{11}$ and $\tilde{B}_{22}$, respectively.


Theorem 5.2a.3. Let $\tilde{X}$, $\tilde{X}_{11}$, $\tilde{X}_{22}$ and $\tilde{B}$ be as specified in Theorems 5.2a.1 or 5.2a.2. Then, $\tilde{X}_{11}$ and $\tilde{X}_{22}$ are statistically independently distributed as complex matrix-variate gamma random variables on $r\times r$ and $(p-r)\times(p-r)$ matrices, respectively.

For a general matrix $\tilde{B}$ where the sub-matrices $\tilde{B}_{12}$ and $\tilde{B}_{21}$ are not assumed to be null, the marginal densities of $\tilde{X}_{11}$ and $\tilde{X}_{22}$ being given in the next result can be determined by proceeding as in the real case.


Theorem 5.2a.4. Let $\tilde{X}$ have a complex matrix-variate gamma density with shape parameter $\alpha$ and scale parameter matrix $\tilde{B}=\tilde{B}^{*}>O$. Letting $\tilde{X}$ and $\tilde{B}$ be partitioned as in (i), then the sub-matrix $\tilde{X}_{11}$ has a complex matrix-variate gamma density with shape parameter $\alpha$ and scale parameter matrix $\tilde{B}_{11}-\tilde{B}_{12}\tilde{B}_{22}^{-1}\tilde{B}_{21}$, and the sub-matrix $\tilde{X}_{22}$ has a complex matrix-variate gamma density with shape parameter $\alpha$ and scale parameter matrix $\tilde{B}_{22}-\tilde{B}_{21}\tilde{B}_{11}^{-1}\tilde{B}_{12}$.


Exercises 5.2 


5.2.1. Show that

$$\pi^{\frac{(p-r)(p-r-1)}{4}}\,\pi^{\frac{r(r-1)}{4}}\,\pi^{\frac{r(p-r)}{2}}=\pi^{\frac{p(p-1)}{4}}.$$


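5.2.2. Show that $\pi^{\frac{r(p-r)}{2}}\,\Gamma_r(\alpha)\,\Gamma_{p-r}(\alpha-\frac{r}{2})=\Gamma_p(\alpha)$.
5.2.3. Evaluate (1): $\int_{X>O}e^{-\mathrm{tr}(X)}\,dX$, (2): $\int_{X>O}|X|\,e^{-\mathrm{tr}(X)}\,dX$.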
5.2.4. Write down (1): Г (0), (2): T4(œ) explicitly in the real and complex cases. 


5.2.5. Evaluate the integrals in Exercise 5.2.3 for the complex case. In (2) replace $\det(X)$ by $|\det(\tilde{X})|$.


5.3. Matrix-variate Type-1 Beta and Type-2 Beta Densities, Real Case 


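The $p\times p$ matrix-variate beta function denoted by $B_p(\alpha,\beta)$ is defined as follows in the real case:

$$B_p(\alpha,\beta)=\frac{\Gamma_p(\alpha)\Gamma_p(\beta)}{\Gamma_p(\alpha+\beta)},\quad \Re(\alpha)>\frac{p-1}{2},\ \Re(\beta)>\frac{p-1}{2}. \tag{5.3.1}$$

This function has the following integral representations in the real case where it is assumed that $\Re(\alpha)>\frac{p-1}{2}$ and $\Re(\beta)>\frac{p-1}{2}$:

$$B_p(\alpha,\beta)=\int_{O<X<I}|X|^{\alpha-\frac{p+1}{2}}\,|I-X|^{\beta-\frac{p+1}{2}}\,dX,\ \text{a type-1 beta integral} \tag{5.3.2}$$

$$B_p(\beta,\alpha)=\int_{O<Y<I}|Y|^{\beta-\frac{p+1}{2}}\,|I-Y|^{\alpha-\frac{p+1}{2}}\,dY,\ \text{a type-1 beta integral} \tag{5.3.3}$$

$$B_p(\alpha,\beta)=\int_{Z>O}|Z|^{\alpha-\frac{p+1}{2}}\,|I+Z|^{-(\alpha+\beta)}\,dZ,\ \text{a type-2 beta integral} \tag{5.3.4}$$

$$B_p(\beta,\alpha)=\int_{T>O}|T|^{\beta-\frac{p+1}{2}}\,|I+T|^{-(\alpha+\beta)}\,dT,\ \text{a type-2 beta integral.} \tag{5.3.5}$$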


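For example, for $p=2$, let

$$X=\begin{bmatrix}x_{11} & x_{12}\\ x_{12} & x_{22}\end{bmatrix}\ \Rightarrow\ |X|=x_{11}x_{22}-x_{12}^2\ \text{ and }\ |I-X|=(1-x_{11})(1-x_{22})-x_{12}^2.$$

Then for example, (5.3.2) will be of the following form:

$$\begin{aligned}B_p(\alpha,\beta)&=\int_{O<X<I}|X|^{\alpha-\frac{p+1}{2}}\,|I-X|^{\beta-\frac{p+1}{2}}\,dX\\ &=\int_{x_{11}>0}\int_{x_{22}>0}\int_{x_{11}x_{22}-x_{12}^2>0}[x_{11}x_{22}-x_{12}^2]^{\alpha-\frac{3}{2}}\\ &\qquad\times[(1-x_{11})(1-x_{22})-x_{12}^2]^{\beta-\frac{3}{2}}\,dx_{11}\wedge dx_{12}\wedge dx_{22},\end{aligned}$$

where the region of integration is further restricted by $I-X>O$, that is, $x_{11}<1$, $x_{22}<1$ and $(1-x_{11})(1-x_{22})-x_{12}^2>0$.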


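We will derive two of the integrals (5.3.2)-(5.3.5), the other ones being then directly obtained. Let us begin with the integral representations of $\Gamma_p(\alpha)$ and $\Gamma_p(\beta)$ for $\Re(\alpha)>\frac{p-1}{2}$, $\Re(\beta)>\frac{p-1}{2}$:

$$\begin{aligned}\Gamma_p(\alpha)\Gamma_p(\beta)&=\Big[\int_X|X|^{\alpha-\frac{p+1}{2}}e^{-\mathrm{tr}(X)}\,dX\Big]\Big[\int_Y|Y|^{\beta-\frac{p+1}{2}}e^{-\mathrm{tr}(Y)}\,dY\Big]\\ &=\int_X\int_Y|X|^{\alpha-\frac{p+1}{2}}\,|Y|^{\beta-\frac{p+1}{2}}\,e^{-\mathrm{tr}(X+Y)}\,dX\wedge dY.\end{aligned}$$

Making the transformation $U=X+Y$, $X=V$, whose Jacobian is 1, taking out $U$ from $|U-V|=|U|\,|I-U^{-\frac{1}{2}}VU^{-\frac{1}{2}}|$, and then letting $W=U^{-\frac{1}{2}}VU^{-\frac{1}{2}}\Rightarrow dV=|U|^{\frac{p+1}{2}}\,dW$, we have

$$\begin{aligned}\Gamma_p(\alpha)\Gamma_p(\beta)&=\int_U\int_V|V|^{\alpha-\frac{p+1}{2}}\,|U-V|^{\beta-\frac{p+1}{2}}\,e^{-\mathrm{tr}(U)}\,dU\wedge dV\\ &=\Big\{\int_{U>O}|U|^{\alpha+\beta-\frac{p+1}{2}}e^{-\mathrm{tr}(U)}\,dU\Big\}\times\Big\{\int_{O<W<I}|W|^{\alpha-\frac{p+1}{2}}\,|I-W|^{\beta-\frac{p+1}{2}}\,dW\Big\}\\ &=\Gamma_p(\alpha+\beta)\int_{O<W<I}|W|^{\alpha-\frac{p+1}{2}}\,|I-W|^{\beta-\frac{p+1}{2}}\,dW.\end{aligned}$$

Thus, on dividing both sides by $\Gamma_p(\alpha+\beta)$, we have

$$B_p(\alpha,\beta)=\int_{O<W<I}|W|^{\alpha-\frac{p+1}{2}}\,|I-W|^{\beta-\frac{p+1}{2}}\,dW. \tag{i}$$

This establishes (5.3.2). The initial conditions $\Re(\alpha)>\frac{p-1}{2}$, $\Re(\beta)>\frac{p-1}{2}$ are sufficient to justify all the steps above, and hence no conditions are listed at each stage. Now, take $Y=I-W$ to obtain (5.3.3). Let us take the $W$ of (i) above and consider the transformation

$$Z=(I-W)^{-\frac{1}{2}}\,W\,(I-W)^{-\frac{1}{2}}=(W^{-1}-I)^{-1}$$

which gives

$$Z^{-1}=W^{-1}-I\ \Rightarrow\ |Z|^{-(p+1)}\,dZ=|W|^{-(p+1)}\,dW. \tag{ii}$$

Taking determinants and substituting in (ii) we have

$$dW=|I+Z|^{-(p+1)}\,dZ.$$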


On expressing $W$, $I-W$ and $dW$ in terms of $Z$, we have the result (5.3.4). Now, let $T=Z^{-1}$ with the Jacobian $dT=|Z|^{-(p+1)}\,dZ$, then (5.3.4) transforms into the integral (5.3.5). These establish all four integral representations of the real matrix-variate beta function. We may also observe that $B_p(\alpha,\beta)=B_p(\beta,\alpha)$ or $\alpha$ and $\beta$ can be interchanged in the beta function. Consider the function

$$f_3(X)=\frac{\Gamma_p(\alpha+\beta)}{\Gamma_p(\alpha)\Gamma_p(\beta)}\,|X|^{\alpha-\frac{p+1}{2}}\,|I-X|^{\beta-\frac{p+1}{2}} \tag{5.3.6}$$

for $O<X<I$, $\Re(\alpha)>\frac{p-1}{2}$, $\Re(\beta)>\frac{p-1}{2}$, and $f_3(X)=0$ elsewhere. This is a type-1 real matrix-variate beta density with the parameters $(\alpha,\beta)$, where $O<X<I$ means $X>O$, $I-X>O$ so that all the eigenvalues of $X$ are in the open interval $(0,1)$. As for

$$f_4(Z)=\frac{\Gamma_p(\alpha+\beta)}{\Gamma_p(\alpha)\Gamma_p(\beta)}\,|Z|^{\alpha-\frac{p+1}{2}}\,|I+Z|^{-(\alpha+\beta)} \tag{5.3.7}$$

whenever $Z>O$, $\Re(\alpha)>\frac{p-1}{2}$, $\Re(\beta)>\frac{p-1}{2}$, and $f_4(Z)=0$ elsewhere, this is a $p\times p$ real matrix-variate type-2 beta density with the parameters $(\alpha,\beta)$.


5.3.1. Some properties of real matrix-variate type-1 and type-2 beta densities


In the course of the above derivations, it was shown that the following results hold. If $X$ is a $p\times p$ real positive definite matrix having a real matrix-variate type-1 beta density with the parameters $(\alpha,\beta)$, then:

(1): $Y_1=I-X$ is real type-1 beta distributed with the parameters $(\beta,\alpha)$;

(2): $Y_2=(I-X)^{-\frac{1}{2}}X(I-X)^{-\frac{1}{2}}$ is real type-2 beta distributed with the parameters $(\alpha,\beta)$;

(3): $Y_3=(I-X)^{\frac{1}{2}}X^{-1}(I-X)^{\frac{1}{2}}$ is real type-2 beta distributed with the parameters $(\beta,\alpha)$.

If $Y$ is real type-2 beta distributed with the parameters $(\alpha,\beta)$ then:

(4): $Z_1=Y^{-1}$ is real type-2 beta distributed with the parameters $(\beta,\alpha)$;

(5): $Z_2=(I+Y)^{-\frac{1}{2}}Y(I+Y)^{-\frac{1}{2}}$ is real type-1 beta distributed with the parameters $(\alpha,\beta)$;

(6): $Z_3=I-(I+Y)^{-\frac{1}{2}}Y(I+Y)^{-\frac{1}{2}}=(I+Y)^{-1}$ is real type-1 beta distributed with the parameters $(\beta,\alpha)$.
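The normalizing constants in (5.3.6) and (5.3.7) are reciprocals of the real matrix-variate beta function (5.3.1), which can be computed directly from scalar gamma functions. The following sketch is only illustrative: the function names are hypothetical, the parameters are taken to be real, and the product form $\Gamma_p(\alpha)=\pi^{p(p-1)/4}\,\Gamma(\alpha)\Gamma(\alpha-\tfrac{1}{2})\cdots\Gamma(\alpha-\tfrac{p-1}{2})$ is used for the real matrix-variate gamma function.

```python
import math


def real_matrix_gamma(p, alpha):
    """Real matrix-variate gamma function for real alpha > (p - 1)/2."""
    if alpha <= (p - 1) / 2:
        raise ValueError("requires alpha > (p - 1)/2")
    value = math.pi ** (p * (p - 1) / 4)
    for j in range(p):
        value *= math.gamma(alpha - j / 2)
    return value


def real_matrix_beta(p, alpha, beta):
    """B_p(alpha, beta) = Gamma_p(alpha) Gamma_p(beta) / Gamma_p(alpha + beta)."""
    return (real_matrix_gamma(p, alpha) * real_matrix_gamma(p, beta)
            / real_matrix_gamma(p, alpha + beta))
```

For $p=1$ the function reduces to the scalar beta function, for example $B_1(2,3)=\Gamma(2)\Gamma(3)/\Gamma(5)=1/12$, and the symmetry $B_p(\alpha,\beta)=B_p(\beta,\alpha)$ noted in the text holds by construction.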


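5.3a. Matrix-variate Type-1 and Type-2 Beta Densities, Complex Case


A matrix-variate beta function in the complex domain is defined as

$$\tilde{B}_p(\alpha,\beta)=\frac{\tilde{\Gamma}_p(\alpha)\tilde{\Gamma}_p(\beta)}{\tilde{\Gamma}_p(\alpha+\beta)},\quad \Re(\alpha)>p-1,\ \Re(\beta)>p-1 \tag{5.3a.1}$$

with a tilde over $B$. As $\tilde{B}_p(\alpha,\beta)=\tilde{B}_p(\beta,\alpha)$, clearly $\alpha$ and $\beta$ can be interchanged. Then, $\tilde{B}_p(\alpha,\beta)$ has the following integral representations, where $\Re(\alpha)>p-1$, $\Re(\beta)>p-1$:

$$\tilde{B}_p(\alpha,\beta)=\int_{O<\tilde{X}<I}|\det(\tilde{X})|^{\alpha-p}\,|\det(I-\tilde{X})|^{\beta-p}\,d\tilde{X},\ \text{a type-1 beta integral} \tag{5.3a.2}$$

$$\tilde{B}_p(\beta,\alpha)=\int_{O<\tilde{Y}<I}|\det(\tilde{Y})|^{\beta-p}\,|\det(I-\tilde{Y})|^{\alpha-p}\,d\tilde{Y},\ \text{a type-1 beta integral} \tag{5.3a.3}$$

$$\tilde{B}_p(\alpha,\beta)=\int_{\tilde{Z}>O}|\det(\tilde{Z})|^{\alpha-p}\,|\det(I+\tilde{Z})|^{-(\alpha+\beta)}\,d\tilde{Z},\ \text{a type-2 beta integral} \tag{5.3a.4}$$

$$\tilde{B}_p(\beta,\alpha)=\int_{\tilde{T}>O}|\det(\tilde{T})|^{\beta-p}\,|\det(I+\tilde{T})|^{-(\alpha+\beta)}\,d\tilde{T},\ \text{a type-2 beta integral.} \tag{5.3a.5}$$

For instance, consider the integrand in (5.3a.2) for the case $p=2$. Let

$$\tilde{X}=\begin{bmatrix}x_1 & x_2+iy_2\\ x_2-iy_2 & x_3\end{bmatrix},$$

the diagonal elements of $\tilde{X}$ being real; since $\tilde{X}$ is Hermitian positive definite, we have $x_1>0$, $x_3>0$, $x_1x_3-(x_2^2+y_2^2)>0$, $\det(\tilde{X})=x_1x_3-(x_2^2+y_2^2)>0$ and $\det(I-\tilde{X})=(1-x_1)(1-x_3)-(x_2^2+y_2^2)>0$. The integrand in (5.3a.2) is then

$$[x_1x_3-(x_2^2+y_2^2)]^{\alpha-2}\,[(1-x_1)(1-x_3)-(x_2^2+y_2^2)]^{\beta-2}.$$

The derivations of (5.3a.2)-(5.3a.5) being parallel to those provided in the real case, they are omitted. We will list one case for each of a type-1 and a type-2 beta density in the complex $p\times p$ matrix-variate case:

$$\tilde{f}_3(\tilde{X})=\frac{\tilde{\Gamma}_p(\alpha+\beta)}{\tilde{\Gamma}_p(\alpha)\tilde{\Gamma}_p(\beta)}\,|\det(\tilde{X})|^{\alpha-p}\,|\det(I-\tilde{X})|^{\beta-p} \tag{5.3a.6}$$

for $O<\tilde{X}<I$, $\Re(\alpha)>p-1$, $\Re(\beta)>p-1$ and $\tilde{f}_3(\tilde{X})=0$ elsewhere;

$$\tilde{f}_4(\tilde{Z})=\frac{\tilde{\Gamma}_p(\alpha+\beta)}{\tilde{\Gamma}_p(\alpha)\tilde{\Gamma}_p(\beta)}\,|\det(\tilde{Z})|^{\alpha-p}\,|\det(I+\tilde{Z})|^{-(\alpha+\beta)} \tag{5.3a.7}$$

for $\tilde{Z}>O$, $\Re(\alpha)>p-1$, $\Re(\beta)>p-1$ and $\tilde{f}_4(\tilde{Z})=0$ elsewhere.
Properties parallel to (1) to (6) which are listed in Sect. 5.3.1 also hold in the complex case.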


5.3.2. Explicit evaluation of type-1 matrix-variate beta integrals, real case 


A detailed evaluation of a type-1 matrix-variate beta integral as a multiple integral is presented in this section as the steps will prove useful in connection with other computations; the reader may also refer to Mathai (2014,b). The real matrix-variate type-1 beta function which is denoted by

$$B_p(\alpha,\beta)=\frac{\Gamma_p(\alpha)\Gamma_p(\beta)}{\Gamma_p(\alpha+\beta)},\quad \Re(\alpha)>\frac{p-1}{2},\ \Re(\beta)>\frac{p-1}{2},$$

has the following type-1 beta integral representation:

$$B_p(\alpha,\beta)=\int_{O<X<I}|X|^{\alpha-\frac{p+1}{2}}\,|I-X|^{\beta-\frac{p+1}{2}}\,dX,$$

for $\Re(\alpha)>\frac{p-1}{2}$, $\Re(\beta)>\frac{p-1}{2}$ where $X$ is a real $p\times p$ symmetric positive definite matrix. The standard derivation of this integral relies on the properties of real matrix-variate gamma integrals after making suitable transformations, as was previously done. It is also possible to evaluate the integral directly and show that it is equal to $\Gamma_p(\alpha)\Gamma_p(\beta)/\Gamma_p(\alpha+\beta)$ where, for example,

$$\Gamma_p(\alpha)=\pi^{\frac{p(p-1)}{4}}\,\Gamma(\alpha)\Gamma(\alpha-1/2)\cdots\Gamma(\alpha-(p-1)/2),\quad \Re(\alpha)>\frac{p-1}{2}.$$

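A convenient technique for evaluating a real matrix-variate gamma integral consists of making the transformation $X=TT'$ where $T$ is a lower triangular matrix whose diagonal elements are positive. However, on applying this transformation, the type-1 beta integral does not simplify due to the presence of the factor $|I-X|^{\beta-\frac{p+1}{2}}$. Hence, we will attempt to evaluate this integral by appropriately partitioning the matrices and then, successively integrating out the variables. Letting $X=(x_{ij})$ be a $p\times p$ real matrix, $x_{pp}$ can then be extracted from the determinants $|X|$ and $|I-X|$ after partitioning the matrices. Thus, let

$$X=\begin{bmatrix}X_{11} & X_{12}\\ X_{21} & X_{22}\end{bmatrix}$$

where $X_{11}$ is the $(p-1)\times(p-1)$ leading sub-matrix, $X_{21}$ is $1\times(p-1)$, $X_{22}=x_{pp}$ and $X_{12}=X_{21}'$. Then $|X|=|X_{11}|\,[x_{pp}-X_{21}X_{11}^{-1}X_{12}]$ so that

$$|X|^{\alpha-\frac{p+1}{2}}=|X_{11}|^{\alpha-\frac{p+1}{2}}\,[x_{pp}-X_{21}X_{11}^{-1}X_{12}]^{\alpha-\frac{p+1}{2}}, \tag{i}$$

and

$$|I-X|^{\beta-\frac{p+1}{2}}=|I-X_{11}|^{\beta-\frac{p+1}{2}}\,[(1-x_{pp})-X_{21}(I-X_{11})^{-1}X_{12}]^{\beta-\frac{p+1}{2}}. \tag{ii}$$

It follows from (i) that $x_{pp}>X_{21}X_{11}^{-1}X_{12}$ and, from (ii) that $x_{pp}<1-X_{21}(I-X_{11})^{-1}X_{12}$; thus, we have $X_{21}X_{11}^{-1}X_{12}<x_{pp}<1-X_{21}(I-X_{11})^{-1}X_{12}$. Let $y=x_{pp}-X_{21}X_{11}^{-1}X_{12}\Rightarrow dy=dx_{pp}$ for fixed $X_{21},X_{11}$, so that $0<y<b$ where

$$\begin{aligned}b&=1-X_{21}X_{11}^{-1}X_{12}-X_{21}(I-X_{11})^{-1}X_{12}\\ &=1-X_{21}X_{11}^{-\frac{1}{2}}(I-X_{11})^{-\frac{1}{2}}(I-X_{11})^{-\frac{1}{2}}X_{11}^{-\frac{1}{2}}X_{12}\\ &=1-WW',\quad W=X_{21}X_{11}^{-\frac{1}{2}}(I-X_{11})^{-\frac{1}{2}}.\end{aligned}$$

The second factor on the right-hand side of (ii) then becomes

$$[b-y]^{\beta-\frac{p+1}{2}}=b^{\beta-\frac{p+1}{2}}\,[1-y/b]^{\beta-\frac{p+1}{2}}.$$

Now letting $u=\frac{y}{b}$ for fixed $b$, the terms containing $u$ and $b$ become $b^{\alpha+\beta-(p+1)+1}\,u^{\alpha-\frac{p+1}{2}}(1-u)^{\beta-\frac{p+1}{2}}$. Integration over $u$ then gives

$$\int_0^1u^{\alpha-\frac{p+1}{2}}(1-u)^{\beta-\frac{p+1}{2}}\,du=\frac{\Gamma(\alpha-\frac{p-1}{2})\,\Gamma(\beta-\frac{p-1}{2})}{\Gamma(\alpha+\beta-(p-1))}$$

for $\Re(\alpha)>\frac{p-1}{2}$, $\Re(\beta)>\frac{p-1}{2}$. Letting $W=X_{21}X_{11}^{-\frac{1}{2}}(I-X_{11})^{-\frac{1}{2}}$ for fixed $X_{11}$, $dX_{21}=|X_{11}|^{\frac{1}{2}}\,|I-X_{11}|^{\frac{1}{2}}\,dW$ from Theorem 1.6.1 of Chap. 1 or Theorem 1.18 of Mathai (1997), where $X_{11}$ is a $(p-1)\times(p-1)$ matrix. Now, letting $v=WW'$ and integrating out over the Stiefel manifold by applying Theorem 4.2.3 of Chap. 4 or Theorem 2.16 and Remark 2.13 of Mathai (1997), we have

$$dW=\frac{\pi^{\frac{p-1}{2}}}{\Gamma(\frac{p-1}{2})}\,v^{\frac{p-1}{2}-1}\,dv.$$

Thus, the integral over $b$ becomes

$$\int b^{\alpha+\beta-p}\,dX_{21}=|X_{11}|^{\frac{1}{2}}\,|I-X_{11}|^{\frac{1}{2}}\,\frac{\pi^{\frac{p-1}{2}}}{\Gamma(\frac{p-1}{2})}\int_0^1v^{\frac{p-1}{2}-1}(1-v)^{\alpha+\beta-p}\,dv,$$

where

$$\int_0^1v^{\frac{p-1}{2}-1}(1-v)^{\alpha+\beta-p}\,dv=\frac{\Gamma(\frac{p-1}{2})\,\Gamma(\alpha+\beta-(p-1))}{\Gamma(\alpha+\beta-\frac{p-1}{2})}.$$

Then, on multiplying all the factors together, we have

$$|X_{11}^{(1)}|^{\alpha+\frac{1}{2}-\frac{p+1}{2}}\,|I-X_{11}^{(1)}|^{\beta+\frac{1}{2}-\frac{p+1}{2}}\,\pi^{\frac{p-1}{2}}\,\frac{\Gamma(\alpha-\frac{p-1}{2})\,\Gamma(\beta-\frac{p-1}{2})}{\Gamma(\alpha+\beta-\frac{p-1}{2})}$$

whenever $\Re(\alpha)>\frac{p-1}{2}$, $\Re(\beta)>\frac{p-1}{2}$. In this case, $X_{11}^{(1)}$ represents the $(p-1)\times(p-1)$ leading sub-matrix at the end of the first set of operations. At the end of the second set of operations, we will denote the $(p-2)\times(p-2)$ leading sub-matrix by $X_{11}^{(2)}$, and so on.

The second step of the operations begins by extracting $x_{p-1,p-1}$ and writing

$$|X_{11}^{(1)}|=|X_{11}^{(2)}|\,\big[x_{p-1,p-1}-X_{21}^{(2)}[X_{11}^{(2)}]^{-1}X_{12}^{(2)}\big]$$

where $X_{21}^{(2)}$ is a $1\times(p-2)$ vector. We then proceed as in the first sequence of steps to obtain the final factors in the following form:

$$|X_{11}^{(2)}|^{\alpha+1-\frac{p+1}{2}}\,|I-X_{11}^{(2)}|^{\beta+1-\frac{p+1}{2}}\,\pi^{\frac{p-2}{2}}\,\frac{\Gamma(\alpha-\frac{p-2}{2})\,\Gamma(\beta-\frac{p-2}{2})}{\Gamma(\alpha+\beta-\frac{p-2}{2})}$$

for $\Re(\alpha)>\frac{p-2}{2}$, $\Re(\beta)>\frac{p-2}{2}$. Proceeding in such a manner, in the end, the exponent of $\pi$ will be

$$\frac{p-1}{2}+\frac{p-2}{2}+\cdots+\frac{1}{2}=\frac{p(p-1)}{4}$$

and the gamma product will be

$$\frac{\Gamma(\alpha-\frac{p-1}{2})\,\Gamma(\alpha-\frac{p-2}{2})\cdots\Gamma(\alpha)\ \Gamma(\beta-\frac{p-1}{2})\cdots\Gamma(\beta)}{\Gamma(\alpha+\beta-\frac{p-1}{2})\cdots\Gamma(\alpha+\beta)}.$$

These gamma products, along with $\pi^{\frac{p(p-1)}{4}}$, can be written as $\frac{\Gamma_p(\alpha)\Gamma_p(\beta)}{\Gamma_p(\alpha+\beta)}=B_p(\alpha,\beta)$; hence the result. It is thus possible to obtain the beta function in the real matrix-variate case by direct evaluation of a type-1 real matrix-variate beta integral.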

A similar approach can yield the real matrix-variate beta function from a type-2 real matrix-variate beta integral of the form

$$\int_{X>O}|X|^{\alpha-\frac{p+1}{2}}\,|I+X|^{-(\alpha+\beta)}\,dX$$

where $X$ is a $p\times p$ positive definite symmetric matrix and it is assumed that $\Re(\alpha)>\frac{p-1}{2}$ and $\Re(\beta)>\frac{p-1}{2}$, the evaluation procedure being parallel.


Example 5.3.1. By direct evaluation as a multiple integral, show that

$$\int_{O<X<I}|X|^{\alpha-\frac{p+1}{2}}\,|I-X|^{\beta-\frac{p+1}{2}}\,dX=\frac{\Gamma_p(\alpha)\Gamma_p(\beta)}{\Gamma_p(\alpha+\beta)}$$

for $p=2$.


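Solution 5.3.1. The integral to be evaluated will be denoted by $\delta$. Let

$$\begin{aligned}|X|&=x_{11}\,[x_{22}-x_{21}x_{11}^{-1}x_{12}]=x_{11}\Big[x_{22}-\frac{x_{12}^2}{x_{11}}\Big],\\ |I-X|&=[1-x_{11}]\,[(1-x_{22})-x_{12}(1-x_{11})^{-1}x_{12}]=(1-x_{11})\Big[1-x_{22}-\frac{x_{12}^2}{1-x_{11}}\Big].\end{aligned} \tag{i}$$

It is seen from (i) that

$$\frac{x_{12}^2}{x_{11}}\le x_{22}\le 1-\frac{x_{12}^2}{1-x_{11}}.$$

Letting $y=x_{22}-\frac{x_{12}^2}{x_{11}}$ so that $0\le y\le b$, and $b=1-\frac{x_{12}^2}{x_{11}}-\frac{x_{12}^2}{1-x_{11}}=1-\frac{x_{12}^2}{x_{11}(1-x_{11})}$, we have

$$|X|^{\alpha-\frac{3}{2}}\,|I-X|^{\beta-\frac{3}{2}}\,dX=x_{11}^{\alpha-\frac{3}{2}}(1-x_{11})^{\beta-\frac{3}{2}}\,y^{\alpha-\frac{3}{2}}\,(b-y)^{\beta-\frac{3}{2}}\,dx_{11}\wedge dx_{12}\wedge dy.$$

Now, integrating out $y$, we have

$$\begin{aligned}\int_{y=0}^{b}y^{\alpha-\frac{3}{2}}(b-y)^{\beta-\frac{3}{2}}\,dy&=b^{\alpha+\beta-3+1}\int_0^1v^{\alpha-\frac{3}{2}}(1-v)^{\beta-\frac{3}{2}}\,dv,\quad v=\frac{y}{b},\\ &=b^{\alpha+\beta-2}\,\frac{\Gamma(\alpha-\frac{1}{2})\,\Gamma(\beta-\frac{1}{2})}{\Gamma(\alpha+\beta-1)}\end{aligned} \tag{ii}$$

whenever $\Re(\alpha)>\frac{1}{2}$ and $\Re(\beta)>\frac{1}{2}$, $b$ being as previously defined. Letting $w=\frac{x_{12}}{[x_{11}(1-x_{11})]^{1/2}}$, $dx_{12}=[x_{11}(1-x_{11})]^{\frac{1}{2}}\,dw$ for fixed $x_{11}$. The exponents of $x_{11}$ and $(1-x_{11})$ then become $\alpha-\frac{3}{2}+\frac{1}{2}$ and $\beta-\frac{3}{2}+\frac{1}{2}$, and the integral over $w$ gives the following:

$$\int_{-1}^{1}(1-w^2)^{\alpha+\beta-2}\,dw=2\int_0^1(1-w^2)^{\alpha+\beta-2}\,dw=\int_0^1z^{\frac{1}{2}-1}(1-z)^{\alpha+\beta-2}\,dz=\frac{\Gamma(\frac{1}{2})\,\Gamma(\alpha+\beta-1)}{\Gamma(\alpha+\beta-\frac{1}{2})}. \tag{iii}$$

Now, integrating out $x_{11}$, we obtain

$$\int_0^1x_{11}^{\alpha-1}(1-x_{11})^{\beta-1}\,dx_{11}=\frac{\Gamma(\alpha)\Gamma(\beta)}{\Gamma(\alpha+\beta)}. \tag{iv}$$

Then, on collecting the factors from (i) to (iv), we have

$$\delta=\Gamma(1/2)\,\frac{\Gamma(\alpha)\,\Gamma(\alpha-\frac{1}{2})\,\Gamma(\beta)\,\Gamma(\beta-\frac{1}{2})}{\Gamma(\alpha+\beta)\,\Gamma(\alpha+\beta-\frac{1}{2})}.$$

Finally, noting that for $p=2$, $\pi^{\frac{p(p-1)}{4}}=\pi^{\frac{1}{2}}=\frac{\pi^{\frac{1}{2}}\pi^{\frac{1}{2}}}{\pi^{\frac{1}{2}}}$, the desired result is obtained, that is,

$$\delta=\frac{\Gamma_2(\alpha)\Gamma_2(\beta)}{\Gamma_2(\alpha+\beta)}=B_2(\alpha,\beta).$$

This completes the computations.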
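The bookkeeping in Solution 5.3.1 can be double-checked by multiplying the three scalar factors (ii), (iii) and (iv) and comparing the product with $\Gamma_2(\alpha)\Gamma_2(\beta)/\Gamma_2(\alpha+\beta)$. The sketch below does this for real parameter values; the function names are hypothetical and the power of $b$ in (ii) is omitted because it is absorbed by the integral (iii).

```python
import math


def gamma2_real(a):
    """Real matrix-variate gamma for p = 2: sqrt(pi) * Gamma(a) * Gamma(a - 1/2)."""
    return math.sqrt(math.pi) * math.gamma(a) * math.gamma(a - 0.5)


def stepwise_product(alpha, beta):
    """Product of the factors produced by integrating out y, w and x11."""
    factor_y = (math.gamma(alpha - 0.5) * math.gamma(beta - 0.5)
                / math.gamma(alpha + beta - 1))                      # step (ii)
    factor_w = (math.gamma(0.5) * math.gamma(alpha + beta - 1)
                / math.gamma(alpha + beta - 0.5))                    # step (iii)
    factor_x11 = math.gamma(alpha) * math.gamma(beta) / math.gamma(alpha + beta)  # step (iv)
    return factor_y * factor_w * factor_x11


def beta2_real(alpha, beta):
    """B_2(alpha, beta) from its definition in terms of Gamma_2."""
    return gamma2_real(alpha) * gamma2_real(beta) / gamma2_real(alpha + beta)
```

Since $\Gamma(1/2)=\sqrt{\pi}$, the two functions agree for any admissible pair, for example $\alpha=2$, $\beta=3.5$.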
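5.3a.1. Evaluation of matrix-variate type-1 beta integrals, complex case


The integral representation for $\tilde{B}_p(\alpha,\beta)$ in the complex case is

$$\int_{O<\tilde{X}<I}|\det(\tilde{X})|^{\alpha-p}\,|\det(I-\tilde{X})|^{\beta-p}\,d\tilde{X}=\tilde{B}_p(\alpha,\beta)$$

whenever $\Re(\alpha)>p-1$, $\Re(\beta)>p-1$ where $\det(\cdot)$ denotes the determinant of $(\cdot)$ and $|\det(\cdot)|$, the absolute value (or modulus) of the determinant of $(\cdot)$. In this case, $\tilde{X}=(\tilde{x}_{ij})$ is a $p\times p$ Hermitian positive definite matrix and accordingly, all of its diagonal elements are real and positive. As in the real case, let us extract $x_{pp}$ by partitioning $\tilde{X}$ as follows:

$$\tilde{X}=\begin{bmatrix}\tilde{X}_{11} & \tilde{X}_{12}\\ \tilde{X}_{21} & \tilde{X}_{22}\end{bmatrix}\ \text{ so that }\ I-\tilde{X}=\begin{bmatrix}I-\tilde{X}_{11} & -\tilde{X}_{12}\\ -\tilde{X}_{21} & I-\tilde{X}_{22}\end{bmatrix}$$

where $\tilde{X}_{22}\equiv x_{pp}$ is a real scalar and $\tilde{X}_{12}=\tilde{X}_{21}^{*}$. Then, the absolute value of the determinants have the following representations:

$$|\det(\tilde{X})|^{\alpha-p}=|\det(\tilde{X}_{11})|^{\alpha-p}\,|x_{pp}-\tilde{X}_{21}\tilde{X}_{11}^{-1}\tilde{X}_{21}^{*}|^{\alpha-p} \tag{i}$$

where $*$ indicates conjugate transpose, and

$$|\det(I-\tilde{X})|^{\beta-p}=|\det(I-\tilde{X}_{11})|^{\beta-p}\,|(1-x_{pp})-\tilde{X}_{21}(I-\tilde{X}_{11})^{-1}\tilde{X}_{21}^{*}|^{\beta-p}. \tag{ii}$$

Note that whenever $\tilde{X}$ and $I-\tilde{X}$ are Hermitian positive definite, $\tilde{X}_{11}^{-1}$ and $(I-\tilde{X}_{11})^{-1}$ are too Hermitian positive definite. Further, the Hermitian forms $\tilde{X}_{21}\tilde{X}_{11}^{-1}\tilde{X}_{21}^{*}$ and $\tilde{X}_{21}(I-\tilde{X}_{11})^{-1}\tilde{X}_{21}^{*}$ remain real and positive. It follows from (i) and (ii) that

$$\tilde{X}_{21}\tilde{X}_{11}^{-1}\tilde{X}_{21}^{*}<x_{pp}<1-\tilde{X}_{21}(I-\tilde{X}_{11})^{-1}\tilde{X}_{21}^{*}.$$

Since the traces of Hermitian forms are real, the lower and upper bounds of $x_{pp}$ are real as well. Let

$$\tilde{W}=\tilde{X}_{21}\tilde{X}_{11}^{-\frac{1}{2}}(I-\tilde{X}_{11})^{-\frac{1}{2}}$$

for fixed $\tilde{X}_{11}$. Then

$$d\tilde{X}_{21}=|\det(\tilde{X}_{11})|\,|\det(I-\tilde{X}_{11})|\,d\tilde{W}$$

and $|\det(\tilde{X}_{11})|^{\alpha-p}$, $|\det(I-\tilde{X}_{11})|^{\beta-p}$ will become $|\det(\tilde{X}_{11})|^{\alpha+1-p}$, $|\det(I-\tilde{X}_{11})|^{\beta+1-p}$, respectively. Then, letting $y=x_{pp}-\tilde{X}_{21}\tilde{X}_{11}^{-1}\tilde{X}_{21}^{*}$ and $b=1-\tilde{X}_{21}\tilde{X}_{11}^{-1}\tilde{X}_{21}^{*}-\tilde{X}_{21}(I-\tilde{X}_{11})^{-1}\tilde{X}_{21}^{*}=1-\tilde{W}\tilde{W}^{*}$, we can write

$$|(1-x_{pp})-\tilde{X}_{21}(I-\tilde{X}_{11})^{-1}\tilde{X}_{21}^{*}|^{\beta-p}=(b-y)^{\beta-p}=b^{\beta-p}\,[1-y/b]^{\beta-p}.$$

Now, letting $u=y/b$, the factors containing $u$ and $b$ will be of the form $u^{\alpha-p}(1-u)^{\beta-p}\,b^{\alpha+\beta-2p+1}$; the integral over $u$ then gives

$$\int_0^1u^{\alpha-p}(1-u)^{\beta-p}\,du=\frac{\Gamma(\alpha-(p-1))\,\Gamma(\beta-(p-1))}{\Gamma(\alpha+\beta-2(p-1))}$$

for $\Re(\alpha)>p-1$, $\Re(\beta)>p-1$. Letting $v=\tilde{W}\tilde{W}^{*}$ and integrating out over the Stiefel manifold by making use of Theorem 4.2a.3 of Chap. 4 or Corollaries 4.5.2 and 4.5.3 of Mathai (1997), we have

$$d\tilde{W}=\frac{\pi^{p-1}}{\Gamma(p-1)}\,v^{(p-1)-1}\,dv.$$

The integral over $b$ gives

$$\int b^{\alpha+\beta-2p+1}\,d\tilde{X}_{21}=|\det(\tilde{X}_{11})|\,|\det(I-\tilde{X}_{11})|\,\frac{\pi^{p-1}}{\Gamma(p-1)}\int_0^1v^{(p-1)-1}(1-v)^{\alpha+\beta-2p+1}\,dv,$$

where

$$\int_0^1v^{(p-1)-1}(1-v)^{\alpha+\beta-2p+1}\,dv=\frac{\Gamma(p-1)\,\Gamma(\alpha+\beta-2(p-1))}{\Gamma(\alpha+\beta-(p-1))}$$

for $\Re(\alpha)>p-1$, $\Re(\beta)>p-1$. Now, taking the product of all the factors yields

$$|\det(\tilde{X}_{11}^{(1)})|^{\alpha+1-p}\,|\det(I-\tilde{X}_{11}^{(1)})|^{\beta+1-p}\,\pi^{p-1}\,\frac{\Gamma(\alpha-p+1)\,\Gamma(\beta-p+1)}{\Gamma(\alpha+\beta-p+1)}$$

for $\Re(\alpha)>p-1$, $\Re(\beta)>p-1$. On extracting $x_{p-1,p-1}$ from $|\tilde{X}_{11}^{(1)}|$ and $|I-\tilde{X}_{11}^{(1)}|$ and continuing this process, in the end, the exponent of $\pi$ will be $(p-1)+(p-2)+\cdots+1=\frac{p(p-1)}{2}$ and the gamma product will be

$$\frac{\Gamma(\alpha-(p-1))\,\Gamma(\alpha-(p-2))\cdots\Gamma(\alpha)\ \Gamma(\beta-(p-1))\cdots\Gamma(\beta)}{\Gamma(\alpha+\beta-(p-1))\cdots\Gamma(\alpha+\beta)}.$$

These factors, along with $\pi^{\frac{p(p-1)}{2}}$ give

$$\frac{\tilde{\Gamma}_p(\alpha)\tilde{\Gamma}_p(\beta)}{\tilde{\Gamma}_p(\alpha+\beta)}=\tilde{B}_p(\alpha,\beta),\quad \Re(\alpha)>p-1,\ \Re(\beta)>p-1.$$

The procedure for evaluating a type-2 matrix-variate beta integral by partitioning matrices is parallel and hence will not be detailed here.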


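Example 5.3a.1. For $p=2$, evaluate the integral

$$\int_{O<\tilde{X}<I}|\det(\tilde{X})|^{\alpha-p}\,|\det(I-\tilde{X})|^{\beta-p}\,d\tilde{X}$$

as a multiple integral and show that it evaluates out to $\tilde{B}_2(\alpha,\beta)$, the beta function in the complex domain.

Solution 5.3a.1. For $p=2$, $\pi^{\frac{p(p-1)}{2}}=\pi^{\frac{2(1)}{2}}=\pi$, and

$$\tilde{B}_2(\alpha,\beta)=\pi\,\frac{\Gamma(\alpha)\Gamma(\alpha-1)\Gamma(\beta)\Gamma(\beta-1)}{\Gamma(\alpha+\beta)\Gamma(\alpha+\beta-1)}$$

whenever $\Re(\alpha)>1$ and $\Re(\beta)>1$. For $p=2$, our matrix and the relevant determinants are

$$\tilde{X}=\begin{bmatrix}\tilde{x}_{11} & \tilde{x}_{12}\\ \tilde{x}_{12}^{*} & \tilde{x}_{22}\end{bmatrix},\quad \det(\tilde{X})\ \text{ and }\ \det(I-\tilde{X}),$$

where $\tilde{x}_{12}^{*}$ is only the conjugate of $\tilde{x}_{12}$ as it is a scalar quantity. By expanding the determinants of the partitioned matrices as explained in Sect. 1.3, we have the following:

$$\det(\tilde{X})=\tilde{x}_{11}\,[\tilde{x}_{22}-\tilde{x}_{12}^{*}\tilde{x}_{11}^{-1}\tilde{x}_{12}]=x_{11}\Big[x_{22}-\frac{\tilde{x}_{12}\tilde{x}_{12}^{*}}{x_{11}}\Big], \tag{i}$$

$$\det(I-\tilde{X})=(1-x_{11})\Big[1-x_{22}-\frac{\tilde{x}_{12}\tilde{x}_{12}^{*}}{1-x_{11}}\Big]. \tag{ii}$$

From (i) and (ii), it is seen that

$$\frac{\tilde{x}_{12}\tilde{x}_{12}^{*}}{x_{11}}\le x_{22}\le 1-\frac{\tilde{x}_{12}\tilde{x}_{12}^{*}}{1-x_{11}}.$$

Note that when $\tilde{X}$ is Hermitian, $\tilde{x}_{11}$ and $\tilde{x}_{22}$ are real and hence we may not place a tilde on these variables. Let $y=x_{22}-\tilde{x}_{12}\tilde{x}_{12}^{*}/x_{11}$. Note that $y$ is also real since $\tilde{x}_{12}\tilde{x}_{12}^{*}$ is real. As well, $0\le y\le b$, where

$$b=1-\frac{\tilde{x}_{12}\tilde{x}_{12}^{*}}{x_{11}}-\frac{\tilde{x}_{12}\tilde{x}_{12}^{*}}{1-x_{11}}=1-\frac{\tilde{x}_{12}\tilde{x}_{12}^{*}}{x_{11}(1-x_{11})}.$$

Further, $b$ is a real scalar of the form $b=1-\tilde{w}\tilde{w}^{*}$ where $\tilde{w}=\frac{\tilde{x}_{12}}{[x_{11}(1-x_{11})]^{1/2}}\Rightarrow d\tilde{x}_{12}=x_{11}(1-x_{11})\,d\tilde{w}$. This will make the exponents of $x_{11}$ and $(1-x_{11})$ as $\alpha-p+1=\alpha-1$ and $\beta-1$, respectively. Now, on integrating out $y$, we have

$$\int_{y=0}^{b}y^{\alpha-2}(b-y)^{\beta-2}\,dy=b^{\alpha+\beta-3}\,\frac{\Gamma(\alpha-1)\Gamma(\beta-1)}{\Gamma(\alpha+\beta-2)},\quad \Re(\alpha)>1,\ \Re(\beta)>1. \tag{iii}$$

Integrating out $\tilde{w}$, we have the following:

$$\int_{\tilde{w}}(1-\tilde{w}\tilde{w}^{*})^{\alpha+\beta-3}\,d\tilde{w}=\pi\,\frac{\Gamma(\alpha+\beta-2)}{\Gamma(\alpha+\beta-1)}. \tag{iv}$$

This integral is evaluated by writing $z=\tilde{w}\tilde{w}^{*}$. Then, it follows from Theorem 4.2a.3 that

$$d\tilde{w}=\frac{\pi^{p-1}}{\Gamma(p-1)}\,z^{(p-1)-1}\,dz=\pi\,dz\ \text{ for } p=2.$$

Now, collecting all relevant factors from (i) to (iv), together with the integral over $x_{11}$, namely $\int_0^1x_{11}^{\alpha-1}(1-x_{11})^{\beta-1}\,dx_{11}=\Gamma(\alpha)\Gamma(\beta)/\Gamma(\alpha+\beta)$, the required representation of the initial integral, denoted by $\delta$, is obtained:

$$\delta=\pi\,\frac{\Gamma(\alpha)\Gamma(\alpha-1)\Gamma(\beta)\Gamma(\beta-1)}{\Gamma(\alpha+\beta)\Gamma(\alpha+\beta-1)}=\frac{\tilde{\Gamma}_2(\alpha)\tilde{\Gamma}_2(\beta)}{\tilde{\Gamma}_2(\alpha+\beta)}=\tilde{B}_2(\alpha,\beta)$$

whenever $\Re(\alpha)>1$ and $\Re(\beta)>1$. This completes the computations.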
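As a numerical sanity check of Solution 5.3a.1, the integral over the complex scalar $\tilde{w}$ in step (iv) can be approximated in polar coordinates, and the product of all scalar factors can be compared with $\tilde{B}_2(\alpha,\beta)$. The sketch below assumes real parameters, uses a simple midpoint rule, and relies on hypothetical function names; it is meant only to illustrate the computation.

```python
import math


def disc_integral(c, n=20000):
    """Midpoint-rule approximation of the integral of (1 - |w|^2)^c over |w| < 1.

    In polar coordinates the integral is 2*pi * int_0^1 r (1 - r^2)^c dr = pi/(c + 1).
    """
    h = 1.0 / n
    total = 0.0
    for k in range(n):
        r = (k + 0.5) * h
        total += r * (1.0 - r * r) ** c
    return 2.0 * math.pi * total * h


def stepwise_product_complex(alpha, beta):
    """Product of the factors from integrating out y, w and x11 for p = 2."""
    factor_y = math.gamma(alpha - 1) * math.gamma(beta - 1) / math.gamma(alpha + beta - 2)
    factor_w = math.pi * math.gamma(alpha + beta - 2) / math.gamma(alpha + beta - 1)
    factor_x11 = math.gamma(alpha) * math.gamma(beta) / math.gamma(alpha + beta)
    return factor_y * factor_w * factor_x11


def beta2_complex(alpha, beta):
    """Complex matrix-variate beta function for p = 2."""
    def g2(a):
        return math.pi * math.gamma(a) * math.gamma(a - 1)
    return g2(alpha) * g2(beta) / g2(alpha + beta)
```

For $\alpha=3$, $\beta=2$ the exponent in (iv) is $\alpha+\beta-3=2$ and the disc integral equals $\pi\,\Gamma(3)/\Gamma(4)=\pi/3$.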


5.3.3. General partitions, real case 


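In Sect. 5.3.2, we have considered integrating one variable at a time by suitably partitioning the matrices. Would it also be possible to have a general partitioning and integrate a block of variables at a time, rather than integrating out individual variables? We will consider the real matrix-variate gamma integral first. Let the $p\times p$ positive definite matrix $X$ be partitioned as follows:

$$X=\begin{bmatrix}X_{11} & X_{12}\\ X_{21} & X_{22}\end{bmatrix},\quad X_{11}\ \text{being } p_1\times p_1 \text{ and } X_{22},\ p_2\times p_2,$$

so that $X_{12}$ is $p_1\times p_2$ with $X_{21}=X_{12}'$ and $p_1+p_2=p$. Without any loss of generality, let us assume that $p_1\ge p_2$. The determinant can be partitioned as follows:

$$\begin{aligned}|X|^{\alpha-\frac{p+1}{2}}&=|X_{11}|^{\alpha-\frac{p+1}{2}}\,|X_{22}-X_{21}X_{11}^{-1}X_{12}|^{\alpha-\frac{p+1}{2}}\\ &=|X_{11}|^{\alpha-\frac{p+1}{2}}\,|X_{22}|^{\alpha-\frac{p+1}{2}}\,|I-X_{22}^{-\frac{1}{2}}X_{21}X_{11}^{-1}X_{12}X_{22}^{-\frac{1}{2}}|^{\alpha-\frac{p+1}{2}}.\end{aligned}$$

Letting

$$Y=X_{22}^{-\frac{1}{2}}X_{21}X_{11}^{-\frac{1}{2}}\ \Rightarrow\ dY=|X_{22}|^{-\frac{p_1}{2}}\,|X_{11}|^{-\frac{p_2}{2}}\,dX_{21}$$

for fixed $X_{11}$ and $X_{22}$ by making use of Theorem 1.6.4 of Chap. 1 or Theorem 1.18 of Mathai (1997),

$$|X|^{\alpha-\frac{p+1}{2}}\,dX_{21}=|X_{11}|^{\alpha+\frac{p_2}{2}-\frac{p+1}{2}}\,|X_{22}|^{\alpha+\frac{p_1}{2}-\frac{p+1}{2}}\,|I-YY'|^{\alpha-\frac{p+1}{2}}\,dY.$$

Letting $S=YY'$ and integrating out over the Stiefel manifold, we have

$$dY=\frac{\pi^{\frac{p_1p_2}{2}}}{\Gamma_{p_2}(\frac{p_1}{2})}\,|S|^{\frac{p_1}{2}-\frac{p_2+1}{2}}\,dS;$$

refer to Theorem 2.16 and Remark 2.13 of Mathai (1997) or Theorem 4.2.3 of Chap. 4. Now, the integral over $S$ gives

$$\int_{O<S<I}|S|^{\frac{p_1}{2}-\frac{p_2+1}{2}}\,|I-S|^{\alpha-\frac{p_1}{2}-\frac{p_2+1}{2}}\,dS=\frac{\Gamma_{p_2}(\frac{p_1}{2})\,\Gamma_{p_2}(\alpha-\frac{p_1}{2})}{\Gamma_{p_2}(\alpha)}$$

for $\Re(\alpha)>\frac{p-1}{2}$. Collecting all the factors, we have

$$|X_{11}|^{\alpha-\frac{p_1+1}{2}}\,|X_{22}|^{\alpha-\frac{p_2+1}{2}}\,\pi^{\frac{p_1p_2}{2}}\,\frac{\Gamma_{p_2}(\alpha-\frac{p_1}{2})}{\Gamma_{p_2}(\alpha)}.$$

One can observe from this result that the original determinant splits into functions of $X_{11}$ and $X_{22}$. This also shows that if we are considering a real matrix-variate gamma density, then the diagonal blocks $X_{11}$ and $X_{22}$ are statistically independently distributed, where $X_{11}$ will have a $p_1$-variate gamma distribution and $X_{22}$, a $p_2$-variate gamma distribution. Note that $\mathrm{tr}(X)=\mathrm{tr}(X_{11})+\mathrm{tr}(X_{22})$ and hence, the integral over $X_{22}$ gives $\Gamma_{p_2}(\alpha)$ and the integral over $X_{11}$, $\Gamma_{p_1}(\alpha)$. Thus, the total integral is available as

$$\Gamma_{p_1}(\alpha)\,\Gamma_{p_2}(\alpha)\,\pi^{\frac{p_1p_2}{2}}\,\frac{\Gamma_{p_2}(\alpha-\frac{p_1}{2})}{\Gamma_{p_2}(\alpha)}=\Gamma_p(\alpha)$$

since $\pi^{\frac{p_1p_2}{2}}\,\Gamma_{p_1}(\alpha)\,\Gamma_{p_2}(\alpha-\frac{p_1}{2})=\Gamma_p(\alpha)$.

Hence, it is seen that instead of integrating out variables one at a time, we could have also integrated out blocks of variables at a time and verified the result. A similar procedure works for real matrix-variate type-1 and type-2 beta distributions, as well as the matrix-variate gamma and type-1 and type-2 beta distributions in the complex domain.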
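The factorization $\pi^{p_1p_2/2}\,\Gamma_{p_1}(\alpha)\,\Gamma_{p_2}(\alpha-\tfrac{p_1}{2})=\Gamma_p(\alpha)$, $p=p_1+p_2$, on which the block integration rests, can be verified numerically. The short sketch below assumes a real $\alpha>\frac{p-1}{2}$ and uses the product form of $\Gamma_p(\alpha)$; the function names are hypothetical.

```python
import math


def real_matrix_gamma(p, alpha):
    """Gamma_p(alpha) = pi^{p(p-1)/4} * prod_{j=0}^{p-1} Gamma(alpha - j/2)."""
    if alpha <= (p - 1) / 2:
        raise ValueError("requires alpha > (p - 1)/2")
    value = math.pi ** (p * (p - 1) / 4)
    for j in range(p):
        value *= math.gamma(alpha - j / 2)
    return value


def block_factorization(p1, p2, alpha):
    """pi^{p1*p2/2} * Gamma_{p1}(alpha) * Gamma_{p2}(alpha - p1/2)."""
    return (math.pi ** (p1 * p2 / 2)
            * real_matrix_gamma(p1, alpha)
            * real_matrix_gamma(p2, alpha - p1 / 2))
```

The identity holds for any split of $p$, which reflects the observation in the text that either diagonal block may be integrated out first.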


5.3.4. Methods avoiding integration over the Stiefel manifold 


The general method of partitioning matrices previously described involves the integration over the Stiefel manifold as an intermediate step and relies on Theorem 4.2.3. We will consider another procedure whereby integration over the Stiefel manifold is not required. Let us consider the real gamma case first. Again, we begin with the decomposition

$$|X|^{\alpha-\frac{p+1}{2}}=|X_{11}|^{\alpha-\frac{p+1}{2}}\,|X_{22}-X_{21}X_{11}^{-1}X_{12}|^{\alpha-\frac{p+1}{2}}. \tag{5.3.8}$$

Instead of integrating out $X_{21}$ or $X_{12}$, let us integrate out $X_{22}$. Let $X_{11}$ be a $p_1\times p_1$ matrix and $X_{22}$ be a $p_2\times p_2$ matrix, with $p_1+p_2=p$. In the above partitioning, we require that $X_{11}$ be nonsingular. However, when $X$ is positive definite, both $X_{11}$ and $X_{22}$ will be positive definite, and thereby nonsingular. From the second factor in (5.3.8), $X_{22}>X_{21}X_{11}^{-1}X_{12}$ as $X_{22}-X_{21}X_{11}^{-1}X_{12}$ is positive definite. We will attempt to integrate out $X_{22}$ first. Let $U=X_{22}-X_{21}X_{11}^{-1}X_{12}$ so that $dU=dX_{22}$ for fixed $X_{11}$ and $X_{12}$. Since $\mathrm{tr}(X)=\mathrm{tr}(X_{11})+\mathrm{tr}(X_{22})$, we have

$$e^{-\mathrm{tr}(X)}=e^{-\mathrm{tr}(X_{11})-\mathrm{tr}(U)-\mathrm{tr}(X_{21}X_{11}^{-1}X_{12})}.$$

On integrating out $U$, we obtain

$$\int_{U>O}|U|^{\alpha-\frac{p+1}{2}}\,e^{-\mathrm{tr}(U)}\,dU=\Gamma_{p_2}\big(\alpha-\tfrac{p_1}{2}\big),\quad \Re(\alpha)>\frac{p-1}{2},$$

since $\alpha-\frac{p+1}{2}=\alpha-\frac{p_1}{2}-\frac{p_2+1}{2}$. Letting

$$Y=X_{21}X_{11}^{-\frac{1}{2}}\ \Rightarrow\ dY=|X_{11}|^{-\frac{p_2}{2}}\,dX_{21}$$

for fixed $X_{11}$ (Theorem 1.6.1), we have

$$\int_{X_{21}}e^{-\mathrm{tr}(X_{21}X_{11}^{-1}X_{12})}\,dX_{21}=|X_{11}|^{\frac{p_2}{2}}\int_Ye^{-\mathrm{tr}(YY')}\,dY.$$

But $\mathrm{tr}(YY')$ is the sum of the squares of the $p_1p_2$ elements of $Y$ and each integral is of the form $\int_{-\infty}^{\infty}e^{-z^2}\,dz=\sqrt{\pi}$. Hence,

$$\int_Ye^{-\mathrm{tr}(YY')}\,dY=\pi^{\frac{p_1p_2}{2}}.$$

We may now integrate out $X_{11}$:

$$\int_{X_{11}>O}|X_{11}|^{\alpha+\frac{p_2}{2}-\frac{p+1}{2}}\,e^{-\mathrm{tr}(X_{11})}\,dX_{11}=\int_{X_{11}>O}|X_{11}|^{\alpha-\frac{p_1+1}{2}}\,e^{-\mathrm{tr}(X_{11})}\,dX_{11}=\Gamma_{p_1}(\alpha).$$

Thus, we have the following factors:

$$\pi^{\frac{p_1p_2}{2}}\,\Gamma_{p_2}(\alpha-p_1/2)\,\Gamma_{p_1}(\alpha)=\Gamma_p(\alpha)$$

since

$$\frac{p_1(p_1-1)}{4}+\frac{p_2(p_2-1)}{4}+\frac{p_1p_2}{2}=\frac{p(p-1)}{4},\quad p=p_1+p_2,$$

and, apart from the powers of $\pi$,

$$\Gamma_{p_1}(\alpha)\,\Gamma_{p_2}(\alpha-p_1/2)=\Gamma(\alpha)\Gamma(\alpha-1/2)\cdots\Gamma(\alpha-(p_1-1)/2)\,\Gamma(\alpha-p_1/2)\cdots\Gamma(\alpha-(p_1+p_2-1)/2).$$

Hence the result. This procedure avoids integration over the Stiefel manifold and does not require that $p_1\ge p_2$. We could have integrated out $X_{11}$ first, if needed. In that case, we would have used the following expansion:

$$|X|^{\alpha-\frac{p+1}{2}}=|X_{22}|^{\alpha-\frac{p+1}{2}}\,|X_{11}-X_{12}X_{22}^{-1}X_{21}|^{\alpha-\frac{p+1}{2}}.$$

We would have then proceeded as before by integrating out $X_{11}$ first and would have ended up with

$$\pi^{\frac{p_1p_2}{2}}\,\Gamma_{p_1}(\alpha-p_2/2)\,\Gamma_{p_2}(\alpha)=\Gamma_p(\alpha),\quad p=p_1+p_2.$$

Note 5.3.1: If we are considering a real matrix-variate gamma density, such as the Wishart density, then from the above procedure, observe that after integrating out $X_{22}$, the only factor containing $X_{21}$ is the exponential function, which has the structure of a matrix-variate Gaussian density. Hence, for a given $X_{11}$, $X_{21}$ is matrix-variate Gaussian distributed. Similarly, for a given $X_{22}$, $X_{12}$ is matrix-variate Gaussian distributed. Further, the diagonal blocks $X_{11}$ and $X_{22}$ are independently distributed.

The same procedure also applies for the evaluation of the gamma integrals in the com- 
plex domain. Since the steps are parallel, they will not be detailed here. 


5.3.5. Arbitrary moments of the determinants, real gamma and beta matrices 


Let the $p\times p$ real positive definite matrix $X$ have a real matrix-variate gamma density with the parameters $(\alpha,\ B>O)$. Then for an arbitrary $h$, we can evaluate the $h$-th moment of the determinant of $X$ with the help of the matrix-variate gamma integral, namely,

$$\int_{X>O}|X|^{\alpha-\frac{p+1}{2}}\,e^{-\mathrm{tr}(BX)}\,dX=|B|^{-\alpha}\,\Gamma_p(\alpha). \tag{i}$$

By making use of (i), we can evaluate the $h$-th moment in a real matrix-variate gamma density with the parameters $(\alpha,\ B>O)$ by considering the associated normalizing constant. Let $u_1=|X|$. Then, the moments of $u_1$ can be obtained by integrating out over the density of $X$:

$$\begin{aligned}E[u_1]^h&=\frac{|B|^{\alpha}}{\Gamma_p(\alpha)}\int_{X>O}u_1^h\,|X|^{\alpha-\frac{p+1}{2}}\,e^{-\mathrm{tr}(BX)}\,dX\\ &=\frac{|B|^{\alpha}}{\Gamma_p(\alpha)}\int_{X>O}|X|^{\alpha+h-\frac{p+1}{2}}\,e^{-\mathrm{tr}(BX)}\,dX\\ &=\frac{|B|^{\alpha}}{\Gamma_p(\alpha)}\,\Gamma_p(\alpha+h)\,|B|^{-(\alpha+h)},\quad \Re(\alpha+h)>\frac{p-1}{2}.\end{aligned}$$

Thus,

$$E[u_1]^h=|B|^{-h}\,\frac{\Gamma_p(\alpha+h)}{\Gamma_p(\alpha)},\quad \Re(\alpha+h)>\frac{p-1}{2}.$$

This is evaluated by observing that when $E[u_1]^h$ is taken, $\alpha$ is replaced by $\alpha+h$ in the integrand and hence, the answer is obtained from equation (i). The same procedure enables one to evaluate the $h$-th moment of the determinants of type-1 beta and type-2 beta matrices. Let $Y$ be a $p\times p$ real positive definite matrix having a real matrix-variate type-1 beta density with the parameters $(\alpha,\beta)$ and $u_2=|Y|$. Then, the $h$-th moment of $u_2$ is obtained as follows:

$$\begin{aligned}E[u_2]^h&=\frac{\Gamma_p(\alpha+\beta)}{\Gamma_p(\alpha)\Gamma_p(\beta)}\int_{O<Y<I}u_2^h\,|Y|^{\alpha-\frac{p+1}{2}}\,|I-Y|^{\beta-\frac{p+1}{2}}\,dY\\ &=\frac{\Gamma_p(\alpha+\beta)}{\Gamma_p(\alpha)\Gamma_p(\beta)}\int_{O<Y<I}|Y|^{\alpha+h-\frac{p+1}{2}}\,|I-Y|^{\beta-\frac{p+1}{2}}\,dY\\ &=\frac{\Gamma_p(\alpha+\beta)}{\Gamma_p(\alpha)\Gamma_p(\beta)}\,\frac{\Gamma_p(\alpha+h)\Gamma_p(\beta)}{\Gamma_p(\alpha+\beta+h)},\quad \Re(\alpha+h)>\frac{p-1}{2},\\ &=\frac{\Gamma_p(\alpha+h)}{\Gamma_p(\alpha)}\,\frac{\Gamma_p(\alpha+\beta)}{\Gamma_p(\alpha+\beta+h)},\quad \Re(\alpha+h)>\frac{p-1}{2}.\end{aligned}$$

In a similar manner, let $u_3=|Z|$ where $Z$ has a $p\times p$ real matrix-variate type-2 beta density with the parameters $(\alpha,\beta)$. In this case, take $\alpha+\beta=(\alpha+h)+(\beta-h)$, replacing $\alpha$ by $\alpha+h$ and $\beta$ by $\beta-h$. Then, considering the normalizing constant of a real matrix-variate type-2 beta density, we obtain the $h$-th moment of $u_3$ as follows:

$$E[u_3]^h=\frac{\Gamma_p(\alpha+h)}{\Gamma_p(\alpha)}\,\frac{\Gamma_p(\beta-h)}{\Gamma_p(\beta)},\quad \Re(\alpha+h)>\frac{p-1}{2},\ \Re(\beta-h)>\frac{p-1}{2}.$$

Relatively few moments will exist in this case, as $\Re(\alpha+h)>\frac{p-1}{2}$ implies that $\Re(h)>-\Re(\alpha)+\frac{p-1}{2}$ and $\Re(\beta-h)>\frac{p-1}{2}$ means that $\Re(h)<\Re(\beta)-\frac{p-1}{2}$. Accordingly, only moments in the range $-\Re(\alpha)+\frac{p-1}{2}<\Re(h)<\Re(\beta)-\frac{p-1}{2}$ will exist. We can summarize the above results as follows: When $X$ is distributed as a real $p\times p$ matrix-variate gamma with the parameters $(\alpha,\ B>O)$,

$$E|X|^h=\frac{|B|^{-h}\,\Gamma_p(\alpha+h)}{\Gamma_p(\alpha)},\quad \Re(\alpha+h)>\frac{p-1}{2}. \tag{5.3.9}$$

When $Y$ has a $p\times p$ real matrix-variate type-1 beta density with the parameters $(\alpha,\beta)$ and if $u_2=|Y|$ then

$$E[u_2]^h=\frac{\Gamma_p(\alpha+h)}{\Gamma_p(\alpha)}\,\frac{\Gamma_p(\alpha+\beta)}{\Gamma_p(\alpha+\beta+h)},\quad \Re(\alpha+h)>\frac{p-1}{2}. \tag{5.3.10}$$

When the $p\times p$ real positive definite matrix $Z$ has a real matrix-variate type-2 beta density with the parameters $(\alpha,\beta)$, then letting $u_3=|Z|$,

$$E[u_3]^h=\frac{\Gamma_p(\alpha+h)}{\Gamma_p(\alpha)}\,\frac{\Gamma_p(\beta-h)}{\Gamma_p(\beta)},\quad -\Re(\alpha)+\frac{p-1}{2}<\Re(h)<\Re(\beta)-\frac{p-1}{2}. \tag{5.3.11}$$
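The three moment formulas (5.3.9)-(5.3.11) translate directly into code. The following sketch works on the log scale with `math.lgamma` to avoid overflow; it assumes real $\alpha$, $\beta$ and $h$, takes the determinant of $B$ as an input, and uses hypothetical function names.

```python
import math


def log_real_matrix_gamma(p, a):
    """log Gamma_p(a) for real a > (p - 1)/2."""
    if a <= (p - 1) / 2:
        raise ValueError("argument must exceed (p - 1)/2")
    return (p * (p - 1) / 4) * math.log(math.pi) + sum(
        math.lgamma(a - j / 2) for j in range(p))


def gamma_det_moment(p, alpha, det_b, h):
    """E|X|^h for the real matrix-variate gamma case, as in (5.3.9)."""
    return math.exp(-h * math.log(det_b)
                    + log_real_matrix_gamma(p, alpha + h)
                    - log_real_matrix_gamma(p, alpha))


def beta1_det_moment(p, alpha, beta, h):
    """E|Y|^h for the real type-1 beta case, as in (5.3.10)."""
    return math.exp(log_real_matrix_gamma(p, alpha + h)
                    - log_real_matrix_gamma(p, alpha)
                    + log_real_matrix_gamma(p, alpha + beta)
                    - log_real_matrix_gamma(p, alpha + beta + h))


def beta2_det_moment(p, alpha, beta, h):
    """E|Z|^h for the real type-2 beta case, as in (5.3.11).

    Raises ValueError when h lies outside the range where the moment exists.
    """
    return math.exp(log_real_matrix_gamma(p, alpha + h)
                    - log_real_matrix_gamma(p, alpha)
                    + log_real_matrix_gamma(p, beta - h)
                    - log_real_matrix_gamma(p, beta))
```

For $p=1$ these reduce to familiar scalar results: the gamma mean $\alpha/b$, the type-1 beta mean $\alpha/(\alpha+\beta)$ and the type-2 beta mean $\alpha/(\beta-1)$. The last function raises an error for $h\ge\beta-\frac{p-1}{2}$, in line with the remark that relatively few moments exist in the type-2 case.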


Let us examine (5.3.9):

$$\begin{aligned}E|X|^h&=|B|^{-h}\,\frac{\Gamma_p(\alpha+h)}{\Gamma_p(\alpha)}\\ &=|B|^{-h}\,\frac{\Gamma(\alpha+h)}{\Gamma(\alpha)}\,\frac{\Gamma(\alpha-\frac{1}{2}+h)}{\Gamma(\alpha-\frac{1}{2})}\cdots\frac{\Gamma(\alpha-\frac{p-1}{2}+h)}{\Gamma(\alpha-\frac{p-1}{2})}\\ &=E[x_1^h]\,E[x_2^h]\cdots E[x_p^h]\end{aligned}$$

where $x_j$ is a real scalar gamma random variable with shape parameter $\alpha-\frac{j-1}{2}$ and scale parameter $\lambda_j$ where $\lambda_j>0$, $j=1,\ldots,p$ are the eigenvalues of $B>O$ by observing that the determinant is the product and trace is the sum of the eigenvalues $\lambda_1,\ldots,\lambda_p$. Further, $x_1,\ldots,x_p$ are independently distributed. Hence, structurally, we have the following representation:

$$|X|=\prod_{j=1}^{p}x_j \tag{5.3.12}$$

where $x_j$ has the density

$$f_{1j}(x_j)=\frac{\lambda_j^{\alpha-\frac{j-1}{2}}}{\Gamma(\alpha-\frac{j-1}{2})}\,x_j^{\alpha-\frac{j-1}{2}-1}\,e^{-\lambda_jx_j},\quad 0\le x_j<\infty,$$

for $\Re(\alpha)>\frac{j-1}{2}$, $\lambda_j>0$ and zero otherwise. Similarly, when the $p\times p$ real positive definite matrix $Y$ has a real matrix-variate type-1 beta density with the parameters $(\alpha,\beta)$, the determinant, $|Y|$, has the structural representation

$$|Y|=\prod_{j=1}^{p}y_j \tag{5.3.13}$$

where $y_j$ is a real scalar type-1 beta random variable with the parameters $(\alpha-\frac{j-1}{2},\ \beta)$ for $j=1,\ldots,p$. When the $p\times p$ real positive definite matrix $Z$ has a real matrix-variate type-2 beta density, then $|Z|$, the determinant of $Z$, has the following structural representation:

$$|Z|=\prod_{j=1}^{p}z_j \tag{5.3.14}$$

where $z_j$ has a real scalar type-2 beta density with the parameters $(\alpha-\frac{j-1}{2},\ \beta-\frac{j-1}{2})$ for $j=1,\ldots,p$.

Example 5.3.2. Consider a real $2\times 2$ matrix $X$ having a real matrix-variate distribution.
Derive the density of the determinant $|X|$ if $X$ has (a) a gamma distribution with the
parameters $(\alpha,\ B=I)$; (b) a real type-1 beta distribution with the parameters $(\alpha=\frac{3}{2},\ \beta=\frac{3}{2})$;
(c) a real type-2 beta distribution with the parameters $(\alpha=\frac{3}{2},\ \beta=\frac{3}{2})$.


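Solution 5.3.2. We will derive the density in these three cases by using three different
methods to illustrate the possibility of making use of various approaches for solving such
problems. (a) Let $u_1=|X|$ in the gamma case. Then for an arbitrary $h$,

$$E[u_1^h]=\frac{\Gamma_p(\alpha+h)}{\Gamma_p(\alpha)}=\frac{\Gamma(\alpha+h)\,\Gamma(\alpha+h-\frac{1}{2})}{\Gamma(\alpha)\,\Gamma(\alpha-\frac{1}{2})},\quad \Re(\alpha+h)>\frac{1}{2}.$$

Since the gammas differ by $\frac{1}{2}$, they can be combined by utilizing the following identity:

$$\Gamma(mz)=(2\pi)^{\frac{1-m}{2}}\,m^{mz-\frac{1}{2}}\,\Gamma(z)\,\Gamma\Big(z+\frac{1}{m}\Big)\cdots\Gamma\Big(z+\frac{m-1}{m}\Big),\quad m=1,2,\ldots, \tag{5.3.15}$$

which is the multiplication formula for gamma functions. For $m=2$, we have the
duplication formula:

$$\Gamma(2z)=(2\pi)^{-\frac{1}{2}}\,2^{2z-\frac{1}{2}}\,\Gamma(z)\,\Gamma(z+1/2).$$

Thus,

$$\Gamma(z)\,\Gamma(z+1/2)=\pi^{\frac{1}{2}}\,2^{1-2z}\,\Gamma(2z).$$

Now, by taking $z=\alpha-\frac{1}{2}+h$ in the numerator and $z=\alpha-\frac{1}{2}$ in the denominator, we can
write

$$E[u_1^h]=\frac{\Gamma(\alpha+h)\,\Gamma(\alpha+h-\frac{1}{2})}{\Gamma(\alpha)\,\Gamma(\alpha-\frac{1}{2})}=\frac{\Gamma(2\alpha-1+2h)}{\Gamma(2\alpha-1)}\,2^{-2h}.$$

Accordingly,

$$E[(4u_1)^h]=\frac{\Gamma(2\alpha-1+2h)}{\Gamma(2\alpha-1)}\ \Rightarrow\ E\big[(2u_1^{\frac{1}{2}})^{2h}\big]=\frac{\Gamma(2\alpha-1+2h)}{\Gamma(2\alpha-1)}.$$

This shows that $v=2u_1^{\frac{1}{2}}$ has a real scalar gamma distribution with the parameters $(2\alpha-1,\ 1)$
whose density is

$$f(v)\,dv=\frac{v^{(2\alpha-1)-1}\,e^{-v}}{\Gamma(2\alpha-1)}\,dv
=\frac{(2u_1^{\frac{1}{2}})^{2\alpha-2}\,e^{-2u_1^{\frac{1}{2}}}}{\Gamma(2\alpha-1)}\,d(2u_1^{\frac{1}{2}})
=\frac{2^{2\alpha-2}\,u_1^{\alpha-\frac{1}{2}-1}\,e^{-2u_1^{\frac{1}{2}}}}{\Gamma(2\alpha-1)}\,du_1.$$

Hence the density of $u_1$, denoted by $f_1(u_1)$, is the following:

$$f_1(u_1)=\frac{2^{2\alpha-2}\,u_1^{\alpha-\frac{1}{2}-1}\,e^{-2u_1^{\frac{1}{2}}}}{\Gamma(2\alpha-1)},\quad 0\le u_1<\infty,$$

and zero elsewhere. It can easily be verified that $f_1(u_1)$ is a density.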
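The verification mentioned for case (a) can be sketched numerically. The following Python snippet is only an illustrative check, not part of the original solution: it compares the moment written as a ratio of gamma products with the form obtained from the duplication formula, and it integrates the density of $u_1=|X|$ by a midpoint rule after the substitution $v=2u_1^{1/2}$. The function names, the step count and the truncation point `v_max` are assumptions made for this sketch.

```python
import math

def moment_from_gammas(alpha, h):
    """E[u1^h] for p = 2, B = I, as Gamma(a+h)Gamma(a+h-1/2) / (Gamma(a)Gamma(a-1/2))."""
    num = math.gamma(alpha + h) * math.gamma(alpha + h - 0.5)
    den = math.gamma(alpha) * math.gamma(alpha - 0.5)
    return num / den

def moment_from_duplication(alpha, h):
    """Same moment after the duplication formula: 2^(-2h) Gamma(2a-1+2h) / Gamma(2a-1)."""
    return 2.0 ** (-2.0 * h) * math.gamma(2.0 * alpha - 1.0 + 2.0 * h) / math.gamma(2.0 * alpha - 1.0)

def density_f1(u, alpha):
    """Density of u1 = |X| obtained in case (a)."""
    return (2.0 ** (2.0 * alpha - 2.0) * u ** (alpha - 1.5)
            * math.exp(-2.0 * math.sqrt(u)) / math.gamma(2.0 * alpha - 1.0))

def moment_by_quadrature(alpha, h, n=20000, v_max=60.0):
    """Midpoint rule for the integral of u^h f1(u) du, using u = (v/2)^2, du = (v/2) dv.
    v_max truncates the infinite range (an assumption of this sketch)."""
    step = v_max / n
    total = 0.0
    for k in range(n):
        v = (k + 0.5) * step
        u = (v / 2.0) ** 2
        total += u ** h * density_f1(u, alpha) * (v / 2.0)
    return total * step

if __name__ == "__main__":
    a, h = 2.0, 1.5
    print(moment_from_gammas(a, h), moment_from_duplication(a, h), moment_by_quadrature(a, h))
    print(moment_by_quadrature(a, 0.0))  # total probability, should be close to 1
```

Taking $h=0$ in the quadrature checks that the function integrates to one; other values of $h$ check the moment expression itself.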
(b) Let $u_2=|X|$. Then for an arbitrary $h$, $\alpha=\frac{3}{2}$ and $\beta=\frac{3}{2}$,

$$E[u_2^h]=\frac{\Gamma_p(\alpha+h)}{\Gamma_p(\alpha)}\,\frac{\Gamma_p(\alpha+\beta)}{\Gamma_p(\alpha+\beta+h)}
=\frac{\Gamma(3)\,\Gamma(\frac{5}{2})}{\Gamma(\frac{3}{2})\,\Gamma(1)}\,\frac{\Gamma(\frac{3}{2}+h)\,\Gamma(1+h)}{\Gamma(3+h)\,\Gamma(\frac{5}{2}+h)}$$

$$=3\Big[\frac{1}{(\frac{3}{2}+h)(2+h)(1+h)}\Big]=3\Big[\frac{2}{1+h}+\frac{2}{2+h}-\frac{4}{\frac{3}{2}+h}\Big],$$

the last expression resulting from an application of the partial fraction technique. This
is the $h$-th moment of the distribution of $u_2$, whose density, which is

$$f_2(u_2)=6\big(1+u_2-2u_2^{\frac{1}{2}}\big),\quad 0\le u_2\le 1,$$

and zero elsewhere, is readily seen to be bona fide.
(c) Let the density of $u_3=|X|$ be denoted by $f_3(u_3)$. The Mellin transform of $f_3(u_3)$, with
Mellin parameter $s$, is

$$E[u_3^{s-1}]=\frac{\Gamma_p(\alpha+s-1)}{\Gamma_p(\alpha)}\,\frac{\Gamma_p(\beta-s+1)}{\Gamma_p(\beta)}
=\frac{\Gamma_2(\frac{1}{2}+s)\,\Gamma_2(\frac{5}{2}-s)}{\Gamma_2(\frac{3}{2})\,\Gamma_2(\frac{3}{2})}
=\frac{4}{\pi}\,\Gamma(s)\,\Gamma(s+1/2)\,\Gamma(5/2-s)\,\Gamma(2-s),$$

the corresponding density being available by taking the inverse Mellin transform, namely,

$$f_3(u_3)=\frac{4}{\pi}\,\frac{1}{2\pi i}\int_{c-i\infty}^{c+i\infty}\Gamma(s)\,\Gamma(s+1/2)\,\Gamma(5/2-s)\,\Gamma(2-s)\,u_3^{-s}\,ds \tag{i}$$

where $i=\sqrt{-1}$ and $c$ in the integration contour is such that $0<c<2$. The integral in
(i) is available as the sum of residues at the poles of $\Gamma(s)\Gamma(s+\frac{1}{2})$ for $0\le u_3\le 1$ and the
sum of residues at the poles of $\Gamma(2-s)\Gamma(\frac{5}{2}-s)$ for $1<u_3<\infty$. We can also combine
$\Gamma(s)$ and $\Gamma(s+\frac{1}{2})$ as well as $\Gamma(2-s)$ and $\Gamma(\frac{5}{2}-s)$ by making use of the duplication
formula for gamma functions. We will then be able to identify the functions in each of
the sectors, $0\le u_3\le 1$ and $1<u_3<\infty$. These will be functions of $u_3^{\frac{1}{2}}$ as done in
case (a). In order to illustrate the method relying on the inverse Mellin transform, we will
evaluate the density $f_3(u_3)$ as a sum of residues. The poles of $\Gamma(s)\Gamma(s+\frac{1}{2})$ are simple
and hence two sums of residues are obtained for $0\le u_3\le 1$. The poles of $\Gamma(s)$ occur at
$s=-\nu$, $\nu=0,1,\ldots$, and those of $\Gamma(s+\frac{1}{2})$ occur at $s=-\frac{1}{2}-\nu$, $\nu=0,1,\ldots$. The
residues and the sum thereof will be evaluated with the help of the following two lemmas.


Lemma 5.3.1. Consider a function $\Gamma(\gamma+s)\phi(s)u^{-s}$ whose poles are simple. The
residue at the pole $s=-\gamma-\nu$, $\nu=0,1,\ldots$, denoted by $R_\nu$, is given by

$$R_\nu=\frac{(-1)^\nu}{\nu!}\,\phi(-\gamma-\nu)\,u^{\gamma+\nu}.$$

Lemma 5.3.2. When $\Gamma(\delta)$ and $\Gamma(\delta-\nu)$ are defined,

$$\Gamma(\delta-\nu)=\frac{(-1)^\nu\,\Gamma(\delta)}{(-\delta+1)_\nu}$$

where, for example, $(a)_\nu=a(a+1)\cdots(a+\nu-1)$, $(a)_0=1$, $a\ne 0$, is the Pochhammer
symbol.
Observe that $\Gamma(\alpha)$ is defined for all $\alpha\ne 0,-1,-2,\ldots$, and that an integral
representation requires $\Re(\alpha)>0$. As well, $\Gamma(\alpha+k)=\Gamma(\alpha)(\alpha)_k$, $k=1,2,\ldots$. With the help
of Lemmas 5.3.1 and 5.3.2, the sum of the residues at the poles of $\Gamma(s)$ in the integral in
(i), excluding the constant $\frac{4}{\pi}$, is the following:

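$$\sum_{\nu=0}^{\infty}\frac{(-1)^\nu}{\nu!}\,\Gamma\Big(\frac{1}{2}-\nu\Big)\Gamma\Big(\frac{5}{2}+\nu\Big)\Gamma(2+\nu)\,u_3^{\nu}
=\Gamma\Big(\frac{1}{2}\Big)\Gamma\Big(\frac{5}{2}\Big)\Gamma(2)\sum_{\nu=0}^{\infty}\frac{(\frac{5}{2})_\nu\,(2)_\nu}{(\frac{1}{2})_\nu}\,\frac{u_3^{\nu}}{\nu!}
=\frac{3}{4}\pi\ {}_2F_1\Big(\frac{5}{2},2;\frac{1}{2};u_3\Big),\quad 0\le u_3\le 1,$$

where ${}_2F_1(\cdot)$ is Gauss' hypergeometric function. The same procedure consisting of
taking the sum of the residues at the poles $s=-\frac{1}{2}-\nu$, $\nu=0,1,\ldots$, gives

$$-3\pi\,u_3^{\frac{1}{2}}\ {}_2F_1\Big(3,\frac{5}{2};\frac{3}{2};u_3\Big),\quad 0\le u_3\le 1.$$

The inverse Mellin transform for the sector $1<u_3<\infty$ is available as the sum of
residues at the poles of $\Gamma(\frac{5}{2}-s)$ and $\Gamma(2-s)$ which occur at $s=\frac{5}{2}+\nu$ and $s=2+\nu$
for $\nu=0,1,\ldots$. The sum of residues at the poles of $\Gamma(\frac{5}{2}-s)$ is the following:

$$\sum_{\nu=0}^{\infty}\frac{(-1)^\nu}{\nu!}\,\Gamma\Big(\frac{5}{2}+\nu\Big)\Gamma(3+\nu)\,\Gamma\Big(-\frac{1}{2}-\nu\Big)\,u_3^{-\frac{5}{2}-\nu}
=-3\pi\,u_3^{-\frac{5}{2}}\ {}_2F_1\Big(\frac{5}{2},3;\frac{3}{2};\frac{1}{u_3}\Big),\quad 1<u_3<\infty,$$

and the sum of the residues at the poles of $\Gamma(2-s)$ is given by

$$\frac{3}{4}\pi\,u_3^{-2}\ {}_2F_1\Big(2,\frac{5}{2};\frac{1}{2};\frac{1}{u_3}\Big),\quad 1<u_3<\infty.$$

Now, on combining all the hypergeometric series and multiplying the result by the constant
$\frac{4}{\pi}$, the final representation of the required density is obtained as

$$f_3(u_3)=\begin{cases}
3\ {}_2F_1\big(\frac{5}{2},2;\frac{1}{2};u_3\big)-12\,u_3^{\frac{1}{2}}\ {}_2F_1\big(3,\frac{5}{2};\frac{3}{2};u_3\big), & 0\le u_3\le 1,\\[4pt]
3\,u_3^{-2}\ {}_2F_1\big(2,\frac{5}{2};\frac{1}{2};\frac{1}{u_3}\big)-12\,u_3^{-\frac{5}{2}}\ {}_2F_1\big(3,\frac{5}{2};\frac{3}{2};\frac{1}{u_3}\big), & 1<u_3<\infty.
\end{cases}$$

This completes the computations.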
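As a rough numerical cross-check of cases (b) and (c), the sketch below uses only the Python standard library. For (b) it compares the gamma-ratio moment for $\alpha=\beta=\frac{3}{2}$, $p=2$, with the moment of $6(1+u-2u^{1/2})$ integrated term by term. For (c) it sums the two hypergeometric series in each sector and compares them with $3(1+u^{1/2})^{-4}$. That closed form is not stated in the text: it is our own consequence of combining the gammas through the duplication formula, which turns the Mellin transform into $\Gamma(2s)\Gamma(4-2s)$, and should be read as an assumption being tested. All function names and the number of series terms are illustrative.

```python
import math

def moment_b_gammas(h):
    """Case (b): Gamma_2(3/2+h)Gamma_2(3) / (Gamma_2(3/2)Gamma_2(3+h)); the pi^(1/2) factors cancel."""
    const = math.gamma(3.0) * math.gamma(2.5) / (math.gamma(1.5) * math.gamma(1.0))
    return const * math.gamma(1.5 + h) * math.gamma(1.0 + h) / (math.gamma(3.0 + h) * math.gamma(2.5 + h))

def moment_b_density(h):
    """Integral over (0,1) of u^h * 6(1 + u - 2 u^(1/2)), evaluated term by term."""
    return 6.0 * (1.0 / (1.0 + h) + 1.0 / (2.0 + h) - 2.0 / (1.5 + h))

def hyp2f1(a, b, c, x, terms=400):
    """Truncated Gauss hypergeometric series; intended for |x| < 1 only."""
    total, term = 0.0, 1.0
    for k in range(terms):
        total += term
        term *= (a + k) * (b + k) / ((c + k) * (k + 1.0)) * x
    return total

def f3_series(u):
    """Case (c): residue-sum representation, for u > 0 and u != 1."""
    if u < 1.0:
        return 3.0 * hyp2f1(2.5, 2.0, 0.5, u) - 12.0 * math.sqrt(u) * hyp2f1(3.0, 2.5, 1.5, u)
    w = 1.0 / u
    return 3.0 * u ** -2.0 * hyp2f1(2.0, 2.5, 0.5, w) - 12.0 * u ** -2.5 * hyp2f1(3.0, 2.5, 1.5, w)

def f3_closed(u):
    """Closed form implied by the duplication formula (our derivation, not from the text)."""
    return 3.0 / (1.0 + math.sqrt(u)) ** 4

if __name__ == "__main__":
    print(moment_b_gammas(0.7), moment_b_density(0.7))
    for u in (0.1, 0.3, 4.0, 10.0):
        print(u, f3_series(u), f3_closed(u))
```

The series converge slowly as $u_3$ approaches 1, so the check is run away from that point.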
5.3a.2. Arbitrary moments of the determinants in the complex case


In the complex matrix-variate case, one can consider the absolute value of the
determinant, which will be real; however, the parameters will be different from those in the real
case. For example, consider the complex matrix-variate gamma density. If $\tilde{X}$ has a $p\times p$
complex matrix-variate gamma density with the parameters $(\alpha,\ \tilde{B}>O)$, then the $h$-th
moment of the absolute value of the determinant of $\tilde{X}$ is the following:

$$E\big[|\det(\tilde{X})|^h\big]=|\det(\tilde{B})|^{-h}\,\frac{\tilde{\Gamma}_p(\alpha+h)}{\tilde{\Gamma}_p(\alpha)}
=(\lambda_1\cdots\lambda_p)^{-h}\prod_{j=1}^{p}\frac{\Gamma(\alpha-(j-1)+h)}{\Gamma(\alpha-(j-1))}
=\prod_{j=1}^{p}E[x_j^h],$$

that is, $|\det(\tilde{X})|$ has the structural representation

$$|\det(\tilde{X})|=x_1x_2\cdots x_p, \tag{5.3a.8}$$

where $x_j$ is a real scalar gamma random variable with the parameters $(\alpha-(j-1),\ \lambda_j)$,
$j=1,\ldots,p$, and the $x_j$'s are independently distributed. Similarly, when $\tilde{Y}$ is a $p\times p$
complex Hermitian positive definite matrix having a complex matrix-variate type-1 beta
density with the parameters $(\alpha,\beta)$, the absolute value of the determinant of $\tilde{Y}$, $|\det(\tilde{Y})|$,
has the structural representation

$$|\det(\tilde{Y})|=\prod_{j=1}^{p}y_j \tag{5.3a.9}$$

where the $y_j$'s are independently distributed, $y_j$ being a real scalar type-1 beta random
variable with the parameters $(\alpha-(j-1),\ \beta)$, $j=1,\ldots,p$. When $\tilde{Z}$ is a $p\times p$
Hermitian positive definite matrix having a complex matrix-variate type-2 beta density with
the parameters $(\alpha,\beta)$, then for arbitrary $h$, the $h$-th moment of the absolute value of the
determinant is given by

$$E\big[|\det(\tilde{Z})|^h\big]=\frac{\tilde{\Gamma}_p(\alpha+h)}{\tilde{\Gamma}_p(\alpha)}\,\frac{\tilde{\Gamma}_p(\beta-h)}{\tilde{\Gamma}_p(\beta)}
=\Big\{\prod_{j=1}^{p}\frac{\Gamma(\alpha-(j-1)+h)}{\Gamma(\alpha-(j-1))}\Big\}\Big\{\prod_{j=1}^{p}\frac{\Gamma(\beta-(j-1)-h)}{\Gamma(\beta-(j-1))}\Big\}
=\prod_{j=1}^{p}E[z_j^h],$$

so that the absolute value of the determinant of $\tilde{Z}$ has the following structural
representation:

$$|\det(\tilde{Z})|=\prod_{j=1}^{p}z_j \tag{5.3a.10}$$

where the $z_j$'s are independently distributed real scalar type-2 beta random variables with
the parameters $(\alpha-(j-1),\ \beta-(j-1))$ for $j=1,\ldots,p$. Thus, in the real case, the
determinant and, in the complex case, the absolute value of the determinant have
structural representations in terms of products of independently distributed real scalar random
variables. The following is the summary of what has been discussed so far:


Distribution    Parameters, real case                                        Parameters, complex case

gamma           $(\alpha-\frac{j-1}{2},\ \lambda_j)$                         $(\alpha-(j-1),\ \lambda_j)$

type-1 beta     $(\alpha-\frac{j-1}{2},\ \beta)$                             $(\alpha-(j-1),\ \beta)$

type-2 beta     $(\alpha-\frac{j-1}{2},\ \beta-\frac{j-1}{2})$               $(\alpha-(j-1),\ \beta-(j-1))$

for $j=1,\ldots,p$. When we consider the determinant in the real case, the parameters differ
by $\frac{1}{2}$ whereas the parameters differ by 1 in the complex domain. Whether in the real or
complex cases, the individual variables appearing in the structural representations are real
scalar variables that are independently distributed.


Example 5.3a.2. Even when $p=2$, some of the poles will be of order 2 since the
gammas differ by integers in the complex case, and hence a numerical example will not
be provided for such an instance. Actually, when poles of order 2 or more are present, the
series representation will contain logarithms as well as psi and zeta functions. A simple
illustrative example is now considered. Let $\tilde{X}$ be a $2\times 2$ matrix having a complex
matrix-variate type-1 beta distribution with the parameters $(\alpha=2,\ \beta=2)$. Evaluate the density
of $\tilde{u}=|\det(\tilde{X})|$.


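Solution 5.3a.2. Let us take the $(s-1)$th moment of $\tilde{u}$ which corresponds to the Mellin
transform of the density of $\tilde{u}$, with Mellin parameter $s$:

$$E[\tilde{u}^{s-1}]=\frac{\tilde{\Gamma}_p(\alpha+s-1)}{\tilde{\Gamma}_p(\alpha)}\,\frac{\tilde{\Gamma}_p(\alpha+\beta)}{\tilde{\Gamma}_p(\alpha+\beta+s-1)}
=\frac{\Gamma(\alpha+\beta)\,\Gamma(\alpha+\beta-1)}{\Gamma(\alpha)\,\Gamma(\alpha-1)}\,\frac{\Gamma(\alpha+s-1)\,\Gamma(\alpha+s-2)}{\Gamma(\alpha+\beta+s-1)\,\Gamma(\alpha+\beta+s-2)}$$

$$=\frac{\Gamma(4)\,\Gamma(3)}{\Gamma(2)\,\Gamma(1)}\,\frac{\Gamma(1+s)\,\Gamma(s)}{\Gamma(3+s)\,\Gamma(2+s)}=\frac{12}{(2+s)(1+s)^2s}.$$

The inverse Mellin transform then yields the density of $\tilde{u}$, denoted by $\tilde{g}(\tilde{u})$, which is

$$\tilde{g}(\tilde{u})=\frac{1}{2\pi i}\int_{c-i\infty}^{c+i\infty}\frac{12}{(2+s)(1+s)^2s}\,\tilde{u}^{-s}\,ds$$

where the $c$ in the contour is any real number $c>0$. There is a pole of order 1 at $s=0$ and
another pole of order 1 at $s=-2$, the residues at these poles being obtained as follows:

$$\lim_{s\to 0}\frac{\tilde{u}^{-s}}{(2+s)(1+s)^2}=\frac{1}{2},\qquad \lim_{s\to -2}\frac{\tilde{u}^{-s}}{(1+s)^2s}=-\frac{\tilde{u}^2}{2}.$$

The pole at $s=-1$ is of order 2 and hence the residue is given by

$$\lim_{s\to -1}\Big[\frac{d}{ds}\,\frac{\tilde{u}^{-s}}{s(2+s)}\Big]
=\lim_{s\to -1}\Big[\frac{(-\ln\tilde{u})\,\tilde{u}^{-s}}{s(2+s)}-\frac{\tilde{u}^{-s}}{s^2(2+s)}-\frac{\tilde{u}^{-s}}{s(2+s)^2}\Big]
=\tilde{u}\ln\tilde{u}-\tilde{u}+\tilde{u}=\tilde{u}\ln\tilde{u}.$$

Hence the density is the following:

$$\tilde{g}(\tilde{u})=6-6\tilde{u}^2+12\,\tilde{u}\ln\tilde{u},\quad 0\le\tilde{u}\le 1,$$

and zero elsewhere, where $\tilde{u}$ is real. It can readily be shown that $\tilde{g}(\tilde{u})\ge 0$ and $\int_0^1\tilde{g}(\tilde{u})\,d\tilde{u}=1$.
This completes the computations.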
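The two properties claimed for this density can be checked with a short Python sketch (standard library only; function names and the grid sizes are illustrative assumptions). It integrates $6-6u^2+12u\ln u$ over $(0,1)$ with a midpoint rule, verifies non-negativity on a grid, and confirms with exact rational arithmetic that the term-by-term Mellin transform $6/s-6/(s+2)-12/(s+1)^2$ equals $12/\{(2+s)(1+s)^2s\}$.

```python
import math
from fractions import Fraction

def g(u):
    """Density of |det(X)| found in Solution 5.3a.2, for 0 < u <= 1."""
    return 6.0 - 6.0 * u * u + 12.0 * u * math.log(u)

def mellin_closed(s):
    """12 / ((2+s)(1+s)^2 s), evaluated exactly for rational s."""
    s = Fraction(s)
    return Fraction(12) / ((2 + s) * (1 + s) ** 2 * s)

def mellin_termwise(s):
    """Integrals over (0,1): u^(s-1) -> 1/s, u^(s+1) -> 1/(s+2), u^s ln u -> -1/(s+1)^2."""
    s = Fraction(s)
    return Fraction(6) / s - Fraction(6) / (s + 2) - Fraction(12) / (s + 1) ** 2

def integrate_midpoint(f, n=100000):
    """Midpoint rule on (0,1); it never evaluates f at u = 0, where ln u is undefined."""
    step = 1.0 / n
    return sum(f((k + 0.5) * step) for k in range(n)) * step

if __name__ == "__main__":
    print(integrate_midpoint(g))                      # close to 1
    print(min(g(k / 100.0) for k in range(1, 100)))   # non-negative on the grid
    print(mellin_closed(Fraction(3, 2)), mellin_termwise(Fraction(3, 2)))
```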


Exercises 5.3 


5.3.1. Evaluate the real p x p matrix-variate type-2 beta integral from first principles or 
by direct evaluation by partitioning the matrix as in Sect. 5.3.3 (general partitioning). 


5.3.2. Repeat Exercise 5.3.1 for the complex case. 


5.3.3. In the $2\times 2$ partitioning of a $p\times p$ real matrix-variate gamma density with shape
parameter $\alpha$ and scale parameter $I$, where the first diagonal block $X_{11}$ is $r\times r$, $r<p$,
compute the density of the rectangular block $X_{12}$.


5.3.4. Repeat Exercise 5.3.3 for the complex case. 


5.3.5. Let the $p\times p$ real matrices $X_1$ and $X_2$ have real matrix-variate gamma densities
with the parameters $(\alpha_1,\ B>O)$ and $(\alpha_2,\ B>O)$, respectively, $B$ being the same for
both distributions. Compute the density of (1): $U_1=X_2^{-\frac{1}{2}}X_1X_2^{-\frac{1}{2}}$, (2): $U_2=X_1^{\frac{1}{2}}X_2^{-1}X_1^{\frac{1}{2}}$,
(3): $U_3=(X_1+X_2)^{-\frac{1}{2}}X_2(X_1+X_2)^{-\frac{1}{2}}$, when $X_1$ and $X_2$ are independently distributed.

5.3.6. Repeat Exercise 5.3.5 for the complex case.


5.3.7. In the transformation $Y=I-X$ that was used in Sect. 5.3.1, the Jacobian is
$dY=(-1)^{\frac{p(p+1)}{2}}\,dX$. What happened to the factor $(-1)^{\frac{p(p+1)}{2}}$?


5.3.8. Consider X in the (a) 2 x 2, (b) 3 x 3 real matrix-variate case. If X 1s real 
matrix-variate gamma distributed, then derive the densities of the determinant of X in (a) 
and (b) if the parameters аге œ = >, В = I. Consider X in the (a) 2 x 2, (D) 3 x 3 complex 
matrix-variate case. Derive the distributions of |det( X )| in (a) and (b) if X is complex 
matrix-variate gamma distributed with parameters (y = 2 + i, B = I). 


5.3.9. Consider the real cases (a) and (b) in Exercise 5.3.8 except that the distribution 
is type-1 beta with the parameters (a = 3. p= 3). Derive the density of the determinant 
of X. 


5.3.10. Consider X, (a) 2x2, (b) 3x3 complex matrix-variate type-1 beta distributed 
with parameters о = 3 +i, В = 3 — i). Then derive the density of |det(X)| in the cases 
(a) and (b). 


5.3.11. Consider X, (a) 2 x 2, (b) 3 x 3 real matrix-variate type-2 beta distributed with 
the parameters (о = 3, В = 3). Derive the density of |X| in the cases (a) and (b). 


5.3.12. Consider X, (a) 2 x2, (b) 3x3 complex matrix-variate type-2 beta distributed 
with the parameters (a = 3, В = 3). Derive the density of |det( X)| in the cases (a) and 


(b). 
5.4. The Densities of Some General Structures 


Three cases were examined in Section 5.3: the product of real scalar gamma
variables, the product of real scalar type-1 beta variables and the product of real scalar type-2
beta variables, where in all these instances, the individual variables were mutually
independently distributed. Let us now consider the corresponding general structures. Let $x_j$
be a real scalar gamma variable with shape parameter $\alpha_j$ and scale parameter 1 for
convenience and let the $x_j$'s be independently distributed for $j=1,\ldots,p$. Then, letting
$v_1=x_1\cdots x_p$,

$$E[v_1^h]=\prod_{j=1}^{p}\frac{\Gamma(\alpha_j+h)}{\Gamma(\alpha_j)},\quad \Re(\alpha_j+h)>0,\ \Re(\alpha_j)>0. \tag{5.4.1}$$

Now, let $y_1,\ldots,y_p$ be independently distributed real scalar type-1 beta random variables
with the parameters $(\alpha_j,\beta_j)$, $\Re(\alpha_j)>0$, $\Re(\beta_j)>0$, $j=1,\ldots,p$, and $v_2=y_1\cdots y_p$. Then,

$$E[v_2^h]=\prod_{j=1}^{p}\frac{\Gamma(\alpha_j+h)}{\Gamma(\alpha_j)}\,\frac{\Gamma(\alpha_j+\beta_j)}{\Gamma(\alpha_j+\beta_j+h)} \tag{5.4.2}$$

for $\Re(\alpha_j)>0$, $\Re(\beta_j)>0$, $\Re(\alpha_j+h)>0$, $j=1,\ldots,p$. Similarly, let $z_1,\ldots,z_p$
be independently distributed real scalar type-2 beta random variables with the parameters
$(\alpha_j,\beta_j)$, $j=1,\ldots,p$, and let $v_3=z_1\cdots z_p$. Then, we have

$$E[v_3^h]=\prod_{j=1}^{p}\frac{\Gamma(\alpha_j+h)}{\Gamma(\alpha_j)}\,\frac{\Gamma(\beta_j-h)}{\Gamma(\beta_j)} \tag{5.4.3}$$

for $\Re(\alpha_j)>0$, $\Re(\beta_j)>0$, $\Re(\alpha_j+h)>0$, $\Re(\beta_j-h)>0$, $j=1,\ldots,p$. The
corresponding densities of $v_1,v_2,v_3$, respectively denoted by $g_1(v_1),g_2(v_2),g_3(v_3)$, are
available from the inverse Mellin transforms by taking (5.4.1) to (5.4.3) as the Mellin
transforms of $g_1,g_2,g_3$ with $h=s-1$ for a complex variable $s$ where $s$ is the Mellin
parameter. Then, for suitable contours $L$, the densities can be determined as follows:


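$$g_1(v_1)=\frac{1}{2\pi i}\int_L E[v_1^{s-1}]\,v_1^{-s}\,ds,\quad i=\sqrt{-1},$$

$$=\Big\{\prod_{j=1}^{p}\frac{1}{\Gamma(\alpha_j)}\Big\}\frac{1}{2\pi i}\int_L\Big\{\prod_{j=1}^{p}\Gamma(\alpha_j+s-1)\Big\}v_1^{-s}\,ds
=\Big\{\prod_{j=1}^{p}\frac{1}{\Gamma(\alpha_j)}\Big\}\,G^{p,0}_{0,p}\big[v_1\big|_{\alpha_j-1,\ j=1,\ldots,p}\big],\quad 0\le v_1<\infty, \tag{5.4.4}$$

where $\Re(\alpha_j+s-1)>0$, $j=1,\ldots,p$, and $g_1(v_1)=0$ elsewhere. This last representation
is expressed in terms of a G-function, which will be defined in Sect. 5.4.1.

$$g_2(v_2)=\frac{1}{2\pi i}\int_L E[v_2^{s-1}]\,v_2^{-s}\,ds,\quad i=\sqrt{-1},$$

$$=\Big\{\prod_{j=1}^{p}\frac{\Gamma(\alpha_j+\beta_j)}{\Gamma(\alpha_j)}\Big\}\frac{1}{2\pi i}\int_L\Big\{\prod_{j=1}^{p}\frac{\Gamma(\alpha_j+s-1)}{\Gamma(\alpha_j+\beta_j+s-1)}\Big\}v_2^{-s}\,ds
=\Big\{\prod_{j=1}^{p}\frac{\Gamma(\alpha_j+\beta_j)}{\Gamma(\alpha_j)}\Big\}\,G^{p,0}_{p,p}\Big[v_2\Big|^{\alpha_j+\beta_j-1,\ j=1,\ldots,p}_{\alpha_j-1,\ j=1,\ldots,p}\Big],\quad 0\le v_2\le 1, \tag{5.4.5}$$

where $G^{p,0}_{p,p}$ is a G-function, $\Re(\alpha_j+s-1)>0$, $\Re(\alpha_j)>0$, $\Re(\beta_j)>0$, $j=1,\ldots,p$,
and $g_2(v_2)=0$ elsewhere.

$$g_3(v_3)=\frac{1}{2\pi i}\int_L E[v_3^{s-1}]\,v_3^{-s}\,ds,\quad i=\sqrt{-1},$$

$$=\Big\{\prod_{j=1}^{p}\frac{1}{\Gamma(\alpha_j)\Gamma(\beta_j)}\Big\}\frac{1}{2\pi i}\int_L\Big\{\prod_{j=1}^{p}\Gamma(\alpha_j+s-1)\,\Gamma(\beta_j-s+1)\Big\}v_3^{-s}\,ds
=\Big\{\prod_{j=1}^{p}\frac{1}{\Gamma(\alpha_j)\Gamma(\beta_j)}\Big\}\,G^{p,p}_{p,p}\Big[v_3\Big|^{-\beta_j,\ j=1,\ldots,p}_{\alpha_j-1,\ j=1,\ldots,p}\Big],\quad 0\le v_3<\infty, \tag{5.4.6}$$

where $\Re(\alpha_j)>0$, $\Re(\beta_j)>0$, $\Re(\alpha_j+s-1)>0$, $\Re(\beta_j-s+1)>0$, $j=1,\ldots,p$,
and $g_3(v_3)=0$ elsewhere.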


5.4.1. The G-function 


The G-function is defined in terms of the following Mellin-Barnes integral:

$$G(z)=G^{m,n}_{p,q}(z)=G^{m,n}_{p,q}\Big[z\Big|^{a_1,\ldots,a_p}_{b_1,\ldots,b_q}\Big]=\frac{1}{2\pi i}\int_L\phi(s)\,z^{-s}\,ds,\quad i=\sqrt{-1},$$

$$\phi(s)=\frac{\{\prod_{j=1}^{m}\Gamma(b_j+s)\}\{\prod_{j=1}^{n}\Gamma(1-a_j-s)\}}{\{\prod_{j=m+1}^{q}\Gamma(1-b_j-s)\}\{\prod_{j=n+1}^{p}\Gamma(a_j+s)\}}$$

where the parameters $a_j$, $j=1,\ldots,p$, $b_j$, $j=1,\ldots,q$, can be complex numbers.
There are three general contours $L$, say $L_1,L_2,L_3$, where $L_1$ is a loop starting and ending
at $-\infty$ that contains all the poles of $\Gamma(b_j+s)$, $j=1,\ldots,m$, and none of those of
$\Gamma(1-a_j-s)$, $j=1,\ldots,n$. In general $L$ will separate the poles of $\Gamma(b_j+s)$, $j=1,\ldots,m$,
from those of $\Gamma(1-a_j-s)$, $j=1,\ldots,n$, which lie on either side of the
contour. $L_2$ is a loop starting and ending at $+\infty$, which encloses all the poles of
$\Gamma(1-a_j-s)$, $j=1,\ldots,n$. $L_3$ is the straight line contour $c-i\infty$ to $c+i\infty$. The existence of
the contours, convergence conditions, explicit series forms for general parameters as well
as applications are available in Mathai (1993). G-functions can readily be evaluated with
symbolic computing packages such as MAPLE and Mathematica.
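To make the definition concrete without a symbolic package, the following Python sketch evaluates two simple G-functions as sums of residues at the poles of $\Gamma(b_1+s)$, in the spirit of Lemma 5.3.1, and compares them with elementary closed forms. The closed forms used here, $G^{1,0}_{0,1}[z|_a]=z^ae^{-z}$ and $G^{1,1}_{1,1}[z|^{1-a}_{0}]=\Gamma(a)(1+z)^{-a}$ for $|z|<1$, are standard results; the helper names and truncation lengths are assumptions of this illustration, and the code is not a general G-function evaluator.

```python
import math

def residue_sum(first_term, ratio, terms):
    """Sum a residue series whose consecutive terms satisfy term[nu+1] = term[nu] * ratio(nu)."""
    total, term = 0.0, first_term
    for nu in range(terms):
        total += term
        term *= ratio(nu)
    return total

def g_10_01(z, a, terms=80):
    """G^{1,0}_{0,1}[z | a]: phi(s) = Gamma(a+s); residue at s = -a-nu is ((-1)^nu / nu!) z^(a+nu)."""
    return residue_sum(z ** a, lambda nu: -z / (nu + 1.0), terms)

def g_11_11(z, a, terms=400):
    """G^{1,1}_{1,1}[z | 1-a ; 0]: phi(s) = Gamma(s)Gamma(a-s);
    residue at s = -nu is ((-1)^nu / nu!) Gamma(a+nu) z^nu, valid for |z| < 1."""
    return residue_sum(math.gamma(a), lambda nu: -z * (a + nu) / (nu + 1.0), terms)

if __name__ == "__main__":
    print(g_10_01(1.7, 0.5), 1.7 ** 0.5 * math.exp(-1.7))
    print(g_11_11(0.4, 2.5), math.gamma(2.5) * (1.0 + 0.4) ** -2.5)
```

Building each term from the previous one avoids forming large factorials and gamma values explicitly.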


Example 5.4.1. Let x1, хо, x3 be independently distributed real scalar random variables, 
xı being real gamma distributed with the parameters (v; = 3, В = 2), x2, real type-1 beta 
distributed with the parameters (o = 3 +2i, Вә = 5) and x3, real type-2 beta distributed 
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with the parameters (оз = ++i, Вз = 2— i). Let uy = x1x2x3, U2 = ee and u3 = ста 
with densities g;(u;), j = 1, 2, 3, respectively. Derive the densities g;(u;), j = 1, 2, 3, 
and represent them in terms of G-functions. 

Solution 5.4.1. Observe that $E[(1/x_j)^{s-1}]=E[x_j^{-s+1}]$, $j=1,2,3$, and that $g_1(u_1)$, $g_2(u_2)$
and $g_3(u_3)$ will share the same 'normalizing constant', say $c$, which is the product of the
parts of the normalizing constants in the densities of $x_1$, $x_2$ and $x_3$ that do not cancel out
when determining the moments, respectively denoted by $c_1$, $c_2$ and $c_3$, that is, $c=c_1c_2c_3$.


Thus, 
1 Г(о + 2) 1 


Ё Гат) Fe) Te) 
1 Г(2+21) 1 


= 3 Р ar mae (0 
Г(3) rG-c-2)0rG-c-)0rQ-i) 
The following аге eam and EI sl for j = 1, 2, 3: 
Ek saa P2+s), о ?*1 = с 27*l (4 — 5) (ii) 
r +2i+s) rÈ +2i-—s) 

E 5—1 - 2 E —s+l1 - 2 ET 
2 1=0 payar l-^"TGya-s un 
Е[х5—1] = сз FG/2 i - s) -i — 5), EIx5?* ] = єз Г(7/2 +1 - )P (0 — i +5). 
(iv) 


Then from (i)-(iv), 

rG +2i+s) 
r +2i+s) 
Taking the inverse Mellin transform and writing the density gı(u1) in terms of a 


G-function, we have 
) C G? BET na 142i 
uj) —zG54|— . 
aiu 2 т %Ф® 2 о, 142i, 3i 
Using (7)-(iv) and rearranging the gamma functions so that those involving +s appear 
together in the numerator, we have the following: 


E[u5] = 225 ret jl PU ps opcs E Tes 
p r3 +2i-— s) 
Taking the inverse Mellin transform and expressing the result in terms of a G-function, we 
obtain the density g2(u2) as 


C u —3-—2i, —3-i 

вид) 22623 | ү уз, 

2(u2) = 5 S ; 
223| 2 l2, 1zi, $+2i, —2—2 


Ejus] =c r2 +s) PG/2+i+ts)FB—-i-s). 
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Using (i)-(iv) and conveniently rearranging the gamma functions involving +s, we have 
rg+i-s) 
rA+2i +s) 

On taking the inverse Mellin transform, the following density is obtained: 


—3, -$-i, 142i 
5+2i, 1—i ` 


E[u3 !] = 2c2^2T (0/2 2i c 5) (1 i - 5) (4 — s) 


g3(u3) = 2c G3 2 2) 20, 
This completes the computations. 
5.4.2. Some special cases of the G-function 


Certain special cases of the G-function can be written in terms of elementary functions. 
Here are some of them: 


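$$G^{1,0}_{0,1}\big[z\big|_{a}\big]=z^{a}e^{-z},\quad z\ne 0$$

$$G^{1,1}_{1,1}\Big[-z\Big|^{1-a}_{0}\Big]=\Gamma(a)(1-z)^{-a},\quad |z|<1$$

$$G^{1,0}_{1,1}\Big[z\Big|^{\alpha+\beta+1}_{\alpha}\Big]=\frac{1}{\Gamma(\beta+1)}\,z^{\alpha}(1-z)^{\beta},\quad |z|<1$$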
Gy [a a | cu xem =| laz?| od 


В 
1,1 ajl-y+8/a] __ ii < | a 

Стах | |= гоа az laz*| < 1 
Г Qa 

= с а) + 28 + (1 271, |а| « 1 


1 


l—a,1—a IU 2 iuo 
asfel 2a E 2а |а| « 1 
2 


4 2—4 5 
val gla] = (у) ов 
eos = ——csinhz 
TOU. ето 
буз. a | | = 2 Zeoshz 
02| 4 04 
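$$G^{1,2}_{2,2}\Big[\pm z\Big|^{1,1}_{1,0}\Big]=\ln(1\pm z),\quad |z|<1$$

$$G^{1,p}_{p,q+1}\Big[z\Big|^{1-a_1,\ldots,1-a_p}_{0,\,1-b_1,\ldots,1-b_q}\Big]=\frac{\Gamma(a_1)\cdots\Gamma(a_p)}{\Gamma(b_1)\cdots\Gamma(b_q)}\ {}_pF_q(a_1,\ldots,a_p;\ b_1,\ldots,b_q;\ -z)$$

for $p\le q$ or $p=q+1$ and $|z|<1$.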
5.4.3. The H-function 


If we have a general structure corresponding to $v_1$, $v_2$ and $v_3$ of Sect. 5.4, say $w_1$, $w_2$
and $w_3$ of the form

$$w_1=x_1^{\delta_1}x_2^{\delta_2}\cdots x_p^{\delta_p} \tag{5.4.7}$$

$$w_2=y_1^{\delta_1}y_2^{\delta_2}\cdots y_p^{\delta_p} \tag{5.4.8}$$

$$w_3=z_1^{\delta_1}z_2^{\delta_2}\cdots z_p^{\delta_p} \tag{5.4.9}$$

for some $\delta_j>0$, $j=1,\ldots,p$, the densities of $w_1$, $w_2$ and $w_3$ are then available in
terms of a more general function known as the H-function. It is again a Mellin-Barnes
type integral defined and denoted as follows:

$$H(z)=H^{m,n}_{p,q}(z)=H^{m,n}_{p,q}\Big[z\Big|^{(a_1,\alpha_1),\ldots,(a_p,\alpha_p)}_{(b_1,\beta_1),\ldots,(b_q,\beta_q)}\Big]=\frac{1}{2\pi i}\int_L\psi(s)\,z^{-s}\,ds,\quad i=\sqrt{-1},$$

$$\psi(s)=\frac{\{\prod_{j=1}^{m}\Gamma(b_j+\beta_js)\}\{\prod_{j=1}^{n}\Gamma(1-a_j-\alpha_js)\}}{\{\prod_{j=m+1}^{q}\Gamma(1-b_j-\beta_js)\}\{\prod_{j=n+1}^{p}\Gamma(a_j+\alpha_js)\}} \tag{5.4.10}$$

where $\alpha_j>0$, $j=1,\ldots,p$, $\beta_j>0$, $j=1,\ldots,q$, are real and positive, $a_j$, $j=1,\ldots,p$,
and $b_j$, $j=1,\ldots,q$, are complex numbers. Three main contours $L_1,L_2,L_3$
are utilized, similarly to those described in connection with the G-function. Existence
conditions, properties and applications of this generalized hypergeometric function are
available from Mathai et al. (2010) among other monographs. Numerous special cases can
be expressed in terms of known elementary functions.


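Example 5.4.2. Let $x_1$ and $x_2$ be independently distributed real type-1 beta random
variables with the parameters $(\alpha_j>0,\ \beta_j>0)$, $j=1,2$, respectively. Let $y_1=x_1^{\delta_1}$, $\delta_1>0$,
and $y_2=x_2^{\delta_2}$, $\delta_2>0$. Compute the density of $u=y_1y_2$.

Solution 5.4.2. Arbitrary moments of $y_1$ and $y_2$ are available from those of $x_1$ and
$x_2$.

$$E[x_j^h]=\frac{\Gamma(\alpha_j+h)}{\Gamma(\alpha_j)}\,\frac{\Gamma(\alpha_j+\beta_j)}{\Gamma(\alpha_j+\beta_j+h)},\quad \Re(\alpha_j+h)>0,\ j=1,2,$$

$$E[y_j^h]=\frac{\Gamma(\alpha_j+\delta_jh)}{\Gamma(\alpha_j)}\,\frac{\Gamma(\alpha_j+\beta_j)}{\Gamma(\alpha_j+\beta_j+\delta_jh)},\quad \Re(\alpha_j+\delta_jh)>0,$$

$$E[u^{s-1}]=E[y_1^{s-1}]\,E[y_2^{s-1}]=\prod_{j=1}^{2}\frac{\Gamma(\alpha_j+\delta_j(s-1))}{\Gamma(\alpha_j)}\,\frac{\Gamma(\alpha_j+\beta_j)}{\Gamma(\alpha_j+\beta_j+\delta_j(s-1))}. \tag{5.4.11}$$

Accordingly, the density of $u$, denoted by $g(u)$, is the following:

$$g(u)=C\,\frac{1}{2\pi i}\int_L\Big\{\prod_{j=1}^{2}\frac{\Gamma(\alpha_j-\delta_j+\delta_js)}{\Gamma(\alpha_j+\beta_j-\delta_j+\delta_js)}\Big\}u^{-s}\,ds
=C\,H^{2,0}_{2,2}\Big[u\Big|^{(\alpha_1+\beta_1-\delta_1,\ \delta_1),\ (\alpha_2+\beta_2-\delta_2,\ \delta_2)}_{(\alpha_1-\delta_1,\ \delta_1),\ (\alpha_2-\delta_2,\ \delta_2)}\Big],\quad
C=\prod_{j=1}^{2}\frac{\Gamma(\alpha_j+\beta_j)}{\Gamma(\alpha_j)}, \tag{5.4.12}$$

where $0\le u\le 1$, $\Re(\alpha_j-\delta_j+\delta_js)>0$, $\Re(\alpha_j)>0$, $\Re(\beta_j)>0$, $j=1,2$, and $g(u)=0$
elsewhere.

When $\alpha_1=1=\cdots=\alpha_p$, $\beta_1=1=\cdots=\beta_q$, the H-function reduces to a
G-function. This G-function is frequently referred to as Meijer's G-function and the
H-function, as Fox's H-function.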


5.4.4. Some special cases of the H-function 


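Certain special cases of the H-function are listed next.

$$H^{1,0}_{0,1}\big[x\big|_{(b,\beta)}\big]=\beta^{-1}\,x^{\frac{b}{\beta}}\,e^{-x^{\frac{1}{\beta}}};$$

$$H^{1,1}_{1,1}\Big[z\Big|^{(1-\nu,1)}_{(0,1)}\Big]=\Gamma(\nu)(1+z)^{-\nu}=\Gamma(\nu)\ {}_1F_0(\nu;\ ;-z),\quad |z|<1;$$

$$H^{1,0}_{0,2}\Big[\frac{z^2}{4}\Big|_{(\frac{a+\nu}{2},1),\ (\frac{a-\nu}{2},1)}\Big]=\Big(\frac{z}{2}\Big)^{a}J_\nu(z)$$

where the Bessel function

$$J_\nu(z)=\sum_{k=0}^{\infty}\frac{(-1)^k(z/2)^{\nu+2k}}{k!\,\Gamma(k+\nu+1)}=\frac{(z/2)^{\nu}}{\Gamma(\nu+1)}\ {}_0F_1\Big(\ ;1+\nu;-\frac{z^2}{4}\Big);$$

$$H^{1,1}_{1,2}\Big[z\Big|^{(1-a,1)}_{(0,1),\ (1-c,1)}\Big]=\frac{\Gamma(a)}{\Gamma(c)}\ {}_1F_1(a;c;-z);$$

$$H^{1,2}_{2,2}\Big[z\Big|^{(1-a,1),\ (1-b,1)}_{(0,1),\ (1-c,1)}\Big]=\frac{\Gamma(a)\Gamma(b)}{\Gamma(c)}\ {}_2F_1(a,b;c;-z);$$

$$H^{1,1}_{1,2}\Big[-z\Big|^{(1-\gamma,1)}_{(0,1),\ (1-\beta,\alpha)}\Big]=\Gamma(\gamma)\,E^{\gamma}_{\alpha,\beta}(z),\quad \Re(\gamma)>0,$$

where the generalized Mittag-Leffler function

$$E^{\gamma}_{\alpha,\beta}(z)=\sum_{k=0}^{\infty}\frac{(\gamma)_k}{k!}\,\frac{z^k}{\Gamma(\beta+\alpha k)},\quad \Re(\alpha)>0,\ \Re(\beta)>0,$$

where $\Gamma(\gamma)$ is defined. For $\gamma=1$, we have $E^{1}_{\alpha,\beta}(z)=E_{\alpha,\beta}(z)$; when $\gamma=1$, $\beta=1$,
$E^{1}_{\alpha,1}(z)=E_\alpha(z)$ and when $\gamma=1=\beta=\alpha$, we have $E_1(z)=e^{z}$.

$$H^{2,0}_{0,2}\big[z\big|_{(0,1),\ (\frac{\nu}{\rho},\frac{1}{\rho})}\big]=\rho\,K^{\nu}_{\rho}(z)$$

where $K^{\nu}_{\rho}(z)$ is the Krätzel function

$$K^{\nu}_{\rho}(z)=\int_0^{\infty}t^{\nu-1}e^{-t^{\rho}-\frac{z}{t}}\,dt,\quad \Re(z)>0.$$

$$H^{1,0}_{1,1}\Big[z\Big|^{(\alpha+\frac{1}{2},1)}_{(\alpha,1)}\Big]=\pi^{-\frac{1}{2}}\,z^{\alpha}(1-z)^{-\frac{1}{2}},\quad |z|<1;$$

$$H^{2,0}_{2,2}\Big[z\Big|^{(\alpha+\frac{1}{3},1),\ (\alpha+\frac{2}{3},1)}_{(\alpha,1),\ (\alpha,1)}\Big]=z^{\alpha}\ {}_2F_1\Big(\frac{2}{3},\frac{1}{3};1;1-z\Big),\quad |1-z|<1.$$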


Exercises 5.4 


5.4.1. Show that 
2’ Gy 1] £ pf/* ВТУ ері І 


5.4.2. Show that 
= 1,1 [1/3 21| |722 
e = С) E а) = Суз En . 
5.4.3. Show that
$$z^{\frac{1}{3}}(1-z)^{-\frac{5}{6}}=\Gamma(1/6)\ G^{1,0}_{1,1}\Big[z\Big|^{\frac{1}{2}}_{\frac{1}{3}}\Big],\quad |z|<1.$$


5.4.4. Show that

$$\int_0^{\infty}x^{a-1}(1+x)^{b-c}(1+x-zx)^{-b}\,dx=\frac{\Gamma(a)\Gamma(c-a)}{\Gamma(c)}\ {}_2F_1(a,b;c;z),\quad |z|<1,\ \Re(c-a)>0.$$

5.4.5. Show that

$$(a_1-a_2)\,H^{m,n}_{p,q}\Big[z\Big|^{(a_1,\alpha_1),(a_2,\alpha_1),(a_3,\alpha_3),\ldots,(a_p,\alpha_p)}_{(b_1,\beta_1),\ldots,(b_q,\beta_q)}\Big]
=H^{m,n}_{p,q}\Big[z\Big|^{(a_1,\alpha_1),(a_2-1,\alpha_1),(a_3,\alpha_3),\ldots,(a_p,\alpha_p)}_{(b_1,\beta_1),\ldots,(b_q,\beta_q)}\Big]
-H^{m,n}_{p,q}\Big[z\Big|^{(a_1-1,\alpha_1),(a_2,\alpha_1),(a_3,\alpha_3),\ldots,(a_p,\alpha_p)}_{(b_1,\beta_1),\ldots,(b_q,\beta_q)}\Big],\quad n\ge 2.$$

5.5, 5.5a. The Wishart Density

A particular case of the real $p\times p$ matrix-variate gamma distribution, known as the
Wishart distribution, is the preeminent distribution in multivariate statistical analysis. In
the general $p\times p$ real matrix-variate gamma density with parameters $(\alpha,\ B>O)$, let
$\alpha=\frac{m}{2}$, $B=\frac{1}{2}\Sigma^{-1}$ and $\Sigma>O$; the resulting density is called a Wishart density with
degrees of freedom $m$ and parameter matrix $\Sigma>O$. This density, denoted by $f_w(W)$, is
given by

$$f_w(W)=\frac{|W|^{\frac{m}{2}-\frac{p+1}{2}}\,e^{-\frac{1}{2}\mathrm{tr}(\Sigma^{-1}W)}}{2^{\frac{mp}{2}}\,|\Sigma|^{\frac{m}{2}}\,\Gamma_p(\frac{m}{2})},\quad W>O,\ \Sigma>O, \tag{5.5.1}$$

for $m\ge p$, and $f_w(W)=0$ elsewhere. This will be denoted as $W\sim W_p(m,\Sigma)$. Clearly,
all the properties discussed in connection with the real matrix-variate gamma density still
hold in this case. Algebraic evaluations of the marginal densities and explicit evaluations
of the densities of sub-matrices will be considered, some aspects having already been
discussed in Sects. 5.2 and 5.2.1.
In the complex case, the density is the following, denoted by $\tilde{f}_w(\tilde{W})$:

$$\tilde{f}_w(\tilde{W})=\frac{|\det(\tilde{W})|^{m-p}\,e^{-\mathrm{tr}(\Sigma^{-1}\tilde{W})}}{|\det(\Sigma)|^{m}\,\tilde{\Gamma}_p(m)},\quad \tilde{W}>O,\ \Sigma>O,\ m\ge p, \tag{5.5a.1}$$

and $\tilde{f}_w(\tilde{W})=0$ elsewhere. This will be denoted as $\tilde{W}\sim\tilde{W}_p(m,\Sigma)$.
5.5.1. Explicit evaluations of the matrix-variate gamma integral, real case 


Is it possible to evaluate the matrix-variate gamma integral explicitly by using conven- 
tional integration? We will now investigate some aspects of this question. 

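When the Wishart density is derived from samples coming from a Gaussian population,
the basic technique relies on the triangularization process. When $\Sigma=I$, that is, $W\sim W_p(m,I)$,
can the integral of the right-hand side of (5.5.1) be evaluated by resorting to
conventional methods or by direct evaluation? We will address this problem by making
use of the technique of partitioning matrices. Let us partition

$$X=\begin{bmatrix}X_{11}&X_{12}\\ X_{21}&X_{22}\end{bmatrix}$$

where let $X_{22}=x_{pp}$ so that $X_{21}=(x_{p1},\ldots,x_{p\,p-1})$, $X_{12}=X_{21}'$. Then, on applying a
result from Sect. 1.3, we have

$$|X|^{\alpha-\frac{p+1}{2}}=|X_{11}|^{\alpha-\frac{p+1}{2}}\,[x_{pp}-X_{21}X_{11}^{-1}X_{12}]^{\alpha-\frac{p+1}{2}}. \tag{5.5.2}$$

Note that when $X$ is positive definite, $X_{11}>O$ and $x_{pp}>0$, and the quadratic form
$X_{21}X_{11}^{-1}X_{12}>0$. As well,

$$[x_{pp}-X_{21}X_{11}^{-1}X_{12}]^{\alpha-\frac{p+1}{2}}=x_{pp}^{\alpha-\frac{p+1}{2}}\,\big[1-x_{pp}^{-\frac{1}{2}}X_{21}X_{11}^{-\frac{1}{2}}X_{11}^{-\frac{1}{2}}X_{12}\,x_{pp}^{-\frac{1}{2}}\big]^{\alpha-\frac{p+1}{2}}. \tag{5.5.3}$$

Letting $Y=x_{pp}^{-\frac{1}{2}}X_{21}X_{11}^{-\frac{1}{2}}$, then referring to Mathai (1997, Theorem 1.18) or
Theorem 1.6.4 of Chap. 1, $dY=x_{pp}^{-\frac{p-1}{2}}|X_{11}|^{-\frac{1}{2}}\,dX_{21}$ for fixed $X_{11}$ and $x_{pp}$. The integral
over $x_{pp}$ gives

$$\int_0^{\infty}x_{pp}^{\alpha+\frac{p-1}{2}-\frac{p+1}{2}}\,e^{-x_{pp}}\,dx_{pp}=\Gamma(\alpha),\quad \Re(\alpha)>0.$$

If we let $u=YY'$, then from Theorem 2.16 and Remark 2.13 of Mathai (1997) or using
Theorem 4.2.3, after integrating out over the Stiefel manifold, we have

$$dY=\frac{\pi^{\frac{p-1}{2}}}{\Gamma(\frac{p-1}{2})}\,u^{\frac{p-1}{2}-1}\,du.$$

(Note that $n$ in Theorem 2.16 of Mathai (1997) corresponds to $p-1$ and $p$ is 1.) Then,
the integral over $u$ gives

$$\int_0^{1}u^{\frac{p-1}{2}-1}(1-u)^{\alpha-\frac{p+1}{2}}\,du=\frac{\Gamma(\frac{p-1}{2})\,\Gamma(\alpha-\frac{p-1}{2})}{\Gamma(\alpha)},\quad \Re(\alpha)>\frac{p-1}{2}.$$

Now, collecting all the factors, we have

$$|X_{11}|^{\alpha+\frac{1}{2}-\frac{p+1}{2}}\,\Gamma(\alpha)\,\frac{\pi^{\frac{p-1}{2}}}{\Gamma(\frac{p-1}{2})}\,\frac{\Gamma(\frac{p-1}{2})\,\Gamma(\alpha-\frac{p-1}{2})}{\Gamma(\alpha)}
=|X_{11}^{(1)}|^{\alpha+\frac{1}{2}-\frac{p+1}{2}}\,\pi^{\frac{p-1}{2}}\,\Gamma(\alpha-(p-1)/2)$$

for $\Re(\alpha)>\frac{p-1}{2}$. Note that $X_{11}^{(1)}$ is $(p-1)\times(p-1)$ and $|X_{11}|$, after the completion
of the first set of operations, is denoted by $|X_{11}^{(1)}|$, the exponent being changed to
$\alpha+\frac{1}{2}-\frac{p+1}{2}$. Now repeat the process by separating $x_{p-1,p-1}$, that is, by writing

$$X_{11}^{(1)}=\begin{bmatrix}X_{11}^{(2)}&X_{12}^{(2)}\\ X_{21}^{(2)}&x_{p-1,p-1}\end{bmatrix}.$$

Here, $X_{11}^{(2)}$ is of order $(p-2)\times(p-2)$ and $X_{21}^{(2)}$ is of order $1\times(p-2)$. As before,
letting $u=YY'$ with $Y=x_{p-1,p-1}^{-\frac{1}{2}}X_{21}^{(2)}[X_{11}^{(2)}]^{-\frac{1}{2}}$, $dY=x_{p-1,p-1}^{-\frac{p-2}{2}}|X_{11}^{(2)}|^{-\frac{1}{2}}\,dX_{21}^{(2)}$. The
integral over the Stiefel manifold gives $\frac{\pi^{\frac{p-2}{2}}}{\Gamma(\frac{p-2}{2})}\,u^{\frac{p-2}{2}-1}\,du$ and the factor containing $(1-u)$
is $(1-u)^{\alpha+\frac{1}{2}-\frac{p+1}{2}}$, the integral over $u$ yielding

$$\int_0^{1}u^{\frac{p-2}{2}-1}(1-u)^{\alpha+\frac{1}{2}-\frac{p+1}{2}}\,du=\frac{\Gamma(\frac{p-2}{2})\,\Gamma(\alpha-\frac{p-2}{2})}{\Gamma(\alpha)}$$

and that over $v=x_{p-1,p-1}$ giving

$$\int_0^{\infty}v^{\alpha+\frac{1}{2}+\frac{p-2}{2}-\frac{p+1}{2}}\,e^{-v}\,dv=\Gamma(\alpha),\quad \Re(\alpha)>0.$$

The product of these factors is then

$$|X_{11}^{(2)}|^{\alpha+1-\frac{p+1}{2}}\,\pi^{\frac{p-2}{2}}\,\Gamma(\alpha-(p-2)/2),\quad \Re(\alpha)>\frac{p-2}{2}.$$

Successive evaluations carried out by employing the same procedure yield the exponent
of $\pi$ as $\frac{p-1}{2}+\frac{p-2}{2}+\cdots+\frac{1}{2}=\frac{p(p-1)}{4}$ and the gamma product,
$\Gamma(\alpha-\frac{p-1}{2})\,\Gamma(\alpha-\frac{p-2}{2})\cdots\Gamma(\alpha)$, the final result being $\Gamma_p(\alpha)$. The result is thus verified.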

5.5a.1. Evaluation of matrix-variate gamma integrals in the complex case 


The matrices and gamma functions belonging to the complex domain will be denoted
with a tilde. As well, in the complex case, all matrices appearing in the integrals will be
$p\times p$ Hermitian positive definite unless otherwise stated; as an example, for such a matrix
$\tilde{X}$, this will be denoted by $\tilde{X}>O$. The integral of interest is

$$\tilde{\Gamma}_p(\alpha)=\int_{\tilde{X}>O}|\det(\tilde{X})|^{\alpha-p}\,e^{-\mathrm{tr}(\tilde{X})}\,d\tilde{X}. \tag{5.5a.2}$$

A standard procedure for evaluating the integral in (5.5a.2) consists of expressing the
positive definite Hermitian matrix as $\tilde{X}=\tilde{T}\tilde{T}^{*}$ where $\tilde{T}$ is a lower triangular matrix with
real and positive diagonal elements $t_{jj}>0$, $j=1,\ldots,p$, where an asterisk indicates the
conjugate transpose. Then, referring to Mathai (1997, Theorem 3.7) or Theorem 1.6a.7 of
Chap. 1, the Jacobian is seen to be as follows:

$$d\tilde{X}=2^{p}\Big\{\prod_{j=1}^{p}t_{jj}^{2(p-j)+1}\Big\}\,d\tilde{T} \tag{5.5a.3}$$

and then

$$\mathrm{tr}(\tilde{X})=\mathrm{tr}(\tilde{T}\tilde{T}^{*})=t_{11}^2+\cdots+t_{pp}^2+|\tilde{t}_{21}|^2+\cdots+|\tilde{t}_{p1}|^2+\cdots+|\tilde{t}_{p\,p-1}|^2$$

and

$$|\det(\tilde{X})|^{\alpha-p}\,d\tilde{X}=2^{p}\Big\{\prod_{j=1}^{p}t_{jj}^{2\alpha-2j+1}\Big\}\,d\tilde{T}.$$

Now, integrating out over $\tilde{t}_{jk}$ for $j>k$,

$$\int_{\tilde{t}_{jk}}e^{-|\tilde{t}_{jk}|^2}\,d\tilde{t}_{jk}=\int_{-\infty}^{\infty}\int_{-\infty}^{\infty}e^{-(t_{jk1}^2+t_{jk2}^2)}\,dt_{jk1}\wedge dt_{jk2}=\pi$$

and

$$\prod_{j>k}\pi=\pi^{\frac{p(p-1)}{2}}.$$

As well,

$$2\int_0^{\infty}t_{jj}^{2\alpha-2j+1}\,e^{-t_{jj}^2}\,dt_{jj}=\Gamma(\alpha-j+1),\quad \Re(\alpha)>j-1,$$

for $j=1,\ldots,p$. Taking the product of all these factors then gives

$$\pi^{\frac{p(p-1)}{2}}\,\Gamma(\alpha)\Gamma(\alpha-1)\cdots\Gamma(\alpha-p+1)=\tilde{\Gamma}_p(\alpha),\quad \Re(\alpha)>p-1,$$

and hence the result is verified.
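An alternative method based on partitioned matrix, complex case

The approach discussed in this section relies on the successive extraction of the
diagonal elements of $\tilde{X}$, a $p\times p$ positive definite Hermitian matrix, all of these elements being
necessarily real and positive, that is, $x_{jj}>0$, $j=1,\ldots,p$. Let

$$\tilde{X}=\begin{bmatrix}\tilde{X}_{11}&\tilde{X}_{12}\\ \tilde{X}_{21}&x_{pp}\end{bmatrix}$$

where $\tilde{X}_{11}$ is $(p-1)\times(p-1)$ and

$$|\det(\tilde{X})|^{\alpha-p}=|\det(\tilde{X}_{11})|^{\alpha-p}\,|x_{pp}-\tilde{X}_{21}\tilde{X}_{11}^{-1}\tilde{X}_{12}|^{\alpha-p}$$

and

$$\mathrm{tr}(\tilde{X})=\mathrm{tr}(\tilde{X}_{11})+x_{pp}.$$

Then,

$$|x_{pp}-\tilde{X}_{21}\tilde{X}_{11}^{-1}\tilde{X}_{12}|^{\alpha-p}=x_{pp}^{\alpha-p}\,\big|1-x_{pp}^{-\frac{1}{2}}\tilde{X}_{21}\tilde{X}_{11}^{-\frac{1}{2}}\tilde{X}_{11}^{-\frac{1}{2}}\tilde{X}_{12}\,x_{pp}^{-\frac{1}{2}}\big|^{\alpha-p}.$$

Let

$$\tilde{Y}=x_{pp}^{-\frac{1}{2}}\tilde{X}_{21}\tilde{X}_{11}^{-\frac{1}{2}}\ \Rightarrow\ d\tilde{Y}=x_{pp}^{-(p-1)}\,|\det(\tilde{X}_{11})|^{-1}\,d\tilde{X}_{21},$$

referring to Theorem 1.6a.4 or Mathai (1997, Theorem 3.2(c)) for fixed $x_{pp}$ and $\tilde{X}_{11}$. Now,
the integral over $x_{pp}$ gives

$$\int_0^{\infty}x_{pp}^{\alpha-p+(p-1)}\,e^{-x_{pp}}\,dx_{pp}=\Gamma(\alpha),\quad \Re(\alpha)>0.$$

Letting $u=\tilde{Y}\tilde{Y}^{*}$, $d\tilde{Y}=u^{p-2}\,\frac{\pi^{p-1}}{\Gamma(p-1)}\,du$ by applying Theorem 4.2a.3 or Corollaries 4.5.2
and 4.5.3 of Mathai (1997), and noting that $u$ is real and positive, the integral over $u$ gives

$$\int_0^{1}u^{(p-1)-1}(1-u)^{\alpha-(p-1)-1}\,du=\frac{\Gamma(p-1)\,\Gamma(\alpha-(p-1))}{\Gamma(\alpha)},\quad \Re(\alpha)>p-1.$$

Taking the product, we obtain

$$|\det(\tilde{X}_{11})|^{\alpha+1-p}\,\Gamma(\alpha)\,\frac{\pi^{p-1}}{\Gamma(p-1)}\,\frac{\Gamma(p-1)\,\Gamma(\alpha-(p-1))}{\Gamma(\alpha)}
=\pi^{p-1}\,\Gamma(\alpha-(p-1))\,|\det(\tilde{X}_{11}^{(1)})|^{\alpha+1-p}$$

where $\tilde{X}_{11}^{(1)}$ stands for $\tilde{X}_{11}$ after having completed the first set of integrations. In the second
stage, we extract $x_{p-1,p-1}$, the first $(p-2)\times(p-2)$ submatrix being denoted by $\tilde{X}_{11}^{(2)}$,
and we continue as previously explained to obtain $|\det(\tilde{X}_{11}^{(2)})|^{\alpha+2-p}\,\pi^{p-2}\,\Gamma(\alpha-(p-2))$.
Proceeding successively in this manner, we have the exponent of $\pi$ as $(p-1)+(p-2)+\cdots+1=p(p-1)/2$
and the gamma product as $\Gamma(\alpha-(p-1))\,\Gamma(\alpha-(p-2))\cdots\Gamma(\alpha)$
for $\Re(\alpha)>p-1$. That is,

$$\pi^{\frac{p(p-1)}{2}}\,\Gamma(\alpha)\Gamma(\alpha-1)\cdots\Gamma(\alpha-(p-1))=\tilde{\Gamma}_p(\alpha).$$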


5.5.2. Triangularization of the Wishart matrix in the real case 


Let $W\sim W_p(m,\Sigma)$, $\Sigma>O$, be a $p\times p$ matrix having a Wishart distribution with
$m$ degrees of freedom and parameter matrix $\Sigma>O$, that is, let $W$ have a density of the
following form for $\Sigma=I$:

$$f_w(W)=\frac{|W|^{\frac{m}{2}-\frac{p+1}{2}}\,e^{-\frac{1}{2}\mathrm{tr}(W)}}{2^{\frac{mp}{2}}\,\Gamma_p(\frac{m}{2})},\quad W>O,\ m\ge p, \tag{5.5.4}$$

and $f_w(W)=0$ elsewhere. Let us consider the transformation $W=TT'$ where $T$ is a
lower triangular matrix with positive diagonal elements. Since $W>O$, the transformation
$W=TT'$ with the diagonal elements of $T$ being positive is one-to-one. We have already
evaluated the associated Jacobian in Theorem 1.6.7, namely,

$$dW=2^{p}\Big\{\prod_{j=1}^{p}t_{jj}^{p+1-j}\Big\}\,dT. \tag{5.5.5}$$

Under this transformation,

$$f_w(W)\,dW=\frac{1}{2^{\frac{mp}{2}}\,\Gamma_p(\frac{m}{2})}\Big\{\prod_{j=1}^{p}(t_{jj}^2)^{\frac{m}{2}-\frac{p+1}{2}}\Big\}\,e^{-\frac{1}{2}\sum_{i\ge j}t_{ij}^2}\ 2^{p}\Big\{\prod_{j=1}^{p}t_{jj}^{p+1-j}\Big\}\,dT$$

$$=\frac{2^{p}}{2^{\frac{mp}{2}}\,\Gamma_p(\frac{m}{2})}\Big\{\prod_{j=1}^{p}(t_{jj}^2)^{\frac{m}{2}-\frac{j}{2}}\Big\}\,e^{-\frac{1}{2}\sum_{i\ge j}t_{ij}^2}\,dT. \tag{5.5.6}$$

In view of (5.5.6), it is evident that $t_{jj}$, $j=1,\ldots,p$, and the $t_{ij}$'s, $i>j$, are mutually
independently distributed. The form of the function containing $t_{ij}$, $i>j$, is $e^{-\frac{1}{2}t_{ij}^2}$, and
hence the $t_{ij}$'s for $i>j$ are mutually independently distributed real standard normal
variables. It is also seen from (5.5.6) that the density of $t_{jj}^2$ is of the form

$$c_j\,y_j^{\frac{m}{2}-\frac{j-1}{2}-1}\,e^{-\frac{1}{2}y_j},\quad y_j=t_{jj}^2,$$

which is the density of a real chisquare variable having $m-(j-1)$ degrees of freedom
for $j=1,\ldots,p$, where $c_j$ is the normalizing constant. Hence, the following result:


Theorem 5.5.1. Let the real $p \times p$ positive definite matrix $W$ have a real Wishart density as specified in (5.5.4) and let $W = TT'$ where $T = (t_{ij})$ is a lower triangular matrix whose diagonal elements are positive. Then, the non-diagonal elements $t_{ij}$ such that $i > j$ are mutually independently distributed as real standard normal variables, the squares of the diagonal elements, $t_{jj}^2$, $j = 1,\ldots,p$, are independently distributed as real chisquare variables having $m-(j-1)$ degrees of freedom for $j = 1,\ldots,p$, and the $t_{jj}$'s and $t_{ij}$'s are mutually independently distributed.
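
The structure described in Theorem 5.5.1 can be turned into a small simulation. The following is a minimal sketch, assuming NumPy is available; the function name `bartlett_factor`, the seed and the values of `m`, `p` and `draws` are illustrative choices that do not come from the text. It builds $T$ entry by entry and checks empirically that the average of $W = TT'$ is close to $mI$, the mean of a $W_p(m, I)$ matrix.

```python
import numpy as np

def bartlett_factor(m, p, rng):
    """Draw a lower triangular T as in Theorem 5.5.1, so that W = T T' ~ W_p(m, I)."""
    T = np.zeros((p, p))
    # Entries below the diagonal (i > j): independent real standard normal variables.
    rows, cols = np.tril_indices(p, k=-1)
    T[rows, cols] = rng.standard_normal(rows.size)
    # Squared diagonal entries: chisquare with m - (j - 1) degrees of freedom, j = 1..p.
    dof = m - np.arange(p)
    T[np.arange(p), np.arange(p)] = np.sqrt(rng.chisquare(dof))
    return T

rng = np.random.default_rng(0)   # illustrative seed
m, p, draws = 7, 3, 20000        # illustrative values with m >= p
mean_W = np.zeros((p, p))
for _ in range(draws):
    T = bartlett_factor(m, p, rng)
    mean_W += (T @ T.T) / draws
print(np.round(mean_W, 2))       # expected to be close to m * I
```

Sampling $T$ directly avoids generating an $m$-column Gaussian matrix for each draw, which is the usual practical appeal of the triangular representation.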


Corollary 5.5.1. Let $W \sim W_p(m, \sigma^2 I)$, where $\sigma^2 > 0$ is a real scalar quantity. Let $W = TT'$ where $T = (t_{ij})$ is a lower triangular matrix whose diagonal elements are positive. Then, the $t_{jj}$'s are independently distributed for $j = 1,\ldots,p$, the $t_{ij}$, $i > j$, are independently distributed, and all $t_{jj}$'s and $t_{ij}$'s are mutually independently distributed, where $t_{jj}^2/\sigma^2$ has a real chisquare distribution with $m-(j-1)$ degrees of freedom for $j = 1,\ldots,p$, and $t_{ij}$, $i > j$, has a real scalar Gaussian distribution with mean value zero and variance $\sigma^2$, that is, $t_{ij} \overset{iid}{\sim} N_1(0, \sigma^2)$ for all $i > j$.

5.5a.2. Triangularization of the Wishart matrix in the complex domain


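Let $\tilde{W}$ have the following Wishart density in the complex domain:

$$\tilde{f}_w(\tilde{W}) = \frac{|\det(\tilde{W})|^{m-p}\,e^{-\mathrm{tr}(\tilde{W})}}{\tilde{\Gamma}_p(m)},\quad \tilde{W} > O,\ m \ge p, \tag{5.5a.4}$$

and $\tilde{f}_w(\tilde{W}) = 0$ elsewhere, which is denoted $\tilde{W} \sim \tilde{W}_p(m, I)$. Consider the transformation $\tilde{W} = \tilde{T}\tilde{T}^*$ where $\tilde{T}$ is lower triangular with real and positive diagonal elements. The transformation $\tilde{W} = \tilde{T}\tilde{T}^*$ is then one-to-one and its associated Jacobian, as given in Theorem 1.6a.7, is the following:

$$\mathrm{d}\tilde{W} = 2^p\Big\{\prod_{j=1}^{p} t_{jj}^{\,2(p-j)+1}\Big\}\,\mathrm{d}\tilde{T}. \tag{5.5a.5}$$

Then we have

$$\begin{aligned}
\tilde{f}_w(\tilde{W})\,\mathrm{d}\tilde{W} &= \frac{1}{\tilde{\Gamma}_p(m)}\Big\{\prod_{j=1}^{p}(t_{jj}^2)^{m-p}\Big\}\,e^{-\sum_{i\ge j}|\tilde{t}_{ij}|^2}\;2^p\Big\{\prod_{j=1}^{p}t_{jj}^{\,2(p-j)+1}\Big\}\,\mathrm{d}\tilde{T}\\
&= \frac{1}{\tilde{\Gamma}_p(m)}\Big\{\prod_{j=1}^{p}2\,(t_{jj}^2)^{m-j+\frac{1}{2}}\Big\}\,e^{-\sum_{i\ge j}|\tilde{t}_{ij}|^2}\,\mathrm{d}\tilde{T}.
\end{aligned} \tag{5.5a.6}$$

In light of (5.5a.6), it is clear that all the $t_{jj}$'s and $\tilde{t}_{ij}$'s are mutually independently distributed, where $\tilde{t}_{ij}$, $i > j$, has a complex standard Gaussian density and $t_{jj}^2$ has a complex chisquare density with $m-(j-1)$ degrees of freedom or, equivalently, a real gamma density with the parameters $(\alpha = m-(j-1),\ \beta = 1)$, for $j = 1,\ldots,p$. Hence, we have the following result: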


Theorem 5.5a.1. Let the complex Wishart density be as specified in (5.5a.4), that is, $\tilde{W} \sim \tilde{W}_p(m, I)$. Consider the transformation $\tilde{W} = \tilde{T}\tilde{T}^*$ where $\tilde{T} = (\tilde{t}_{ij})$ is a lower triangular matrix in the complex domain whose diagonal elements are real and positive. Then, for $i > j$, the $\tilde{t}_{ij}$'s are standard Gaussian distributed in the complex domain, that is, $\tilde{t}_{ij} \sim \tilde{N}_1(0, 1)$, $i > j$, $t_{jj}^2$ is real gamma distributed with the parameters $(\alpha = m-(j-1),\ \beta = 1)$ for $j = 1,\ldots,p$, and all the $t_{jj}$'s and $\tilde{t}_{ij}$'s, $i > j$, are mutually independently distributed.


Corollary 5.5a.1. Let $\tilde{W} \sim \tilde{W}_p(m, \sigma^2 I)$ where $\sigma^2 > 0$ is a real positive scalar. Let $\tilde{T}$, $t_{jj}$, $\tilde{t}_{ij}$, $i > j$, be as defined in Theorem 5.5a.1. Then, $t_{jj}^2/\sigma^2$ is a real gamma variable with the parameters $(\alpha = m-(j-1),\ \beta = 1)$ for $j = 1,\ldots,p$, $\tilde{t}_{ij} \sim \tilde{N}_1(0, \sigma^2)$ for all $i > j$, and the $t_{jj}$'s and $\tilde{t}_{ij}$'s are mutually independently distributed.
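
The complex-domain triangularization can be simulated along the same lines. This is a sketch under stated assumptions: NumPy is available, the name `complex_bartlett_factor` and the numerical settings are illustrative, and a complex standard Gaussian variable is taken to have density $\pi^{-1}e^{-|\tilde{t}|^2}$ (the factor appearing in (5.5a.6)), so that its real and imaginary parts are independent $N(0, \frac{1}{2})$ variables.

```python
import numpy as np

def complex_bartlett_factor(m, p, rng):
    """Draw lower triangular T~ with real positive diagonal so that T~ T~* ~ W~_p(m, I)."""
    T = np.zeros((p, p), dtype=complex)
    rows, cols = np.tril_indices(p, k=-1)
    k = rows.size
    # Density exp(-|t|^2)/pi: real and imaginary parts are independent N(0, 1/2).
    T[rows, cols] = (rng.standard_normal(k) + 1j * rng.standard_normal(k)) / np.sqrt(2.0)
    # Squared diagonal entries: real gamma with shape m - (j - 1) and scale 1, j = 1..p.
    shape = m - np.arange(p)
    T[np.arange(p), np.arange(p)] = np.sqrt(rng.gamma(shape, 1.0))
    return T

rng_c = np.random.default_rng(0)      # illustrative seed
m_c, p_c, draws_c = 5, 3, 20000       # illustrative values with m >= p
mean_Wc = np.zeros((p_c, p_c), dtype=complex)
for _ in range(draws_c):
    Tc = complex_bartlett_factor(m_c, p_c, rng_c)
    mean_Wc += (Tc @ Tc.conj().T) / draws_c
print(np.round(mean_Wc, 2))           # expected to be close to m * I
```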

5.5.3. Samples from a p-variate Gaussian population and the Wishart density


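Let the $p \times 1$ real vector $X_j$ be normally distributed, $X_j \sim N_p(\mu, \Sigma)$, $\Sigma > O$. Let $X_1,\ldots,X_n$ be a simple random sample of size $n$ from this normal population and the $p \times n$ sample matrix be denoted in bold face lettering as $\mathbf{X} = (X_1, X_2, \ldots, X_n)$ where $X_j' = (x_{1j}, x_{2j}, \ldots, x_{pj})$. Let the sample mean be $\bar{X} = \frac{1}{n}(X_1 + \cdots + X_n)$ and the matrix of sample means be denoted by the bold face $\bar{\mathbf{X}} = (\bar{X}, \ldots, \bar{X})$. Then, the $p \times p$ sample sum of products matrix $S$ is given by

$$S = (\mathbf{X}-\bar{\mathbf{X}})(\mathbf{X}-\bar{\mathbf{X}})' = (s_{ij}),\quad s_{ij} = \sum_{k=1}^{n}(x_{ik}-\bar{x}_i)(x_{jk}-\bar{x}_j),$$

where $\bar{x}_r = \sum_{k=1}^{n} x_{rk}/n$, $r = 1,\ldots,p$, are the averages on the components. It has already been shown in Sect. 3.5 for instance that the joint density of the sample values $X_1,\ldots,X_n$, denoted by $L$, can be written as

$$L = \frac{1}{(2\pi)^{\frac{np}{2}}|\Sigma|^{\frac{n}{2}}}\,e^{-\frac{1}{2}\mathrm{tr}(\Sigma^{-1}S)-\frac{n}{2}(\bar{X}-\mu)'\Sigma^{-1}(\bar{X}-\mu)}. \tag{5.5.7}$$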

But $(\mathbf{X}-\bar{\mathbf{X}})J = O$, $J' = (1,\ldots,1)$, which implies that the columns of $(\mathbf{X}-\bar{\mathbf{X}})$ are linearly related, and hence the elements in $(\mathbf{X}-\bar{\mathbf{X}})$ are not distinct. In light of equation (4.5.17), one can write the sample sum of products matrix $S$ in terms of a $p \times (n-1)$ matrix $Z_{n-1}$ of distinct elements so that $S = Z_{n-1}Z_{n-1}'$. As well, according to Theorem 3.5.3 of Chap. 3, $S$ and $\bar{X}$ are independently distributed. The $p \times n$ matrix $Z$ is obtained through the orthonormal transformation $\mathbf{X}P = Z$, $PP' = I$, $P'P = I$, where $P$ is $n \times n$. Then $\mathrm{d}\mathbf{X} = \mathrm{d}Z$, ignoring the sign. Let the last column of $P$ be $p_n$. We can specify $p_n$ to be $\frac{1}{\sqrt{n}}J$ so that $\mathbf{X}p_n = \sqrt{n}\,\bar{X}$. Note that in light of (4.5.17), the deleted column in $Z$ corresponds to $\sqrt{n}\,\bar{X}$. The following considerations will be helpful to those who might need further confirmation of the validity of the above statement. Observe that $\mathbf{X}-\bar{\mathbf{X}} = \mathbf{X}(I-B)$, with $B = \frac{1}{n}JJ'$ where $J$ is an $n \times 1$ vector of unities. Since $I-B$ is idempotent and of rank $n-1$, its eigenvalues are 1 repeated $n-1$ times and a zero. An eigenvector corresponding to the eigenvalue zero is $J$ normalized, or $\frac{1}{\sqrt{n}}J$. Taking this as the last column $p_n$ of $P$, we have $\mathbf{X}p_n = \sqrt{n}\,\bar{X}$. Note that the other columns of $P$, namely $p_1,\ldots,p_{n-1}$, correspond to the $n-1$ orthonormal solutions of the equation $(I-B)Y = Y$, that is, $BY = O$, where $Y$ is an $n \times 1$ non-null vector. Hence we can write $\mathrm{d}Z = \mathrm{d}Z_{n-1}\wedge\mathrm{d}\bar{X}$. Now, integrating out $\bar{X}$ from (5.5.7), we have

$$L\,\mathrm{d}Z_{n-1} = c\,e^{-\frac{1}{2}\mathrm{tr}(\Sigma^{-1}Z_{n-1}Z_{n-1}')}\,\mathrm{d}Z_{n-1},\quad S = Z_{n-1}Z_{n-1}', \tag{5.5.8}$$
where $c$ is a constant. Since $Z_{n-1}$ contains $p(n-1)$ distinct real variables, we may apply Theorems 4.2.1, 4.2.2 and 4.2.3, and write $\mathrm{d}Z_{n-1}$ in terms of $\mathrm{d}S$ as

$$\mathrm{d}Z_{n-1} = \frac{\pi^{\frac{p(n-1)}{2}}}{\Gamma_p(\frac{n-1}{2})}\,|S|^{\frac{n-1}{2}-\frac{p+1}{2}}\,\mathrm{d}S,\quad n-1 \ge p. \tag{5.5.9}$$

Then, if the density of $S$ is denoted by $f(S)$,

$$f(S)\,\mathrm{d}S = c_1\,|S|^{\frac{n-1}{2}-\frac{p+1}{2}}\,e^{-\frac{1}{2}\mathrm{tr}(\Sigma^{-1}S)}\,\mathrm{d}S,$$

where $c_1$ is a constant. From a real matrix-variate gamma density, we have the normalizing constant and thereby the value of $c_1$. Hence

$$f(S) = \frac{|S|^{\frac{n-1}{2}-\frac{p+1}{2}}\,e^{-\frac{1}{2}\mathrm{tr}(\Sigma^{-1}S)}}{2^{\frac{(n-1)p}{2}}\,|\Sigma|^{\frac{n-1}{2}}\,\Gamma_p(\frac{n-1}{2})} \tag{5.5.10}$$

for $S > O$, $\Sigma > O$, $n-1 \ge p$, and $f(S) = 0$ elsewhere, $\Gamma_p(\cdot)$ being the real matrix-variate gamma given by

$$\Gamma_p(\alpha) = \pi^{\frac{p(p-1)}{4}}\,\Gamma(\alpha)\Gamma(\alpha-\tfrac{1}{2})\cdots\Gamma(\alpha-(p-1)/2),\quad \Re(\alpha) > (p-1)/2.$$

Usually the sample size is taken as $N$ so that $N-1 = n$ is the number of degrees of freedom associated with the Wishart density in (5.5.10). Since we have taken the sample size as $n$, the number of degrees of freedom is $n-1$ and the parameter matrix is $\Sigma > O$. Then $S$ in (5.5.10) is written as $S \sim W_p(m, \Sigma)$, with $m = n-1 \ge p$. Thus, the following result:


Theorem 5.5.2. Let $X_1,\ldots,X_n$ be a simple random sample of size $n$ from a $N_p(\mu, \Sigma)$, $\Sigma > O$. Let $X_j$, $\mathbf{X}$, $\bar{X}$, $\bar{\mathbf{X}}$, $S$ be as defined in Sect. 5.5.3. Then, the density of $S$ is a real Wishart density with $m = n-1$ degrees of freedom and parameter matrix $\Sigma > O$, as given in (5.5.10).
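
The orthonormal transformation used in this derivation can be checked numerically. The sketch below assumes NumPy; the data, the seed and the use of a QR factorization to obtain the first $n-1$ columns of $P$ are illustrative choices, not the construction behind (4.5.17). It verifies that taking $J/\sqrt{n}$ as the last column of $P$ gives $\mathbf{X}p_n = \sqrt{n}\,\bar{X}$ and $S = Z_{n-1}Z_{n-1}'$.

```python
import numpy as np

rng = np.random.default_rng(1)                 # illustrative seed
p, n = 3, 8                                    # illustrative dimensions
X = rng.standard_normal((p, n))                # p x n sample matrix (illustrative data)
J = np.ones((n, 1))
xbar = X.mean(axis=1, keepdims=True)           # sample mean vector
S = (X - xbar) @ (X - xbar).T                  # sample sum of products matrix

# The first column of Q is +/- J/sqrt(n); the remaining columns are orthonormal
# and orthogonal to J, so they can serve as p_1, ..., p_{n-1}.
Q, _ = np.linalg.qr(np.hstack([J, np.eye(n)[:, : n - 1]]))
P = np.hstack([Q[:, 1:], J / np.sqrt(n)])      # last column p_n = J / sqrt(n)

Z = X @ P
Z_head = Z[:, : n - 1]                         # the p x (n - 1) matrix Z_{n-1}

print(np.allclose(P.T @ P, np.eye(n)))                   # P is orthonormal
print(np.allclose(Z[:, [-1]], np.sqrt(n) * xbar))        # deleted column is sqrt(n) * xbar
print(np.allclose(S, Z_head @ Z_head.T))                 # S = Z_{n-1} Z_{n-1}'
```

Any other orthonormal completion of $J/\sqrt{n}$ would serve equally well, since only the last column of $P$ is pinned down by the argument.
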
5.5a.3. Sample from a complex Gaussian population and the Wishart density 


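Let $\tilde{X}_j \sim \tilde{N}_p(\tilde{\mu}, \Sigma)$, $\Sigma > O$, $j = 1,\ldots,n$, be independently distributed. Let $\tilde{\mathbf{X}} = (\tilde{X}_1,\ldots,\tilde{X}_n)$, $\bar{\tilde{X}} = \frac{1}{n}(\tilde{X}_1 + \cdots + \tilde{X}_n)$, $\bar{\tilde{\mathbf{X}}} = (\bar{\tilde{X}},\ldots,\bar{\tilde{X}})$ and let $\tilde{S} = (\tilde{\mathbf{X}}-\bar{\tilde{\mathbf{X}}})(\tilde{\mathbf{X}}-\bar{\tilde{\mathbf{X}}})^*$ where a $*$ indicates the conjugate transpose. We have already shown in Sect. 3.5a that the joint density of $\tilde{X}_1,\ldots,\tilde{X}_n$, denoted by $\tilde{L}$, can be written as

$$\tilde{L} = \frac{1}{\pi^{np}\,|\det(\Sigma)|^{n}}\,e^{-\mathrm{tr}(\Sigma^{-1}\tilde{S})-n(\bar{\tilde{X}}-\tilde{\mu})^*\Sigma^{-1}(\bar{\tilde{X}}-\tilde{\mu})}. \tag{5.5a.7}$$

Then, following steps parallel to (5.5.7) to (5.5.10), we obtain the density of $\tilde{S}$, denoted by $\tilde{f}(\tilde{S})$, as the following:

$$\tilde{f}(\tilde{S})\,\mathrm{d}\tilde{S} = \frac{|\det(\tilde{S})|^{m-p}\,e^{-\mathrm{tr}(\Sigma^{-1}\tilde{S})}}{|\det(\Sigma)|^{m}\,\tilde{\Gamma}_p(m)}\,\mathrm{d}\tilde{S},\quad m = n-1 \ge p, \tag{5.5a.8}$$

for $\tilde{S} > O$, $\Sigma > O$, $n-1 \ge p$, and $\tilde{f}(\tilde{S}) = 0$ elsewhere, the complex matrix-variate gamma function being given by

$$\tilde{\Gamma}_p(\alpha) = \pi^{\frac{p(p-1)}{2}}\,\Gamma(\alpha)\Gamma(\alpha-1)\cdots\Gamma(\alpha-p+1),\quad \Re(\alpha) > p-1.$$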


Hence, we have the following result: 


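Theorem 5.5a.2. Let $\tilde{X}_j \sim \tilde{N}_p(\tilde{\mu}, \Sigma)$, $\Sigma > O$, $j = 1,\ldots,n$, be independently and identically distributed. Let $\tilde{\mathbf{X}}$, $\bar{\tilde{X}}$, $\bar{\tilde{\mathbf{X}}}$, $\tilde{S}$ be as previously defined. Then, $\tilde{S}$ has a complex matrix-variate Wishart density with $m = n-1$ degrees of freedom and parameter matrix $\Sigma > O$, as given in (5.5a.8).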


5.5.4. Some properties of the Wishart distribution, real case 


If we have statistically independently distributed Wishart matrices with the same parameter matrix $\Sigma$, then it is easy to see that the sum is again a Wishart matrix. This can be noted by considering the Laplace transform of matrix-variate random variables discussed in Sect. 5.2. If $S_j \sim W_p(m_j, \Sigma)$, $j = 1,\ldots,k$, with the same parameter matrix $\Sigma > O$ and the $S_j$'s are statistically independently distributed, then from equation (5.2.6), the Laplace transform of the density of $S_j$ is

$$L_{S_j}({}_*T) = |I + 2\Sigma\,{}_*T|^{-\frac{m_j}{2}},\quad I + 2\Sigma\,{}_*T > O,\ j = 1,\ldots,k, \tag{5.5.11}$$

where ${}_*T$ is a symmetric parameter matrix $T = (t_{ij}) = T' > O$ with off-diagonal elements weighted by $\frac{1}{2}$. When the $S_j$'s are independently distributed, the Laplace transform of the sum $S = S_1 + \cdots + S_k$ is the product of the Laplace transforms:

$$\prod_{j=1}^{k}|I + 2\Sigma\,{}_*T|^{-\frac{m_j}{2}} = |I + 2\Sigma\,{}_*T|^{-\frac{1}{2}(m_1+\cdots+m_k)} \;\Rightarrow\; S \sim W_p(m_1+\cdots+m_k, \Sigma). \tag{5.5.12}$$


Hence, the following result: 

Theorems 5.5.3, 5.5a.3. Let $S_j \sim W_p(m_j, \Sigma)$, $\Sigma > O$, $j = 1,\ldots,k$, be statistically independently distributed real Wishart matrices with $m_1,\ldots,m_k$ degrees of freedom and the same parameter matrix $\Sigma > O$. Then the sum $S = S_1 + \cdots + S_k$ is real Wishart distributed with degrees of freedom $m_1 + \cdots + m_k$ and the same parameter matrix $\Sigma > O$, that is, $S \sim W_p(m_1 + \cdots + m_k, \Sigma)$, $\Sigma > O$. In the complex case, let $\tilde{S}_j \sim \tilde{W}_p(m_j, \tilde{\Sigma})$, $\tilde{\Sigma} = \tilde{\Sigma}^* > O$, $j = 1,\ldots,k$, be independently distributed with the same $\tilde{\Sigma}$. Then, the sum $\tilde{S} = \tilde{S}_1 + \cdots + \tilde{S}_k \sim \tilde{W}_p(m_1 + \cdots + m_k, \tilde{\Sigma})$.


We now consider linear functions of independent Wishart matrices. Let $S_j \sim W_p(m_j, \Sigma)$, $\Sigma > O$, $j = 1,\ldots,k$, be independently distributed and $S_a = a_1S_1 + \cdots + a_kS_k$ where $a_1,\ldots,a_k$ are real scalar constants. Then the Laplace transform of the density of $S_a$ is

$$L_{S_a}({}_*T) = \prod_{j=1}^{k}|I + 2a_j\Sigma\,{}_*T|^{-\frac{m_j}{2}},\quad I + 2a_j\Sigma\,{}_*T > O,\ j = 1,\ldots,k. \tag{i}$$

The inverse is quite complicated and the corresponding density cannot be easily determined; moreover, the density is not a Wishart density unless $a_1 = \cdots = a_k$. The types of complications occurring can be apprehended from the real scalar case $p = 1$ which is discussed in Mathai and Provost (1992). Instead of real scalars, we can also consider $p \times p$ constant matrices as coefficients, in which case the inversion of the Laplace transform will be more complicated. We can also consider Wishart matrices with different parameter matrices. Let $U_j \sim W_p(m_j, \Sigma_j)$, $\Sigma_j > O$, $j = 1,\ldots,k$, be independently distributed and $U = U_1 + \cdots + U_k$. Then, the Laplace transform of the density of $U$, denoted by $L_U({}_*T)$, is the following:

$$L_U({}_*T) = \prod_{j=1}^{k}|I + 2\Sigma_j\,{}_*T|^{-\frac{m_j}{2}},\quad I + 2\Sigma_j\,{}_*T > O,\ j = 1,\ldots,k. \tag{ii}$$

This case does not yield a Wishart density as an inverse Laplace transform either, unless $\Sigma_1 = \cdots = \Sigma_k$. In both (i) and (ii), we have linear functions of independent Wishart matrices; however, these linear functions do not have Wishart distributions.

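Let us consider a symmetric transformation on a Wishart matrix $S$. Let $S \sim W_p(m, \Sigma)$, $\Sigma > O$, and $U = ASA'$ where $A$ is a $p \times p$ nonsingular constant matrix. Let us take the Laplace transform of the density of $U$:

$$\begin{aligned}
L_U({}_*T) &= E[e^{-\mathrm{tr}({}_*T'U)}] = E[e^{-\mathrm{tr}({}_*T'ASA')}] = E[e^{-\mathrm{tr}(A'\,{}_*T'AS)}]\\
&= E[e^{-\mathrm{tr}[(A'\,{}_*TA)'S]}] = L_S(A'\,{}_*TA) = |I + 2\Sigma(A'\,{}_*TA)|^{-\frac{m}{2}}\\
&= |I + 2(A\Sigma A')\,{}_*T|^{-\frac{m}{2}}\\
&\Rightarrow\; U \sim W_p(m, A\Sigma A'),\quad \Sigma > O,\ |A| \ne 0.
\end{aligned}$$

Hence we have the following result: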


Theorems 5.5.4, 5.5a.4. Let $S \sim W_p(m, \Sigma)$, $\Sigma > O$, and $U = ASA'$, $|A| \ne 0$. Then, $U \sim W_p(m, A\Sigma A')$, $\Sigma > O$, $|A| \ne 0$, that is, when $U = ASA'$ where $A$ is a nonsingular $p \times p$ constant matrix, then $U$ is Wishart distributed with degrees of freedom $m$ and parameter matrix $A\Sigma A'$. In the complex case, the constant $p \times p$ nonsingular matrix $A$ can be real or in the complex domain. Let $\tilde{S} \sim \tilde{W}_p(m, \tilde{\Sigma})$, $\tilde{\Sigma} = \tilde{\Sigma}^* > O$. Then $\tilde{U} = A\tilde{S}A^* \sim \tilde{W}_p(m, A\tilde{\Sigma}A^*)$.
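
The key step in the Laplace transform argument is the determinant identity $|I + 2\Sigma(A'\,{}_*TA)| = |I + 2(A\Sigma A')\,{}_*T|$. The following sketch, which assumes NumPy and uses randomly generated illustrative matrices (`Sigma`, `A` and `T_star` are not taken from the text), checks the identity numerically for one choice of $m$.

```python
import numpy as np

rng = np.random.default_rng(2)                   # illustrative seed
p, m = 3, 6                                      # illustrative dimension and degrees of freedom

G = rng.standard_normal((p, p))
Sigma = G @ G.T + p * np.eye(p)                  # positive definite parameter matrix
A = rng.standard_normal((p, p)) + 3 * np.eye(p)  # constant matrix, nonsingular for this seed
H = rng.standard_normal((p, p))
T_star = H @ H.T                                 # symmetric positive semi-definite parameter matrix

I = np.eye(p)
# Laplace transform of S evaluated at A' T A ...
lhs = np.linalg.det(I + 2 * Sigma @ (A.T @ T_star @ A)) ** (-m / 2)
# ... equals the W_p(m, A Sigma A') transform evaluated at T.
rhs = np.linalg.det(I + 2 * (A @ Sigma @ A.T) @ T_star) ** (-m / 2)
print(lhs, rhs)
```
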


If $A$ is not a nonsingular matrix, is there a corresponding result? Let $B$ be a constant $q \times p$ matrix, $q \le p$, which is of full rank $q$. Let $X_j \sim N_p(\mu, \Sigma)$, $\Sigma > O$, $j = 1,\ldots,n$, be iid so that we have a simple random sample of size $n$ from a real $p$-variate Gaussian population. Then the $q \times 1$ vectors $Y_j = BX_j$, $j = 1,\ldots,n$, are iid, with $E[Y_j] = B\mu$ and $\mathrm{Cov}(Y_j) = E[(Y_j - E(Y_j))(Y_j - E(Y_j))'] = B\,E[(X_j - E(X_j))(X_j - E(X_j))']\,B' = B\Sigma B'$, which is $q \times q$. As well, $Y_j \sim N_q(B\mu, B\Sigma B')$, $B\Sigma B' > O$. Consider the sample matrix formed from the $Y_j$'s, namely the $q \times n$ matrix $\mathbf{Y} = (Y_1,\ldots,Y_n) = (BX_1,\ldots,BX_n) = B(X_1,\ldots,X_n) = B\mathbf{X}$ where $\mathbf{X}$ is the $p \times n$ sample matrix from $X_j$. Then, the sample sum of products matrix in $\mathbf{Y}$ is $(\mathbf{Y}-\bar{\mathbf{Y}})(\mathbf{Y}-\bar{\mathbf{Y}})' = S_y$, say, where the usual notation is utilized, namely, $\bar{Y} = \frac{1}{n}(Y_1 + \cdots + Y_n)$ and $\bar{\mathbf{Y}} = (\bar{Y},\ldots,\bar{Y})$. Now, the problem is equivalent to taking a simple random sample of size $n$ from a $q$-variate real Gaussian population with mean value vector $B\mu$ and positive definite covariance matrix $B\Sigma B' > O$. Hence, the following result:


Theorems 5.5.5, 5.5a.5. Let $X_j \sim N_p(\mu, \Sigma)$, $\Sigma > O$, $j = 1,\ldots,n$, be iid, and $S$ be the sample sum of products matrix in this $p$-variate real Gaussian population. Let $B$ be a $q \times p$ constant matrix, $q \le p$, which has full rank $q$. Then $BSB'$ is real Wishart distributed with degrees of freedom $m = n-1$, $n$ being equal to the sample size, and parameter matrix $B\Sigma B' > O$, that is, $BSB' \sim W_q(m, B\Sigma B')$. Similarly, in the complex case, let $B$ be a $q \times p$, $q \le p$, constant matrix of full rank $q$, where $B$ may be in the real or complex domain. Then, $B\tilde{S}B^*$ is Wishart distributed with degrees of freedom $m$ and parameter matrix $B\tilde{\Sigma}B^*$, that is, $B\tilde{S}B^* \sim \tilde{W}_q(m, B\tilde{\Sigma}B^*)$.


5.5.5. The generalized variance 


Let $X_j$, $X_j' = (x_{1j},\ldots,x_{pj})$, be a real $p \times 1$ vector random variable for $j = 1,\ldots,n$, and the $X_j$'s be iid (independently and identically distributed). Let the covariance matrix associated with $X_j$ be $\mathrm{Cov}(X_j) = E[(X_j - E(X_j))(X_j - E(X_j))'] = \Sigma$, $\Sigma > O$, for $j = 1,\ldots,n$ in the real case and $\tilde{\Sigma} = E[(\tilde{X}_j - E(\tilde{X}_j))(\tilde{X}_j - E(\tilde{X}_j))^*]$ in the complex case, where an asterisk indicates the conjugate transpose. Then, the diagonal elements in $\Sigma$ represent the squares of a measure of scatter or variances associated with the elements $x_{1j},\ldots,x_{pj}$ and the off-diagonal elements in $\Sigma$ provide the corresponding measure of joint dispersion or joint scatter in the pair $(x_{rj}, x_{sj})$ for all $r \ne s$. Thus, $\Sigma$ gives a configuration of individual and joint squared scatter in all the elements $x_{1j},\ldots,x_{pj}$. If we wish to have a single number or single scalar quantity representing this configuration of individual and joint scatter in the elements $x_{1j},\ldots,x_{pj}$, what should that measure be? Wilks had taken the determinant of $\Sigma$, $|\Sigma|$, as that measure and called it the generalized variance or square of the scatter representing the whole configuration of scatter in all the elements $x_{1j},\ldots,x_{pj}$. If there is no scatter in one or in a few elements but there is scatter or dispersion in all other elements, then the determinant is zero. If the matrix is singular then the determinant is zero, but this does not mean that there is no scatter in these elements. Thus, the determinant, as a measure of scatter or dispersion, violates a very basic condition: if the proposed measure is zero, then there should not be any scatter in any of the elements, that is, $\Sigma$ should be a null matrix. Hence, the first author suggested taking a norm of $\Sigma$, $\|\Sigma\|$, as a single measure of scatter in the whole configuration, such as $\|\Sigma\|_1 = \max_i \sum_j |\sigma_{ij}|$ or $\|\Sigma\|_2 =$ largest eigenvalue of $\Sigma$ since $\Sigma$ is at least positive semi-definite. Note that normality is not assumed in the above discussion.
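
A small numerical illustration of this point, assuming NumPy and a hypothetical singular covariance matrix (the second variable being half of the first), is sketched here: the determinant vanishes although both variables have positive variance, whereas the two norms mentioned above remain positive.

```python
import numpy as np

# Hypothetical covariance matrix of (x1, x2) with x2 = x1 / 2 and Var(x1) = 4.
Sigma = np.array([[4.0, 2.0],
                  [2.0, 1.0]])

generalized_variance = np.linalg.det(Sigma)        # Wilks' measure |Sigma|
norm_1 = np.abs(Sigma).sum(axis=1).max()           # max_i sum_j |sigma_ij|
norm_2 = np.linalg.eigvalsh(Sigma).max()           # largest eigenvalue of Sigma

print(generalized_variance, norm_1, norm_2)
```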

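If $S \sim W_p(m, \Sigma)$, $\Sigma > O$, what is then the distribution of Wilks' generalized variance in $S$, namely $|S|$, which can be referred to as the sample generalized variance? Let us determine the $h$-th moment of the sample generalized variance $|S|$ for an arbitrary $h$. This has already been discussed for real and complex matrix-variate gamma distributions in Sect. 5.4.1 and can be obtained from the normalizing constant in the Wishart density:

$$E[|S|^h] = \int_{S>O}\frac{|S|^{\frac{m}{2}+h-\frac{p+1}{2}}\,e^{-\frac{1}{2}\mathrm{tr}(\Sigma^{-1}S)}}{2^{\frac{mp}{2}}\,\Gamma_p(\frac{m}{2})\,|\Sigma|^{\frac{m}{2}}}\,\mathrm{d}S = 2^{ph}\,|\Sigma|^{h}\,\frac{\Gamma_p(\frac{m}{2}+h)}{\Gamma_p(\frac{m}{2})},\quad \Re\Big(\frac{m}{2}+h\Big) > \frac{p-1}{2}. \tag{5.5.13}$$

Then

$$E[|(2\Sigma)^{-1}S|^h] = \frac{\Gamma_p(\frac{m}{2}+h)}{\Gamma_p(\frac{m}{2})} = \prod_{j=1}^{p}\frac{\Gamma(\frac{m}{2}+h-\frac{j-1}{2})}{\Gamma(\frac{m}{2}-\frac{j-1}{2})} = E[y_1^h]\,E[y_2^h]\cdots E[y_p^h], \tag{5.5.14}$$

where $y_1,\ldots,y_p$ are independently distributed real scalar gamma random variables with the parameters $(\frac{m}{2}-\frac{j-1}{2},\ 1)$, $j = 1,\ldots,p$. In the complex case

$$E[|\det((\tilde{\Sigma})^{-1}\tilde{S})|^h] = \frac{\tilde{\Gamma}_p(m+h)}{\tilde{\Gamma}_p(m)} = \prod_{j=1}^{p}\frac{\Gamma(m-(j-1)+h)}{\Gamma(m-(j-1))} = E[\tilde{y}_1^h]\cdots E[\tilde{y}_p^h], \tag{5.5a.9}$$

where $\tilde{y}_1,\ldots,\tilde{y}_p$ are independently distributed real scalar gamma random variables with the parameters $(m-(j-1),\ 1)$, $j = 1,\ldots,p$. Note that if we consider $E[|\Sigma^{-1}S|^h]$ instead of $E[|(2\Sigma)^{-1}S|^h]$ in (5.5.14), then the $y_j$'s are independently distributed as real chisquare random variables having $m-(j-1)$ degrees of freedom for $j = 1,\ldots,p$. This can be stated as a result.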


Theorems 5.5.6, 5.5a.6. Let $S \sim W_p(m, \Sigma)$, $\Sigma > O$, and $|S|$ be the generalized variance associated with this Wishart matrix or the sample generalized variance in the corresponding $p$-variate real Gaussian population. Then, $E[|(2\Sigma)^{-1}S|^h] = E[y_1^h]\cdots E[y_p^h]$ so that $|(2\Sigma)^{-1}S|$ has the structural representation $|(2\Sigma)^{-1}S| = y_1\cdots y_p$ where the $y_j$'s are independently distributed real gamma random variables with the parameters $(\frac{m}{2}-\frac{j-1}{2},\ 1)$, $j = 1,\ldots,p$. Equivalently, $E[|\Sigma^{-1}S|^h] = E[z_1^h]\cdots E[z_p^h]$ where the $z_j$'s are independently distributed real chisquare random variables having $m-(j-1)$, $j = 1,\ldots,p$, degrees of freedom. In the complex case, if we let $\tilde{S} \sim \tilde{W}_p(m, \tilde{\Sigma})$, $\tilde{\Sigma} = \tilde{\Sigma}^* > O$, and $|\det(\tilde{S})|$ be the generalized variance, then $|\det((\tilde{\Sigma})^{-1}\tilde{S})|$ has the structural representation $|\det((\tilde{\Sigma})^{-1}\tilde{S})| = \tilde{y}_1\cdots\tilde{y}_p$ where the $\tilde{y}_j$'s are independently distributed real scalar gamma random variables with the parameters $(m-(j-1),\ 1)$, $j = 1,\ldots,p$, or chisquare random variables in the complex domain having $m-(j-1)$, $j = 1,\ldots,p$, degrees of freedom.
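
The moment expression in (5.5.13) and (5.5.14) is straightforward to evaluate. The following is a minimal sketch using only the Python standard library; the function name `gen_var_moment` and the numerical values are illustrative. For $h = 1$ the gamma ratios collapse, giving $E[|S|] = |\Sigma|\,m(m-1)\cdots(m-p+1)$, which provides a convenient check.

```python
import math

def gen_var_moment(h, m, p, det_sigma):
    """E[|S|^h] for S ~ W_p(m, Sigma), following (5.5.13)-(5.5.14).

    Requires m/2 - (j - 1)/2 + h > 0 for j = 1, ..., p.
    """
    log_ratio = sum(
        math.lgamma(m / 2 - (j - 1) / 2 + h) - math.lgamma(m / 2 - (j - 1) / 2)
        for j in range(1, p + 1)
    )
    return (2.0 ** (p * h)) * (det_sigma ** h) * math.exp(log_ratio)

# Illustrative values: m = 5, p = 3, |Sigma| = 13, so E|S| = 13 * 5 * 4 * 3.
print(gen_var_moment(1, 5, 3, 13.0))
```

Working with `math.lgamma` rather than `math.gamma` keeps the ratio stable when $m$ is large.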


5.5.6. Inverse Wishart distribution 


When $S \sim W_p(m, \Sigma)$, $\Sigma > O$, what is then the distribution of $S^{-1}$? Since $S$ has a real matrix-variate gamma distribution, that of its inverse is directly available from the transformation $U = S^{-1}$. In light of Theorem 1.6.6, we have $\mathrm{d}S = |U|^{-(p+1)}\,\mathrm{d}U$ for the real case and $\mathrm{d}\tilde{S} = |\det(\tilde{U}\tilde{U}^*)|^{-p}\,\mathrm{d}\tilde{U}$ in the complex domain. Thus, denoting the density of $U$ by $g(U)$, we have the following result:


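Theorems 5.5.7, 5.5a.7. Let the real Wishart matrix $S \sim W_p(m, \Sigma)$, $\Sigma > O$, and the Wishart matrix in the complex domain $\tilde{S} \sim \tilde{W}_p(m, \tilde{\Sigma})$, $\tilde{\Sigma} = \tilde{\Sigma}^* > O$. Let $U = S^{-1}$ and $\tilde{U} = \tilde{S}^{-1}$. Letting the density of $U$ be denoted by $g(U)$ and that of $\tilde{U}$ be denoted by $\tilde{g}(\tilde{U})$,

$$g(U) = \frac{|U|^{-\frac{m}{2}-\frac{p+1}{2}}\,e^{-\frac{1}{2}\mathrm{tr}(\Sigma^{-1}U^{-1})}}{2^{\frac{mp}{2}}\,\Gamma_p(\frac{m}{2})\,|\Sigma|^{\frac{m}{2}}},\quad U > O,\ \Sigma > O, \tag{5.5.15}$$

and zero elsewhere, and

$$\tilde{g}(\tilde{U}) = \frac{|\det(\tilde{U})|^{-m-p}\,e^{-\mathrm{tr}(\tilde{\Sigma}^{-1}\tilde{U}^{-1})}}{\tilde{\Gamma}_p(m)\,|\det(\tilde{\Sigma})|^{m}},\quad \tilde{U} = \tilde{U}^* > O,\ \tilde{\Sigma} = \tilde{\Sigma}^* > O, \tag{5.5a.10}$$

and zero elsewhere.
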
5.5.7. Marginal distributions of a Wishart matrix 


At the beginning of this chapter, we had explicitly evaluated real and complex matrix-variate gamma integrals and determined that the diagonal blocks are again real and complex matrix-variate gamma integrals. Hence, the following results are already available from the discussion on the matrix-variate gamma distribution. We will now establish the results via Laplace transforms. Let $S$ be Wishart distributed with degrees of freedom $m$ and parameter matrix $\Sigma > O$, that is, $S \sim W_p(m, \Sigma)$, $\Sigma > O$, $m \ge p$. Let us partition $S$ and $\Sigma$ as follows:

$$S = \begin{bmatrix} S_{11} & S_{12}\\ S_{21} & S_{22}\end{bmatrix}\quad\text{and}\quad \Sigma = \begin{bmatrix}\Sigma_{11} & \Sigma_{12}\\ \Sigma_{21} & \Sigma_{22}\end{bmatrix} \tag{i}$$

(referred to as a $2 \times 2$ partitioning), $S_{11}$, $\Sigma_{11}$ being $r \times r$ and $S_{22}$, $\Sigma_{22}$ being $(p-r) \times (p-r)$; refer to Sect. 1.3 for results on partitioned matrices. Let ${}_*T$ be a similarly partitioned $p \times p$ parameter matrix with ${}_*T_{11}$ being $r \times r$ where

$${}_*T = \begin{bmatrix} {}_*T_{11} & O\\ O & O\end{bmatrix},\quad {}_*T_{11} = {}_*T_{11}' > O. \tag{ii}$$

Observe that ${}_*T$ is a slightly modified parameter matrix $T = (t_{ij}) = T'$ where the $t_{ij}$'s are weighted with $\frac{1}{2}$ for $i \ne j$ to obtain ${}_*T$. Noting that $\mathrm{tr}({}_*T'S) = \mathrm{tr}({}_*T_{11}'S_{11})$, the Laplace transform of the Wishart density $W_p(m, \Sigma)$, $\Sigma > O$, with ${}_*T$ as defined above, is given by

$$|I + 2\Sigma\,{}_*T|^{-\frac{m}{2}} = \begin{vmatrix} I_r + 2\Sigma_{11}\,{}_*T_{11} & O\\ 2\Sigma_{21}\,{}_*T_{11} & I_{p-r}\end{vmatrix}^{-\frac{m}{2}} = |I_r + 2\Sigma_{11}\,{}_*T_{11}|^{-\frac{m}{2}}. \tag{5.5.16}$$

Thus, $S_{11}$ has a Wishart distribution with $m$ degrees of freedom and parameter matrix $\Sigma_{11}$. It can be similarly established that $S_{22}$ is Wishart distributed with degrees of freedom $m$ and parameter matrix $\Sigma_{22}$. Hence, the following result:


Theorems 5.5.8, 5.5a.8. Let $S \sim W_p(m, \Sigma)$, $\Sigma > O$. Let $S$ and $\Sigma$ be partitioned into a $2 \times 2$ partitioning as above. Then, the sub-matrices $S_{11} \sim W_r(m, \Sigma_{11})$, $\Sigma_{11} > O$, and $S_{22} \sim W_{p-r}(m, \Sigma_{22})$, $\Sigma_{22} > O$. In the complex case, let $\tilde{S} \sim \tilde{W}_p(m, \tilde{\Sigma})$, $\tilde{\Sigma} = \tilde{\Sigma}^* > O$. Letting $\tilde{S}$ be partitioned as in the real case, $\tilde{S}_{11} \sim \tilde{W}_r(m, \tilde{\Sigma}_{11})$ and $\tilde{S}_{22} \sim \tilde{W}_{p-r}(m, \tilde{\Sigma}_{22})$.

Corollaries 5.5.2, 5.5a.2. Let $S \sim W_p(m, \Sigma)$, $\Sigma > O$. Suppose that $\Sigma_{12} = O$ in the $2 \times 2$ partitioning of $\Sigma$. Then $S_{11}$ and $S_{22}$ are independently distributed with $S_{11} \sim W_r(m, \Sigma_{11})$ and $S_{22} \sim W_{p-r}(m, \Sigma_{22})$. Consider a $k \times k$ partitioning of $S$ and $\Sigma$, the order of the diagonal blocks $S_{jj}$ and $\Sigma_{jj}$ being $p_j \times p_j$, $p_1 + \cdots + p_k = p$. If $\Sigma_{ij} = O$ for all $i \ne j$, then the $S_{jj}$'s are independently distributed as Wishart matrices on $p_j$ components, with degrees of freedom $m$ and parameter matrices $\Sigma_{jj} > O$, $j = 1,\ldots,k$. In the complex case, consider the same type of partitioning as in the real case. Then, if $\tilde{\Sigma}_{ij} = O$ for all $i \ne j$, the $\tilde{S}_{jj}$, $j = 1,\ldots,k$, are independently distributed as $\tilde{S}_{jj} \sim \tilde{W}_{p_j}(m, \tilde{\Sigma}_{jj})$, $j = 1,\ldots,k$, $p_1 + \cdots + p_k = p$.


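Let $S$ be a $p \times p$ real Wishart matrix with $m$ degrees of freedom and parameter matrix $\Sigma > O$. Consider the following $2 \times 2$ partitioning of $S$ and $\Sigma^{-1}$:

$$S = \begin{bmatrix} S_{11} & S_{12}\\ S_{21} & S_{22}\end{bmatrix},\ S_{11}\ \text{being}\ r \times r,\quad \Sigma^{-1} = \begin{bmatrix}\Sigma^{11} & \Sigma^{12}\\ \Sigma^{21} & \Sigma^{22}\end{bmatrix}.$$

Then, the density, denoted by $f(S)$, can be written as

$$\begin{aligned}
f(S) &= \frac{|S|^{\frac{m}{2}-\frac{p+1}{2}}}{2^{\frac{mp}{2}}\,\Gamma_p(\frac{m}{2})\,|\Sigma|^{\frac{m}{2}}}\,e^{-\frac{1}{2}\mathrm{tr}(\Sigma^{-1}S)}\\
&= \frac{|S_{11}|^{\frac{m}{2}-\frac{p+1}{2}}\,|S_{22}-S_{21}S_{11}^{-1}S_{12}|^{\frac{m}{2}-\frac{p+1}{2}}}{2^{\frac{mp}{2}}\,\Gamma_p(\frac{m}{2})\,|\Sigma|^{\frac{m}{2}}}\\
&\quad\times e^{-\frac{1}{2}[\mathrm{tr}(\Sigma^{11}S_{11})+\mathrm{tr}(\Sigma^{22}S_{22})+\mathrm{tr}(\Sigma^{12}S_{21})+\mathrm{tr}(\Sigma^{21}S_{12})]}.
\end{aligned}$$

In this case, $\mathrm{d}S = \mathrm{d}S_{11}\wedge\mathrm{d}S_{22}\wedge\mathrm{d}S_{12}$. Referring to Sect. 1.3, the coefficient of $S_{22}$ in the exponent is $\Sigma^{22} = (\Sigma_{22}-\Sigma_{21}\Sigma_{11}^{-1}\Sigma_{12})^{-1}$. Let $U_2 = S_{22}-S_{21}S_{11}^{-1}S_{12}$ so that $S_{22} = U_2 + S_{21}S_{11}^{-1}S_{12}$ and $\mathrm{d}S_{22} = \mathrm{d}U_2$ for fixed $S_{11}$ and $S_{12}$. Then, the function of $U_2$ is of the form

$$|U_2|^{\frac{m}{2}-\frac{p+1}{2}}\,e^{-\frac{1}{2}\mathrm{tr}(\Sigma^{22}U_2)}.$$

However, $U_2$ is $(p-r) \times (p-r)$ and we can write $\frac{m}{2}-\frac{p+1}{2} = \frac{m-r}{2}-\frac{p-r+1}{2}$. Therefore $U_2 \sim W_{p-r}(m-r,\ \Sigma_{22}-\Sigma_{21}\Sigma_{11}^{-1}\Sigma_{12})$ as $\Sigma^{22} = (\Sigma_{22}-\Sigma_{21}\Sigma_{11}^{-1}\Sigma_{12})^{-1}$. From symmetry, $U_1 = S_{11}-S_{12}S_{22}^{-1}S_{21} \sim W_r(m-(p-r),\ \Sigma_{11}-\Sigma_{12}\Sigma_{22}^{-1}\Sigma_{21})$. After replacing $S_{22}$ in the exponent by $U_2 + S_{21}S_{11}^{-1}S_{12}$, the exponent, excluding $-\frac{1}{2}$ and the term in $U_2$, can be written as $\mathrm{tr}[\Sigma^{11}S_{11}] + \mathrm{tr}[\Sigma^{22}S_{21}S_{11}^{-1}S_{12}] + \mathrm{tr}[\Sigma^{12}S_{21}] + \mathrm{tr}[\Sigma^{21}S_{12}]$. Let us try to integrate out $S_{12}$. To this end, let $V = S_{11}^{-\frac{1}{2}}S_{12} \Rightarrow \mathrm{d}S_{12} = |S_{11}|^{\frac{p-r}{2}}\,\mathrm{d}V$ for fixed $S_{11}$. Then the determinant of $S_{11}$ in $f(S)$ becomes $|S_{11}|^{\frac{m}{2}-\frac{p+1}{2}} \times |S_{11}|^{\frac{p-r}{2}} = |S_{11}|^{\frac{m}{2}-\frac{r+1}{2}}$. The part of the exponent involving $V$, excluding $-\frac{1}{2}$, becomes the following, denoting it by $\rho$:

$$\rho = \mathrm{tr}(\Sigma^{12}V'S_{11}^{\frac{1}{2}}) + \mathrm{tr}(\Sigma^{21}S_{11}^{\frac{1}{2}}V) + \mathrm{tr}(\Sigma^{22}V'V). \tag{i}$$

Note that $\mathrm{tr}(\Sigma^{22}V'V) = \mathrm{tr}(V\Sigma^{22}V')$ and

$$(V + C)\Sigma^{22}(V + C)' = V\Sigma^{22}V' + V\Sigma^{22}C' + C\Sigma^{22}V' + C\Sigma^{22}C'. \tag{ii}$$

On comparing (i) and (ii), we have $C' = (\Sigma^{22})^{-1}\Sigma^{21}S_{11}^{\frac{1}{2}}$. Substituting for $C$ and $C'$ in $\rho$, the term containing $S_{11}$ in the exponent becomes $-\frac{1}{2}\mathrm{tr}(S_{11}(\Sigma^{11}-\Sigma^{12}(\Sigma^{22})^{-1}\Sigma^{21})) = -\frac{1}{2}\mathrm{tr}(S_{11}\Sigma_{11}^{-1})$. Collecting the factors containing $S_{11}$, we have $S_{11} \sim W_r(m, \Sigma_{11})$ and from symmetry, $S_{22} \sim W_{p-r}(m, \Sigma_{22})$. Since the density $f(S)$ splits into a function of $U_2$, $S_{11}$ and $S_{11}^{-\frac{1}{2}}S_{12}$, these quantities are independently distributed. Similarly, $U_1$, $S_{22}$ and $S_{22}^{-\frac{1}{2}}S_{21}$ are independently distributed. The exponent of $|U_1|$ is $\frac{m}{2}-\frac{p+1}{2} = \frac{m-(p-r)}{2}-\frac{r+1}{2}$. Observing that $U_1$ is $r \times r$, we have the density of $U_1 = S_{11}-S_{12}S_{22}^{-1}S_{21}$ as a real Wishart density on $r$ components, with degrees of freedom $m-(p-r)$ and parameter matrix $\Sigma_{11}-\Sigma_{12}\Sigma_{22}^{-1}\Sigma_{21}$, whose density, denoted by $f_1(U_1)$, is the following:

$$f_1(U_1) = \frac{|U_1|^{\frac{m-(p-r)}{2}-\frac{r+1}{2}}\,e^{-\frac{1}{2}\mathrm{tr}[U_1(\Sigma_{11}-\Sigma_{12}\Sigma_{22}^{-1}\Sigma_{21})^{-1}]}}{2^{\frac{r(m-(p-r))}{2}}\,\Gamma_r(\frac{m-(p-r)}{2})\,|\Sigma_{11}-\Sigma_{12}\Sigma_{22}^{-1}\Sigma_{21}|^{\frac{m-(p-r)}{2}}}. \tag{iii}$$

A similar expression can be obtained for the density of $U_2$. Thus, the following result: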


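Theorems 5.5.9, 5.5a.9. Let $S \sim W_p(m, \Sigma)$, $\Sigma > O$, $m \ge p$. Consider the $2 \times 2$ partitioning of $S$ as specified above, $S_{11}$ being $r \times r$. Let $U_1 = S_{11}-S_{12}S_{22}^{-1}S_{21}$. Then,

$$U_1 \sim W_r(m-(p-r),\ \Sigma_{11}-\Sigma_{12}\Sigma_{22}^{-1}\Sigma_{21}). \tag{5.5.17}$$

In the complex case, let $\tilde{S} \sim \tilde{W}_p(m, \tilde{\Sigma})$, $\tilde{\Sigma} = \tilde{\Sigma}^* > O$. Consider the same partitioning as in the real case and let $\tilde{S}_{11}$ be $r \times r$. Then, letting $\tilde{U}_1 = \tilde{S}_{11}-\tilde{S}_{12}\tilde{S}_{22}^{-1}\tilde{S}_{21}$, $\tilde{U}_1$ is Wishart distributed as

$$\tilde{U}_1 \sim \tilde{W}_r(m-(p-r),\ \tilde{\Sigma}_{11}-\tilde{\Sigma}_{12}\tilde{\Sigma}_{22}^{-1}\tilde{\Sigma}_{21}). \tag{5.5a.11}$$

A similar density is obtained for $\tilde{U}_2 = \tilde{S}_{22}-\tilde{S}_{21}\tilde{S}_{11}^{-1}\tilde{S}_{12}$.
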
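Example 5.5.1. Let the $3 \times 3$ matrix $S \sim W_3(5, \Sigma)$, $\Sigma > O$. Determine the distributions of $Y_1 = S_{22}-S_{21}S_{11}^{-1}S_{12}$, $Y_2 = S_{22}$ and $Y_3 = S_{11}$, where

$$\Sigma = \begin{bmatrix} 2 & -1 & 0\\ -1 & 3 & 1\\ 0 & 1 & 3\end{bmatrix} = \begin{bmatrix}\Sigma_{11} & \Sigma_{12}\\ \Sigma_{21} & \Sigma_{22}\end{bmatrix},\quad S = \begin{bmatrix} S_{11} & S_{12}\\ S_{21} & S_{22}\end{bmatrix},$$

with $S_{11}$ being $2 \times 2$.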


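Solution 5.5.1. Let the densities of $Y_j$ be denoted by $f_j(Y_j)$, $j = 1, 2, 3$. We need the following matrix, denoted by $B$:

$$B = \Sigma_{22}-\Sigma_{21}\Sigma_{11}^{-1}\Sigma_{12} = 3 - [0,\ 1]\,\frac{1}{5}\begin{bmatrix} 3 & 1\\ 1 & 2\end{bmatrix}\begin{bmatrix} 0\\ 1\end{bmatrix} = 3-\frac{2}{5} = \frac{13}{5}.$$

From our usual notations, $Y_1 \sim W_{p-r}(m-r, B)$. Observing that $Y_1$ is a real scalar, we denote it by $y_1$, its density being given by

$$f_1(y_1) = \frac{y_1^{\frac{m-r}{2}-\frac{(p-r)+1}{2}}\,e^{-\frac{1}{2}\mathrm{tr}(B^{-1}y_1)}}{2^{\frac{(m-r)(p-r)}{2}}\,|B|^{\frac{m-r}{2}}\,\Gamma(\frac{m-r}{2})} = \frac{y_1^{\frac{3}{2}-1}\,e^{-\frac{5}{26}y_1}}{2^{\frac{3}{2}}\,(13/5)^{\frac{3}{2}}\,\Gamma(\frac{3}{2})},\quad 0 \le y_1 < \infty,$$

and zero elsewhere. Now, consider $Y_2$ which is also a real scalar that will be denoted by $y_2$. As per our notation, $Y_2 = S_{22} \sim W_{p-r}(m, \Sigma_{22})$. Its density is then as follows, observing that $\Sigma_{22} = (3)$, $|\Sigma_{22}| = 3$ and $\Sigma_{22}^{-1} = (\frac{1}{3})$:

$$f_2(y_2) = \frac{y_2^{\frac{m}{2}-\frac{(p-r)+1}{2}}\,e^{-\frac{1}{2}\mathrm{tr}(\Sigma_{22}^{-1}y_2)}}{2^{\frac{m(p-r)}{2}}\,\Gamma_{p-r}(\frac{m}{2})\,|\Sigma_{22}|^{\frac{m}{2}}} = \frac{y_2^{\frac{5}{2}-1}\,e^{-\frac{1}{6}y_2}}{2^{\frac{5}{2}}\,\Gamma(\frac{5}{2})\,3^{\frac{5}{2}}},\quad 0 \le y_2 < \infty,$$

and zero elsewhere. Note that $Y_3 = S_{11}$ is $2 \times 2$. With our usual notations, $p = 3$, $r = 2$, $m = 5$ and $|\Sigma_{11}| = 5$; as well,

$$S_{11} = \begin{bmatrix} s_{11} & s_{12}\\ s_{12} & s_{22}\end{bmatrix},\quad \Sigma_{11}^{-1} = \frac{1}{5}\begin{bmatrix} 3 & 1\\ 1 & 2\end{bmatrix},\quad \mathrm{tr}(\Sigma_{11}^{-1}S_{11}) = \frac{1}{5}[3s_{11} + 2s_{12} + 2s_{22}].$$

Thus, the density of $Y_3$ is

$$f_3(Y_3) = \frac{|S_{11}|^{\frac{m}{2}-\frac{r+1}{2}}\,e^{-\frac{1}{2}\mathrm{tr}(\Sigma_{11}^{-1}S_{11})}}{2^{\frac{mr}{2}}\,\Gamma_r(\frac{m}{2})\,|\Sigma_{11}|^{\frac{m}{2}}} = \frac{[s_{11}s_{22}-s_{12}^2]\,e^{-\frac{1}{10}(3s_{11}+2s_{12}+2s_{22})}}{(3)(2^3)(5)^{\frac{5}{2}}\,\pi},\quad Y_3 > O,$$

and zero elsewhere. This completes the calculations.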
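
The numerical quantities in this solution can be double-checked with a few lines of code. This is a sketch assuming NumPy; the helper `real_matrix_gamma` is an illustrative implementation of $\Gamma_p(\alpha) = \pi^{p(p-1)/4}\prod_{j=1}^{p}\Gamma(\alpha-(j-1)/2)$ and is not a library function.

```python
import math
import numpy as np

def real_matrix_gamma(alpha, p):
    """Real matrix-variate gamma: pi^{p(p-1)/4} * prod_{j=0}^{p-1} Gamma(alpha - j/2)."""
    return math.pi ** (p * (p - 1) / 4) * math.prod(
        math.gamma(alpha - j / 2) for j in range(p)
    )

Sigma = np.array([[2.0, -1.0, 0.0],
                  [-1.0, 3.0, 1.0],
                  [0.0, 1.0, 3.0]])
r, m = 2, 5
Sig11, Sig12 = Sigma[:r, :r], Sigma[:r, r:]
Sig21, Sig22 = Sigma[r:, :r], Sigma[r:, r:]

# Parameter of Y1: B = Sigma22 - Sigma21 Sigma11^{-1} Sigma12, a 1 x 1 matrix here.
B = (Sig22 - Sig21 @ np.linalg.solve(Sig11, Sig12)).item()
Sig11_inv = np.linalg.inv(Sig11)

# Normalizing constant in the density of Y3 = S11 ~ W_2(5, Sigma11).
const3 = 2 ** (m * r / 2) * real_matrix_gamma(m / 2, r) * np.linalg.det(Sig11) ** (m / 2)

print(B)                 # 13/5
print(5 * Sig11_inv)     # [[3, 1], [1, 2]]
print(const3, 3 * 2**3 * 5**2.5 * math.pi)
```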


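Example 5.5a.1. Let the $3 \times 3$ Hermitian positive definite matrix $\tilde{S}$ have a complex Wishart density with degrees of freedom $m = 5$ and parameter matrix $\Sigma > O$. Determine the densities of $\tilde{Y}_1 = \tilde{S}_{22}-\tilde{S}_{21}\tilde{S}_{11}^{-1}\tilde{S}_{12}$, $\tilde{Y}_2 = \tilde{S}_{22}$ and $\tilde{Y}_3 = \tilde{S}_{11}$ where

$$\tilde{S} = \begin{bmatrix}\tilde{S}_{11} & \tilde{S}_{12}\\ \tilde{S}_{21} & \tilde{S}_{22}\end{bmatrix},\quad \Sigma = \begin{bmatrix}\Sigma_{11} & \Sigma_{12}\\ \Sigma_{21} & \Sigma_{22}\end{bmatrix} = \begin{bmatrix} 3 & -i & 0\\ i & 2 & i\\ 0 & -i & 2\end{bmatrix},$$

with $\tilde{S}_{11}$ and $\Sigma_{11}$ being $2 \times 2$.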


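Solution 5.5a.1. Observe that $\Sigma$ is Hermitian positive definite. We need the following numerical results:

$$B = \Sigma_{22}-\Sigma_{21}\Sigma_{11}^{-1}\Sigma_{12} = 2 - [0,\ -i]\,\frac{1}{5}\begin{bmatrix} 2 & i\\ -i & 3\end{bmatrix}\begin{bmatrix} 0\\ i\end{bmatrix} = 2-\frac{3}{5} = \frac{7}{5};$$

$$B^{-1} = \frac{5}{7},\quad |\det(B)| = \frac{7}{5},\quad \Sigma_{11}^{-1} = \frac{1}{5}\begin{bmatrix} 2 & i\\ -i & 3\end{bmatrix}.$$

Note that $\tilde{Y}_1$ and $\tilde{Y}_2$ are real scalar quantities which will be denoted as $y_1$ and $y_2$, respectively. Let the densities of $y_1$ and $y_2$ be $f_j(y_j)$, $j = 1, 2$. Then, with our usual notations, $f_1(y_1)$ is

$$f_1(y_1) = \frac{|\det(\tilde{y}_1)|^{(m-r)-(p-r)}\,e^{-\mathrm{tr}(B^{-1}\tilde{y}_1)}}{|\det(B)|^{m-r}\,\tilde{\Gamma}_{p-r}(m-r)} = \frac{y_1^{2}\,e^{-\frac{5}{7}y_1}}{(7/5)^{3}\,\Gamma(3)},\quad 0 \le y_1 < \infty,$$

and zero elsewhere, and the density of $\tilde{y}_2$ is

$$f_2(y_2) = \frac{|\det(\tilde{y}_2)|^{m-(p-r)}\,e^{-\mathrm{tr}(\Sigma_{22}^{-1}\tilde{y}_2)}}{|\det(\Sigma_{22})|^{m}\,\tilde{\Gamma}_{p-r}(m)} = \frac{y_2^{4}\,e^{-\frac{1}{2}y_2}}{2^{5}\,\Gamma(5)},\quad 0 \le y_2 < \infty,$$

and zero elsewhere. Note that $\tilde{Y}_3 = \tilde{S}_{11}$ is $2 \times 2$. Letting

$$\tilde{S}_{11} = \begin{bmatrix} s_{11} & \tilde{s}_{12}\\ \tilde{s}_{12}^* & s_{22}\end{bmatrix},\quad \mathrm{tr}(\Sigma_{11}^{-1}\tilde{S}_{11}) = \frac{1}{5}[2s_{11} + 3s_{22} + i\tilde{s}_{12}^* - i\tilde{s}_{12}]$$

and $|\det(\tilde{Y}_3)| = [s_{11}s_{22}-\tilde{s}_{12}^*\tilde{s}_{12}]$. With our usual notations, the density of $\tilde{Y}_3$, denoted by $\tilde{f}_3(\tilde{Y}_3)$, is the following:

$$\tilde{f}_3(\tilde{Y}_3) = \frac{|\det(\tilde{Y}_3)|^{m-r}\,e^{-\mathrm{tr}(\Sigma_{11}^{-1}\tilde{Y}_3)}}{|\det(\Sigma_{11})|^{m}\,\tilde{\Gamma}_r(m)} = \frac{[s_{11}s_{22}-\tilde{s}_{12}^*\tilde{s}_{12}]^{3}\,e^{-\frac{1}{5}[2s_{11}+3s_{22}+i\tilde{s}_{12}^*-i\tilde{s}_{12}]}}{5^{5}\,\tilde{\Gamma}_2(5)},\quad \tilde{Y}_3 > O,$$

and zero elsewhere, where $5^{5}\,\tilde{\Gamma}_2(5) = 3125(144)\pi$. This completes the computations.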


5.5.8. Connections to geometrical probability problems 


Consider the representation of the Wishart matrix $S = Z_{n-1}Z_{n-1}'$ given in (5.5.8) where the $p$ rows are linearly independent $1 \times (n-1)$ vectors. Then, these $p$ linearly independent rows, taken in order, form a convex hull and determine a $p$-parallelotope in that hull, which is determined by the $p$ points in the $(n-1)$-dimensional Euclidean space, $n-1 \ge p$. Then, as explained in Mathai (1999), the volume content of this parallelotope is $v = |Z_{n-1}Z_{n-1}'|^{\frac{1}{2}} = |S|^{\frac{1}{2}}$, where $S \sim W_p(n-1, \Sigma)$, $\Sigma > O$. Thus, the volume content of this parallelotope is the positive square root of the generalized variance $|S|$. The distributions of this random volume when the $p$ random points are uniformly, type-1 beta, type-2 beta and gamma distributed are provided in Chap. 4 of Mathai (1999).
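
For $p = 2$ points in three-dimensional space, the volume content reduces to the area of a parallelogram, which gives a simple check of the formula $v = |ZZ'|^{\frac{1}{2}}$. The sketch below assumes NumPy and uses two hypothetical row vectors; the area is computed once from the determinant and once from the cross product.

```python
import numpy as np

# Two hypothetical linearly independent 1 x 3 row vectors (p = 2, n - 1 = 3).
Z = np.array([[1.0, 0.0, 0.0],
              [1.0, 2.0, 0.0]])

S = Z @ Z.T                                       # 2 x 2 matrix Z Z'
volume = np.sqrt(np.linalg.det(S))                # v = |Z Z'|^{1/2}
area_cross = np.linalg.norm(np.cross(Z[0], Z[1])) # parallelogram area via cross product

print(volume, area_cross)
```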


5.6. The Distribution of the Sample Correlation Coefficient 


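Consider the real Wishart density or matrix-variate gamma in (5.5.10) for $p = 2$. For convenience, let us take the degrees of freedom parameter $n-1 = m$. Then for $p = 2$, $f(S)$ in (5.5.10), denoted by $f_2(S)$, is the following, observing that $|S| = s_{11}s_{22}(1-r^2)$ where $r$ is the sample correlation coefficient:

$$f_2(S) = f_2(s_{11}, s_{22}, r) = \frac{[s_{11}s_{22}(1-r^2)]^{\frac{m}{2}-\frac{3}{2}}\,e^{-\frac{1}{2}\mathrm{tr}(\Sigma^{-1}S)}}{2^{m}\,[\sigma_{11}\sigma_{22}(1-\rho^2)]^{\frac{m}{2}}\,\Gamma_2(\frac{m}{2})} \tag{5.6.1}$$

where $\rho$ is the population correlation coefficient, $|\Sigma| = \sigma_{11}\sigma_{22}-\sigma_{12}^2 = \sigma_{11}\sigma_{22}(1-\rho^2)$, $\Gamma_2(\frac{m}{2}) = \pi^{\frac{1}{2}}\Gamma(\frac{m}{2})\Gamma(\frac{m-1}{2})$, $-1 < \rho < 1$,

$$\begin{aligned}
\Sigma^{-1} &= \frac{1}{|\Sigma|}\mathrm{Cof}(\Sigma) = \frac{1}{\sigma_{11}\sigma_{22}(1-\rho^2)}\begin{bmatrix}\sigma_{22} & -\sigma_{12}\\ -\sigma_{12} & \sigma_{11}\end{bmatrix}\\
&= \frac{1}{1-\rho^2}\begin{bmatrix}\frac{1}{\sigma_{11}} & -\frac{\rho}{\sqrt{\sigma_{11}\sigma_{22}}}\\ -\frac{\rho}{\sqrt{\sigma_{11}\sigma_{22}}} & \frac{1}{\sigma_{22}}\end{bmatrix},\quad \sigma_{12} = \rho\sqrt{\sigma_{11}\sigma_{22}},
\end{aligned} \tag{i}$$

$$\begin{aligned}
\mathrm{tr}(\Sigma^{-1}S) &= \frac{1}{1-\rho^2}\Big\{\frac{s_{11}}{\sigma_{11}} - 2\rho\frac{s_{12}}{\sqrt{\sigma_{11}\sigma_{22}}} + \frac{s_{22}}{\sigma_{22}}\Big\}\\
&= \frac{1}{1-\rho^2}\Big\{\frac{s_{11}}{\sigma_{11}} - 2\rho r\frac{\sqrt{s_{11}s_{22}}}{\sqrt{\sigma_{11}\sigma_{22}}} + \frac{s_{22}}{\sigma_{22}}\Big\}.
\end{aligned} \tag{ii}$$

Let us make the substitution $x_1 = \frac{s_{11}}{\sigma_{11}}$, $x_2 = \frac{s_{22}}{\sigma_{22}}$. Note that $\mathrm{d}S = \mathrm{d}s_{11}\wedge\mathrm{d}s_{22}\wedge\mathrm{d}s_{12}$. But $\mathrm{d}s_{12} = \sqrt{s_{11}s_{22}}\,\mathrm{d}r$ for fixed $s_{11}$ and $s_{22}$. In order to obtain the density of $r$, we must integrate out $x_1$ and $x_2$, observing that $\sqrt{s_{11}s_{22}}$ is coming from $\mathrm{d}s_{12}$:

$$\int_{s_{11},s_{22}} f_2(S)\,\mathrm{d}s_{11}\wedge\mathrm{d}s_{22} = \frac{(1-r^2)^{\frac{m-3}{2}}}{2^{m}(1-\rho^2)^{\frac{m}{2}}\pi^{\frac{1}{2}}\Gamma(\frac{m}{2})\Gamma(\frac{m-1}{2})}\int_{0}^{\infty}\!\!\int_{0}^{\infty}(x_1x_2)^{\frac{m}{2}-1}\,e^{-\frac{1}{2(1-\rho^2)}\{x_1-2r\rho\sqrt{x_1x_2}+x_2\}}\,\mathrm{d}x_1\wedge\mathrm{d}x_2. \tag{iii}$$

For convenience, let us expand

$$e^{-\frac{1}{2(1-\rho^2)}(-2r\rho\sqrt{x_1x_2})} = \sum_{k=0}^{\infty}\Big(\frac{r\rho}{1-\rho^2}\Big)^{k}\frac{x_1^{\frac{k}{2}}x_2^{\frac{k}{2}}}{k!}. \tag{iv}$$

Then the part containing $x_1$ gives the integral

$$\int_{x_1=0}^{\infty} x_1^{\frac{m}{2}-1+\frac{k}{2}}\,e^{-\frac{x_1}{2(1-\rho^2)}}\,\mathrm{d}x_1 = [2(1-\rho^2)]^{\frac{m}{2}+\frac{k}{2}}\,\Gamma\Big(\frac{m}{2}+\frac{k}{2}\Big),\quad m \ge 2. \tag{v}$$

By symmetry, the integral over $x_2$ gives $[2(1-\rho^2)]^{\frac{m}{2}+\frac{k}{2}}\,\Gamma(\frac{m+k}{2})$, $m \ge 2$. Collecting all the constants we have

$$\frac{2^{m+k}(1-\rho^2)^{m+k}\,\Gamma^2(\frac{m+k}{2})}{2^{m}(1-\rho^2)^{\frac{m}{2}}\pi^{\frac{1}{2}}\Gamma(\frac{m}{2})\Gamma(\frac{m-1}{2})} = \frac{(1-\rho^2)^{\frac{m}{2}+k}\,2^{k}\,\Gamma^2(\frac{m+k}{2})}{\pi^{\frac{1}{2}}\Gamma(\frac{m}{2})\Gamma(\frac{m-1}{2})}. \tag{vi}$$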
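We can simplify $\Gamma(\frac{m}{2})\Gamma(\frac{m}{2}-\frac{1}{2})$ by using the duplication formula for gamma functions, namely

$$\Gamma(2z) = \pi^{-\frac{1}{2}}\,2^{2z-1}\,\Gamma(z)\,\Gamma\Big(z+\frac{1}{2}\Big),\quad z = \frac{m-1}{2}. \tag{5.6.2}$$

Then,

$$\Gamma\Big(\frac{m}{2}\Big)\Gamma\Big(\frac{m-1}{2}\Big) = \frac{\Gamma(m-1)\,\pi^{\frac{1}{2}}}{2^{m-2}}.$$

Hence the density of $r$, denoted by $f_r(r)$, is the following:

$$f_r(r) = \frac{2^{m-2}(1-\rho^2)^{\frac{m}{2}}}{\Gamma(m-1)\,\pi}\,(1-r^2)^{\frac{m-3}{2}}\sum_{k=0}^{\infty}\frac{(2r\rho)^{k}}{k!}\,\Gamma^2\Big(\frac{m+k}{2}\Big),\quad -1 \le r \le 1, \tag{5.6.3}$$

and zero elsewhere, $m = n-1$, $n$ being the sample size.
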
5.6.1. The special case $\rho = 0$

In this case, (5.6.3) becomes

$$\begin{aligned}
f_r(r) &= \frac{2^{m-2}\,\Gamma^2(\frac{m}{2})}{\Gamma(m-1)\,\pi}\,(1-r^2)^{\frac{m-3}{2}},\quad -1 \le r \le 1,\ m = n-1 &(5.6.4)\\
&= \frac{\Gamma(\frac{m}{2})}{\sqrt{\pi}\,\Gamma(\frac{m-1}{2})}\,(1-r^2)^{\frac{m-3}{2}},\quad -1 \le r \le 1, &(5.6.5)
\end{aligned}$$

zero elsewhere, $m = n-1 \ge 2$, $n$ being the sample size. The simplification is made by using the duplication formula and writing $\Gamma(m-1) = \pi^{-\frac{1}{2}}\,2^{m-2}\,\Gamma(\frac{m-1}{2})\,\Gamma(\frac{m}{2})$. For testing the hypothesis $H_o: \rho = 0$, the test statistic is $r$ and the null distribution, that is, the distribution under the null hypothesis $H_o$, is given in (5.6.5). Numerical tables of percentage points obtained from (5.6.5) are available. If $\rho \ne 0$, the non-null distribution is available from (5.6.3); so, if we wish to test the hypothesis $H_o: \rho = \rho_o$ where $\rho_o$ is a given quantity, we can compute the percentage points from (5.6.3). It can be shown from (5.6.5)
that for $\rho = 0$, $t_{m-1} = \sqrt{m-1}\,\frac{r}{\sqrt{1-r^2}}$ is distributed as a Student-t with $m-1 = n-2$ degrees of freedom, and hence for testing $H_o: \rho = 0$ against $H_1: \rho \ne 0$, the null hypothesis can be rejected if $|t_{m-1}| = \sqrt{m-1}\,\Big|\frac{r}{\sqrt{1-r^2}}\Big| \ge t_{m-1,\frac{\alpha}{2}}$ where $Pr\{|t_{m-1}| \ge t_{m-1,\frac{\alpha}{2}}\} = \alpha$. For tests that make use of the Student-t statistic, refer to Mathai and Haubold (2017b). Since the density given in (5.6.5) is an even function, when $\rho = 0$, all odd order moments are equal to zero and the even order moments can easily be evaluated from type-1 beta integrals.
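
The null density (5.6.5) can be examined numerically. The following sketch assumes NumPy; the function name `null_density_r`, the value of $m$ and the grid size are illustrative. It integrates the density with a hand-written trapezoidal rule, checks that the total mass is 1, that the first moment vanishes, and that the second moment equals $1/m$, the value obtained from the type-1 beta integral since $r^2$ is then a type-1 beta variable with parameters $(\frac{1}{2}, \frac{m-1}{2})$.

```python
import math
import numpy as np

def null_density_r(r, m):
    """Density (5.6.5) of the sample correlation coefficient when rho = 0."""
    c = math.gamma(m / 2) / (math.sqrt(math.pi) * math.gamma((m - 1) / 2))
    return c * (1.0 - r ** 2) ** ((m - 3) / 2)

def trapezoid(y, x):
    """Plain trapezoidal rule, written out to avoid depending on a NumPy version."""
    return float(np.sum((y[:-1] + y[1:]) / 2 * np.diff(x)))

m = 9                                    # illustrative degrees of freedom (sample size n = 10)
grid = np.linspace(-1.0, 1.0, 200001)
dens = null_density_r(grid, m)

total_mass = trapezoid(dens, grid)
first_moment = trapezoid(grid * dens, grid)
second_moment = trapezoid(grid ** 2 * dens, grid)
print(total_mass, first_moment, second_moment, 1 / m)
```
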
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5.6.2. The multiple and partial correlation coefficients 


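Let the $p\times 1$ real vector $X_j$ with $X_j' = (x_{1j},\ldots,x_{pj})$ have a $p$-variate distribution
whose mean value $E(X_j) = \mu$ and covariance matrix $\mathrm{Cov}(X_j) = \Sigma$, $\Sigma > O$, where $\mu$ is
$p\times 1$ and $\Sigma$ is $p\times p$, for $j = 1,\ldots,n$, the $X_j$'s being iid (independently and identically
distributed). Consider the following partitioning of $\Sigma$:

$$\Sigma = \begin{bmatrix}\sigma_{11} & \Sigma_{12}\\ \Sigma_{21} & \Sigma_{22}\end{bmatrix},\quad \sigma_{11} > 0 \text{ is } 1\times 1,\ \ \Sigma_{22} > O \text{ is } (p-1)\times(p-1),\ \ \Sigma_{12}' = \Sigma_{21}.$$

Let

$$\rho^2_{1.(2\ldots p)} = \frac{\Sigma_{12}\Sigma_{22}^{-1}\Sigma_{21}}{\sigma_{11}}. \tag{5.6.6}$$

Then, $\rho_{1.(2\ldots p)}$ is called the multiple correlation coefficient of $x_{1j}$ on $x_{2j},\ldots,x_{pj}$. The
sample value corresponding to $\rho^2_{1.(2\ldots p)}$, which is denoted by $r^2_{1.(2\ldots p)}$ and referred to as the
square of the sample multiple correlation coefficient, is given by

$$r^2_{1.(2\ldots p)} = \frac{S_{12}S_{22}^{-1}S_{21}}{s_{11}}\quad\text{with}\quad S = \begin{bmatrix}s_{11} & S_{12}\\ S_{21} & S_{22}\end{bmatrix} \tag{5.6.7}$$

where $s_{11}$ is $1\times 1$, $S_{22}$ is $(p-1)\times(p-1)$, $S = (\mathbf{X}-\bar{\mathbf{X}})(\mathbf{X}-\bar{\mathbf{X}})'$, $\mathbf{X} = (X_1,\ldots,X_n)$ is the
$p\times n$ sample matrix, $n$ being the sample size, $\bar{X} = \frac{1}{n}(X_1+\cdots+X_n)$, $\bar{\mathbf{X}} = (\bar{X},\ldots,\bar{X})$ is
$p\times n$, the $X_j$'s, $j = 1,\ldots,n$, being iid according to a given $p$-variate population having
mean value vector $\mu$ and covariance matrix $\Sigma > O$, which need not be Gaussian.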
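
As a small illustration of (5.6.7), the following sketch computes $r^2_{1.(2\ldots p)}$ from a $p\times n$ sample matrix. The function name and the simulated data are hypothetical and serve only to show the partitioning of $S$; NumPy is assumed to be available.

```python
import numpy as np

def sample_multiple_correlation_sq(X):
    """Squared sample multiple correlation of row 1 on rows 2..p.

    X is the p x n sample matrix of the text (one column per observation).
    """
    X = np.asarray(X, dtype=float)
    centered = X - X.mean(axis=1, keepdims=True)   # X minus the matrix of sample means
    S = centered @ centered.T                      # sample sum of products matrix
    s11, S12, S22 = S[0, 0], S[0, 1:], S[1:, 1:]   # partitioning used in (5.6.7)
    return float(S12 @ np.linalg.solve(S22, S12) / s11)

# Hypothetical data: x1 depends linearly on x2 and x3, plus noise.
rng = np.random.default_rng(0)
n = 200
X_rest = rng.standard_normal((2, n))
x1 = 1.5 * X_rest[0] - 0.5 * X_rest[1] + rng.standard_normal(n)
X = np.vstack([x1, X_rest])
r2 = sample_multiple_correlation_sq(X)
print(round(r2, 3))
```

The linear system is solved with `np.linalg.solve` rather than by forming $S_{22}^{-1}$ explicitly, which is the usual numerically safer choice.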


5.6.3. Different derivations of $\rho_{1.(2\ldots p)}$


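Consider a prediction problem involving real scalar variables where $x_1$ is predicted by
making use of $x_2,\ldots,x_p$ or linear functions thereof. Let $A_2' = (a_2,\ldots,a_p)$ be a constant
vector where $a_j$, $j = 2,\ldots,p$, are real scalar constants. Letting $X_{(2)}' = (x_2,\ldots,x_p)$, a
linear function of $X_{(2)}$ is $u = A_2'X_{(2)} = a_2x_2+\cdots+a_px_p$. Then, the mean value and variance
of this linear function are $E[u] = E[A_2'X_{(2)}] = A_2'\mu_{(2)}$ and $\mathrm{Var}(u) = \mathrm{Var}(A_2'X_{(2)}) = A_2'\Sigma_{22}A_2$
where $\mu_{(2)}' = (\mu_2,\ldots,\mu_p) = E[X_{(2)}']$ and $\Sigma_{22}$ is the covariance matrix associated
with $X_{(2)}$, which is available from the partitioning of $\Sigma$ specified in the previous
subsection. Let us determine the correlation between $x_1$, the variable being predicted, and
$u$, a linear function of the variables being utilized to predict $x_1$, denoted by $\rho_{1,u}$, that is,

$$\rho_{1,u} = \frac{\mathrm{Cov}(x_1,u)}{\sqrt{\mathrm{Var}(x_1)\mathrm{Var}(u)}}$$

where $\mathrm{Cov}(x_1,u) = E[(x_1-E(x_1))(u-E(u))] = E[(x_1-E(x_1))(X_{(2)}-E(X_{(2)}))'A_2] = \mathrm{Cov}(x_1,X_{(2)})A_2 = \Sigma_{12}A_2$,
$\mathrm{Var}(x_1) = \sigma_{11}$, $\mathrm{Var}(u) = A_2'\Sigma_{22}A_2 > 0$. Letting $\Sigma_{22}^{\frac{1}{2}}$ be
the positive definite square root of $\Sigma_{22}$, we can write $\Sigma_{12}A_2 = (\Sigma_{12}\Sigma_{22}^{-\frac{1}{2}})(\Sigma_{22}^{\frac{1}{2}}A_2)$. Then,
on applying the Cauchy-Schwarz inequality, we may write

$$\Sigma_{12}A_2 = (\Sigma_{12}\Sigma_{22}^{-\frac{1}{2}})(\Sigma_{22}^{\frac{1}{2}}A_2) \le \sqrt{(\Sigma_{12}\Sigma_{22}^{-1}\Sigma_{21})(A_2'\Sigma_{22}A_2)}.$$

Thus,

$$\rho_{1,u} \le \frac{\sqrt{\Sigma_{12}\Sigma_{22}^{-1}\Sigma_{21}}\,\sqrt{A_2'\Sigma_{22}A_2}}{\sqrt{\sigma_{11}}\,\sqrt{A_2'\Sigma_{22}A_2}} = \frac{\sqrt{\Sigma_{12}\Sigma_{22}^{-1}\Sigma_{21}}}{\sqrt{\sigma_{11}}},\ \text{ that is,}$$

$$\rho^2_{1,u} \le \frac{\Sigma_{12}\Sigma_{22}^{-1}\Sigma_{21}}{\sigma_{11}} = \rho^2_{1.(2\ldots p)}. \tag{5.6.8}$$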
This establishes the following result:

Theorem 5.6.1. The multiple correlation coefficient $\rho_{1.(2\ldots p)}$ of $x_1$ on $x_2,\ldots,x_p$ represents
the maximum correlation between $x_1$ and an arbitrary linear function of $x_2,\ldots,x_p$.

This shows that if we consider the joint variation of $x_1$ and $(x_2,\ldots,x_p)$, this scale-free
joint variation, namely the correlation, is maximum when the scale-free covariance, which
constitutes a scale-free measure of joint variation, is the multiple correlation coefficient.
Correlation measures a scale-free joint scatter in the variables involved, in this case $x_1$ and
$(x_2,\ldots,x_p)$. Correlation does not measure general relationships between the variables;
counterexamples are provided in Mathai and Haubold (2017b). Hence "maximum correlation"
should be interpreted as maximum joint scale-free variation or joint scatter in the
variables.
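
Theorem 5.6.1 can be checked numerically. The sketch below builds an arbitrary positive definite covariance matrix (a made-up example, not taken from the text), evaluates $\rho_{1,u}$ for many random coefficient vectors $A_2$, and compares the largest value with $\rho_{1.(2\ldots p)}$; the bound is attained at $A_2 = \Sigma_{22}^{-1}\Sigma_{21}$. All function names are hypothetical.

```python
import numpy as np

def corr_with_linear_function(Sigma, a2):
    """Correlation between x1 and u = a2' X_(2) under covariance matrix Sigma."""
    Sigma = np.asarray(Sigma, dtype=float)
    a2 = np.asarray(a2, dtype=float)
    sigma11, Sigma12, Sigma22 = Sigma[0, 0], Sigma[0, 1:], Sigma[1:, 1:]
    return float(Sigma12 @ a2 / np.sqrt(sigma11 * (a2 @ Sigma22 @ a2)))

def multiple_correlation(Sigma):
    """Population multiple correlation coefficient from (5.6.6)."""
    Sigma = np.asarray(Sigma, dtype=float)
    sigma11, Sigma12, Sigma22 = Sigma[0, 0], Sigma[0, 1:], Sigma[1:, 1:]
    return float(np.sqrt(Sigma12 @ np.linalg.solve(Sigma22, Sigma12) / sigma11))

rng = np.random.default_rng(0)
G = rng.standard_normal((4, 4))
Sigma = G @ G.T + 4 * np.eye(4)                       # arbitrary positive definite matrix
rho = multiple_correlation(Sigma)
a_star = np.linalg.solve(Sigma[1:, 1:], Sigma[1:, 0])  # maximizing coefficient vector
random_corrs = [corr_with_linear_function(Sigma, rng.standard_normal(3))
                for _ in range(1000)]
print(rho, corr_with_linear_function(Sigma, a_star), max(random_corrs))
```

No random direction should exceed `rho`, while `a_star` reproduces it up to rounding error.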

For the next property, we will use the following two basic results on conditional
expectations, referring also to Mathai and Haubold (2017b). Let $x$ and $y$ be two real scalar
random variables having a joint distribution. Then,

$$E[y] = E[E(y|x)] \tag{i}$$

whenever the expected values exist, where the inside expectation is taken in the conditional
space of $y$, given $x$, for all $x$, that is, $E_{y|x}(y|x)$, and the outside expectation is taken in the
marginal space of $x$, that is, $E_x(\cdot)$. The other result states that

$$\mathrm{Var}(y) = \mathrm{Var}(E[y|x]) + E[\mathrm{Var}(y|x)] \tag{ii}$$

where it is assumed that the expected value of the conditional variance and the
variance of the conditional expectation exist. Situations where the results stated in
(i) and (ii) are applicable or not applicable are described and illustrated in Mathai
and Haubold (2017b). In result (i), $x$ can be a scalar, vector or matrix variable.
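
Results (i) and (ii) can be verified directly on a small discrete joint distribution. The joint probability table below is invented for the illustration; any table with positive row sums would do.

```python
import numpy as np

# Hypothetical joint pmf of (x, y): rows index x-values, columns index y-values.
y_vals = np.array([-1.0, 0.5, 3.0, 4.0])
joint = np.array([[0.10, 0.05, 0.05, 0.10],
                  [0.05, 0.20, 0.05, 0.05],
                  [0.05, 0.05, 0.15, 0.10]])

p_x = joint.sum(axis=1)                      # marginal pmf of x
p_y = joint.sum(axis=0)                      # marginal pmf of y
cond = joint / p_x[:, None]                  # conditional pmf of y given x
cond_mean = cond @ y_vals                    # E[y|x] for each x
cond_var = cond @ y_vals**2 - cond_mean**2   # Var(y|x) for each x

mean_y = p_y @ y_vals
var_y = p_y @ y_vals**2 - mean_y**2
total_expectation = p_x @ cond_mean                                   # E[E(y|x)]
total_variance = (p_x @ cond_mean**2 - total_expectation**2) + p_x @ cond_var
print(mean_y, total_expectation, var_y, total_variance)
```

The two printed means agree, as do the two variances, which is what (i) and (ii) assert.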
Now, let us examine the problem of predicting $x_1$ on the basis of $x_2,\ldots,x_p$. What is
the "best" predictor function of $x_2,\ldots,x_p$ for predicting $x_1$, "best" being construed in
the minimum mean square sense? If $\phi(x_2,\ldots,x_p)$ is an arbitrary predictor, then at given
values of $x_2,\ldots,x_p$, $\phi$ is a constant. Consider the squared distance $(x_1-b)^2$ between $x_1$
and $b = \phi(x_2,\ldots,x_p\,|\,x_2,\ldots,x_p)$, that is, $b$ is $\phi$ at given values of $x_2,\ldots,x_p$. Then, "minimum
in the mean square sense" means to minimize the expected value of $(x_1-b)^2$ over
all $b$, or $\min_b E(x_1-b)^2$. We have already established in Mathai and Haubold (2017b) that
the minimizing value of $b$ is $b = E[x_1]$ at given $x_2,\ldots,x_p$, or the conditional expectation
of $x_1$, given $x_2,\ldots,x_p$, or $b = E[x_1|x_2,\ldots,x_p]$. Hence, this "best" predictor is also
called the regression of $x_1$ on $(x_2,\ldots,x_p)$, or $E[x_1|x_2,\ldots,x_p]$ = the regression of $x_1$ on
$x_2,\ldots,x_p$, or the best predictor of $x_1$ based on $x_2,\ldots,x_p$. Note that, in general, for any
scalar variable $y$ and a constant $a$,

$$E[y-a]^2 = E[y-E(y)+E(y)-a]^2 = E[(y-E(y))^2] + 2E[(y-E(y))(E(y)-a)] + E[(E(y)-a)^2] = \mathrm{Var}(y) + 0 + [E(y)-a]^2. \tag{iii}$$


As the only term on the right-hand side containing $a$ is $[E(y)-a]^2$, the minimum is
attained when this term is zero since it is a non-negative constant, zero occurring when
$a = E[y]$. Thus, $E[y-a]^2$ is minimized when $a = E[y]$. If $a = \phi(X_{(2)})$ at given value
of $X_{(2)}$, then the best predictor of $x_1$ based on $X_{(2)}$ is $E[x_1|X_{(2)}]$ or the regression of
$x_1$ on $X_{(2)}$. Let us determine what happens when $E[x_1|X_{(2)}]$ is a linear function in $X_{(2)}$.
Let the linear function be $b_0 + b_2x_2 + \cdots + b_px_p = b_0 + B_2'X_{(2)}$, $B_2' = (b_2,\ldots,b_p)$,
where $b_0, b_2,\ldots,b_p$ are real constants [Note that only real variables and real constants
are considered in this section]. That is, for some constant $b_0$,

$$E[x_1|X_{(2)}] = b_0 + b_2x_2 + \cdots + b_px_p. \tag{iv}$$


Taking expectation with respect to $x_1, x_2,\ldots,x_p$ in (iv), it follows from (i) that the left-hand
side becomes $E[x_1]$, the right side being $b_0 + b_2E[x_2] + \cdots + b_pE[x_p]$; subtracting
this from (iv), we have

$$E[x_1|X_{(2)}] - E[x_1] = b_2(x_2 - E[x_2]) + \cdots + b_p(x_p - E[x_p]). \tag{v}$$


Multiplying both sides of (v) by $x_j - E[x_j]$ and taking expectations throughout, the
right-hand side becomes $b_2\sigma_{2j} + \cdots + b_p\sigma_{pj}$ where $\sigma_{ij} = \mathrm{Cov}(x_i,x_j)$, $i \ne j$, and it
is the variance of $x_j$ when $i = j$. The left-hand side is
$E[(x_j - E(x_j))(E[x_1|X_{(2)}] - E(x_1))] = E[E(x_1x_j|X_{(2)})] - E(x_j)E(x_1) = E[x_1x_j] - E(x_1)E(x_j) = \mathrm{Cov}(x_1,x_j)$.
Three properties were utilized in the derivation, namely (i), the fact that $\mathrm{Cov}(u,v) =
E[(u-E(u))(v-E(v))] = E[u(v-E(v))] = E[v(u-E(u))]$, and $\mathrm{Cov}(u,v) =
E(uv) - E(u)E(v)$. As well, $\mathrm{Var}(u) = E[u-E(u)]^2 = E[u(u-E(u))]$ as long as the
second order moments exist. Thus, we have the following by combining all the linear
equations for $j = 2,\ldots,p$:

$$\Sigma_{21} = \Sigma_{22}\,b \ \Rightarrow\ b = \Sigma_{22}^{-1}\Sigma_{21}\ \text{ or }\ b' = \Sigma_{12}\Sigma_{22}^{-1} \tag{5.6.9}$$

when $\Sigma_{22}$ is nonsingular, which is the case as it was assumed that $\Sigma_{22} > O$. Now, the best
predictor of $x_1$ based on a linear function of $X_{(2)}$, or the best predictor in the class of all
linear functions of $X_{(2)}$, is

$$E[x_1|X_{(2)}] = b'X_{(2)} = \Sigma_{12}\Sigma_{22}^{-1}X_{(2)}. \tag{5.6.10}$$


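Let us consider the correlation between $x_1$ and its best linear predictor based on
$X_{(2)}$, or the correlation between $x_1$ and the linear regression of $x_1$ on $X_{(2)}$. Observe
that $\mathrm{Cov}(x_1, \Sigma_{12}\Sigma_{22}^{-1}X_{(2)}) = \Sigma_{12}\Sigma_{22}^{-1}\mathrm{Cov}(X_{(2)}, x_1) = \Sigma_{12}\Sigma_{22}^{-1}\Sigma_{21}$, $\Sigma_{21} = \Sigma_{12}'$.
Consider the variance of the best linear predictor: $\mathrm{Var}(b'X_{(2)}) = b'\mathrm{Cov}(X_{(2)})b =
\Sigma_{12}\Sigma_{22}^{-1}\Sigma_{22}\Sigma_{22}^{-1}\Sigma_{21} = \Sigma_{12}\Sigma_{22}^{-1}\Sigma_{21}$. Thus, the square of the correlation between $x_1$ and
its best linear predictor or the linear regression on $X_{(2)}$, denoted by $\rho^2_{x_1,b'X_{(2)}}$, is the
following:

$$\rho^2_{x_1,b'X_{(2)}} = \frac{[\mathrm{Cov}(x_1, b'X_{(2)})]^2}{\mathrm{Var}(x_1)\,\mathrm{Var}(b'X_{(2)})} = \frac{\Sigma_{12}\Sigma_{22}^{-1}\Sigma_{21}}{\sigma_{11}} = \rho^2_{1.(2\ldots p)}. \tag{5.6.11}$$

Hence, the following result:

Theorem 5.6.2. The multiple correlation $\rho_{1.(2\ldots p)}$ between $x_1$ and $x_2,\ldots,x_p$ is also the
correlation between $x_1$ and its best linear predictor, or $x_1$ and its linear regression on
$x_2,\ldots,x_p$.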


Observe that normality has not been assumed for obtaining all of the above properties.
Thus, the results hold for any population for which moments of order two exist. However,
in the case of a nonsingular normal population, that is, $X_j \sim N_p(\mu,\Sigma)$, $\Sigma > O$, it
follows from equation (3.3.5) that, for $r = 1$, $E[x_1|X_{(2)}] = \Sigma_{12}\Sigma_{22}^{-1}X_{(2)}$ when $E[X_{(2)}] =
\mu_{(2)} = O$ and $E(x_1) = \mu_1 = 0$; otherwise, $E[x_1|X_{(2)}] = \mu_1 + \Sigma_{12}\Sigma_{22}^{-1}(X_{(2)} - \mu_{(2)})$.
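
A sample analogue of (5.6.9) to (5.6.11) is sketched below: the coefficient vector $b$ is obtained by solving the normal equations with the sample sum of products matrix in place of $\Sigma$, and the squared correlation between $x_1$ and the fitted linear predictor is compared with the ratio $S_{12}S_{22}^{-1}S_{21}/s_{11}$. The simulated data and variable names are assumptions made for the illustration.

```python
import numpy as np

rng = np.random.default_rng(3)
n = 500
X_rest = rng.standard_normal((3, n))                 # rows play the role of x2, x3, x4
x1 = 2.0 + X_rest[0] - 0.7 * X_rest[2] + rng.standard_normal(n)
X = np.vstack([x1, X_rest])

centered = X - X.mean(axis=1, keepdims=True)
S = centered @ centered.T
b = np.linalg.solve(S[1:, 1:], S[1:, 0])             # sample version of b = Sigma_22^{-1} Sigma_21
fitted = b @ centered[1:]                            # best linear predictor of the centered x1
r2_formula = float(S[0, 1:] @ b / S[0, 0])           # sample analogue of (5.6.11)
r2_corr = float(np.corrcoef(x1, fitted)[0, 1] ** 2)  # squared correlation with the predictor
print(b, r2_formula, r2_corr)
```

The two squared correlations coincide up to rounding, mirroring Theorem 5.6.2 at the sample level.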


5.6.4. Distributional aspects of the sample multiple correlation coefficient 


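From (5.6.7), we have

$$1 - r^2_{1.(2\ldots p)} = 1 - \frac{S_{12}S_{22}^{-1}S_{21}}{s_{11}} = \frac{s_{11} - S_{12}S_{22}^{-1}S_{21}}{s_{11}} = \frac{|S|}{|S_{22}|\,s_{11}}, \tag{5.6.12}$$

which can be established from the expansion of the determinant $|S| = |S_{22}|\,|S_{11} - S_{12}S_{22}^{-1}S_{21}|$,
which is available from Sect. 1.3. In our case $S_{11}$ is $1\times 1$ and hence we
denote it as $s_{11}$, and then $|S_{11} - S_{12}S_{22}^{-1}S_{21}|$ is $s_{11} - S_{12}S_{22}^{-1}S_{21}$ which is $1\times 1$. Let
$u = 1 - r^2_{1.(2\ldots p)} = \frac{|S|}{|S_{22}|\,s_{11}}$. We can compute arbitrary moments of $u$ by integrating out
over the density of $S$, namely the Wishart density with $m = n-1$ degrees of freedom
when the population is Gaussian, where $n$ is the sample size. That is, for arbitrary $h$,

$$E[u^h] = \frac{1}{2^{\frac{mp}{2}}|\Sigma|^{\frac{m}{2}}\Gamma_p(\frac{m}{2})}\int_{S>O} u^h\,|S|^{\frac{m}{2}-\frac{p+1}{2}}\,e^{-\frac{1}{2}\mathrm{tr}(\Sigma^{-1}S)}\,dS. \tag{i}$$

Note that $u^h = |S|^h|S_{22}|^{-h}s_{11}^{-h}$. Among the three factors $|S|^h$, $|S_{22}|^{-h}$ and $s_{11}^{-h}$, $|S_{22}|^{-h}$
and $s_{11}^{-h}$ are creating problems. We will replace these by equivalent integrals so that the
problematic part be shifted to the exponent. Consider the identities

$$s_{11}^{-h} = \frac{1}{\Gamma(h)}\int_{x=0}^{\infty} x^{h-1}e^{-s_{11}x}\,dx,\quad x > 0,\ s_{11} > 0,\ \Re(h) > 0, \tag{ii}$$

$$|S_{22}|^{-h} = \frac{1}{\Gamma_{p_2}(h)}\int_{X_2>O} |X_2|^{h-\frac{p_2+1}{2}}e^{-\mathrm{tr}(S_{22}X_2)}\,dX_2, \tag{iii}$$

for $X_2 > O$, $S_{22} > O$, $\Re(h) > \frac{p_2-1}{2}$, where $X_2 > O$ is a $p_2\times p_2$ real positive definite
matrix, $p_2 = p-1$, $p_1 = 1$, $p_1 + p_2 = p$. Then, excluding $-\frac{1}{2}$, the exponent in (i)
becomes the following:

$$\mathrm{tr}(\Sigma^{-1}S) + 2s_{11}x + 2\,\mathrm{tr}(S_{22}X_2) = \mathrm{tr}[S(\Sigma^{-1} + 2Z)],\quad Z = \begin{bmatrix} x & O\\ O & X_2\end{bmatrix}. \tag{iv}$$

Noting that $(\Sigma^{-1} + 2Z) = \Sigma^{-1}(I + 2\Sigma Z)$, we are now in a position to integrate out $S$
from (i) by using a real matrix-variate gamma integral, denoting the constant part in (i)
as $c_1$:

$$\begin{aligned}
E[u^h] &= \frac{c_1}{\Gamma(h)\Gamma_{p_2}(h)}\int_{x=0}^{\infty}x^{h-1}\int_{X_2>O}|X_2|^{h-\frac{p_2+1}{2}}\Big[\int_{S>O}|S|^{\frac{m}{2}+h-\frac{p+1}{2}}e^{-\frac{1}{2}\mathrm{tr}[S(\Sigma^{-1}+2Z)]}\,dS\Big]dx\wedge dX_2\\
&= \frac{c_1\,2^{p(\frac{m}{2}+h)}\,\Gamma_p(\frac{m}{2}+h)}{\Gamma(h)\Gamma_{p_2}(h)}\int_{x=0}^{\infty}\int_{X_2>O}x^{h-1}|X_2|^{h-\frac{p_2+1}{2}}\,|\Sigma^{-1}+2Z|^{-(\frac{m}{2}+h)}\,dx\wedge dX_2\\
&= \frac{c_1\,2^{p(\frac{m}{2}+h)}\,\Gamma_p(\frac{m}{2}+h)\,|\Sigma|^{\frac{m}{2}+h}}{\Gamma(h)\Gamma_{p_2}(h)}\int_{x=0}^{\infty}\int_{X_2>O}x^{h-1}|X_2|^{h-\frac{p_2+1}{2}}\,|I+2\Sigma Z|^{-(\frac{m}{2}+h)}\,dx\wedge dX_2.
\end{aligned} \tag{5.6.13}$$

The integral in (5.6.13) can be evaluated for a general $\Sigma$, which will produce the non-null
density of $1 - r^2_{1.(2\ldots p)}$, non-null in the sense that the population multiple correlation
$\rho_{1.(2\ldots p)} \ne 0$. However, if $\rho_{1.(2\ldots p)} = 0$, which we call the null case, the determinant part
in (5.6.13) splits into two factors, one depending only on $x$ and the other only involving
$X_2$. So letting $H_o: \rho_{1.(2\ldots p)} = 0$,

$$E[u^h|H_o] = \frac{c_1\,2^{p(\frac{m}{2}+h)}\,\Gamma_p(\frac{m}{2}+h)\,|\Sigma|^{\frac{m}{2}+h}}{\Gamma(h)\Gamma_{p_2}(h)}\int_{0}^{\infty}x^{h-1}[1+2\sigma_{11}x]^{-(\frac{m}{2}+h)}dx \times \int_{X_2>O}|X_2|^{h-\frac{p_2+1}{2}}\,|I+2\Sigma_{22}X_2|^{-(\frac{m}{2}+h)}dX_2. \tag{v}$$

But the $x$-integral gives $\frac{\Gamma(h)\Gamma(\frac{m}{2})}{\Gamma(\frac{m}{2}+h)}(2\sigma_{11})^{-h}$ for $\Re(h) > 0$ and the $X_2$-integral gives
$\frac{\Gamma_{p_2}(h)\Gamma_{p_2}(\frac{m}{2})}{\Gamma_{p_2}(\frac{m}{2}+h)}|2\Sigma_{22}|^{-h}$ for $\Re(h) > \frac{p_2-1}{2}$. Substituting all these in (v), we note that all the
factors containing 2 and $\Sigma$, $\sigma_{11}$, $\Sigma_{22}$ cancel out, and then by using the fact that

$$\frac{\Gamma_p(\frac{m}{2}+h)}{\Gamma(\frac{m}{2}+h)\,\Gamma_{p-1}(\frac{m}{2}+h)} = \pi^{\frac{p-1}{2}}\,\frac{\Gamma(\frac{m}{2}-\frac{p-1}{2}+h)}{\Gamma(\frac{m}{2}+h)},$$

we have the following expression for the $h$-th null moment of $u$:

$$E[u^h|H_o] = \frac{\Gamma(\frac{m}{2})}{\Gamma(\frac{m}{2}-\frac{p-1}{2})}\,\frac{\Gamma(\frac{m}{2}-\frac{p-1}{2}+h)}{\Gamma(\frac{m}{2}+h)},\quad \Re(h) > -\frac{m}{2}+\frac{p-1}{2}, \tag{5.6.14}$$

which happens to be the $h$-th moment of a real scalar type-1 beta random variable with the
parameters $(\frac{m}{2}-\frac{p-1}{2}, \frac{p-1}{2})$. Since $h$ is arbitrary, this $h$-th moment uniquely determines
the distribution, thus the following result: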


Theorem 5.6.3. When the population has a $p$-variate Gaussian distribution with the parameters
$\mu$ and $\Sigma > O$, and the population multiple correlation coefficient $\rho_{1.(2\ldots p)} = 0$,
the sample multiple correlation coefficient $r_{1.(2\ldots p)}$ is such that $u = 1 - r^2_{1.(2\ldots p)}$ is distributed
as a real scalar type-1 beta random variable with the parameters $(\frac{m}{2}-\frac{p-1}{2}, \frac{p-1}{2})$,
and thereby $v = \frac{u}{1-u} = \frac{1-r^2_{1.(2\ldots p)}}{r^2_{1.(2\ldots p)}}$ is distributed as a real scalar type-2 beta random variable
with the parameters $(\frac{m}{2}-\frac{p-1}{2}, \frac{p-1}{2})$ and $w = \frac{1-u}{u} = \frac{r^2_{1.(2\ldots p)}}{1-r^2_{1.(2\ldots p)}}$ is distributed as
a real scalar type-2 beta random variable with the parameters $(\frac{p-1}{2}, \frac{m}{2}-\frac{p-1}{2})$ whose
density is

$$f_w(w) = \frac{\Gamma(\frac{m}{2})}{\Gamma(\frac{m}{2}-\frac{p-1}{2})\,\Gamma(\frac{p-1}{2})}\,w^{\frac{p-1}{2}-1}(1+w)^{-\frac{m}{2}},\quad 0 < w < \infty, \tag{5.6.15}$$

and zero elsewhere.

As F-tables are available, we may conveniently express the above real scalar type-2
beta density in terms of an F-density. It suffices to make the substitution $w = \frac{p-1}{m-p+1}F$
where $F$ is a real F random variable having $p-1$ and $m-p+1$ degrees of freedom, that
is, an $F_{p-1,m-p+1}$ random variable, with $m = n-1$, $n$ being the sample size. The density
of this F random variable, denoted by $f_F(F)$, is the following:

$$f_F(F) = \frac{\Gamma(\frac{m}{2})}{\Gamma(\frac{p-1}{2})\,\Gamma(\frac{m}{2}-\frac{p-1}{2})}\Big(\frac{p-1}{m-p+1}\Big)^{\frac{p-1}{2}}F^{\frac{p-1}{2}-1}\Big[1+\frac{p-1}{m-p+1}F\Big]^{-\frac{m}{2}} \tag{5.6.16}$$

whenever $0 < F < \infty$, and zero elsewhere. In the above simplification, observe that
$\frac{(p-1)/2}{\frac{m}{2}-\frac{p-1}{2}} = \frac{p-1}{m-p+1}$. Then, for taking a decision with respect to testing the hypothesis
$H_o: \rho_{1.(2\ldots p)} = 0$, first compute $F_{p-1,m-p+1} = \frac{m-p+1}{p-1}\,w$, $w = \frac{r^2_{1.(2\ldots p)}}{1-r^2_{1.(2\ldots p)}}$. Then, reject
$H_o$ if the observed $F_{p-1,m-p+1} \ge F_{p-1,m-p+1,\alpha}$ for a given $\alpha$. This will be a test at
significance level $\alpha$ or, equivalently, a test whose critical region's size is $\alpha$. The non-null
distribution for evaluating the power of this likelihood ratio test can be determined by
evaluating the integral in (5.6.13) and identifying the distribution through the uniqueness
property of arbitrary moments.
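
The test procedure just described can be sketched as follows. The function below (a hypothetical helper, not from the text) returns $r^2_{1.(2\ldots p)}$ together with the statistic $F_{p-1,m-p+1}$ and its degrees of freedom. A short simulation under $H_o$ with independent standard normal components then compares the average of $u = 1-r^2_{1.(2\ldots p)}$ with the type-1 beta mean $\frac{m-p+1}{m}$ implied by Theorem 5.6.3. The critical value $F_{p-1,m-p+1,\alpha}$ itself would still come from F-tables or from a statistical library.

```python
import numpy as np

def multiple_correlation_f_statistic(X):
    """Return (r2, F, df1, df2) for the p x n sample matrix X (rows are variables)."""
    X = np.asarray(X, dtype=float)
    p, n = X.shape
    m = n - 1
    centered = X - X.mean(axis=1, keepdims=True)
    S = centered @ centered.T
    r2 = float(S[0, 1:] @ np.linalg.solve(S[1:, 1:], S[0, 1:]) / S[0, 0])
    w = r2 / (1.0 - r2)                       # type-2 beta variable of Theorem 5.6.3
    df1, df2 = p - 1, m - p + 1
    return r2, (df2 / df1) * w, df1, df2

# Simulation under the null hypothesis: all p components independent N(0, 1).
rng = np.random.default_rng(1)
p, n, reps = 4, 30, 4000
r2_null = np.array([multiple_correlation_f_statistic(rng.standard_normal((p, n)))[0]
                    for _ in range(reps)])
mean_u = float(np.mean(1.0 - r2_null))
beta_mean = (n - 1 - (p - 1)) / (n - 1)       # a/(a+b) with a = m/2-(p-1)/2, b = (p-1)/2
print(mean_u, beta_mean)
```

The simulated mean should be close to the theoretical one; this is a sanity check of the null distribution, not a substitute for the derivation.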


Note 5.6.1. By making use of Theorem 5.6.3 as a starting point and exploiting various 
results connecting real scalar type-1 beta, type-2 beta, F and gamma variables, one can 
obtain numerous results on the distributional aspects of certain functions involving the 
sample multiple correlation coefficient. 


5.6.5. The partial correlation coefficient 


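Partial correlation is a concept associated with the correlation between residuals in
two variables after removing the effects of linear regression on a set of other variables.
Consider the real vector $X' = (x_1, x_2, x_3,\ldots,x_p) = (x_1, x_2, X_{(3)}')$, $X_{(3)}' = (x_3,\ldots,x_p)$,
where $x_1,\ldots,x_p$ are all real scalar variables. Let the covariance matrix of $X$ be $\Sigma > O$
and let it be partitioned as follows:

$$X = \begin{bmatrix}x_1\\ x_2\\ X_{(3)}\end{bmatrix},\quad \Sigma = \begin{bmatrix}\sigma_{11} & \sigma_{12} & \Sigma_{13}\\ \sigma_{21} & \sigma_{22} & \Sigma_{23}\\ \Sigma_{31} & \Sigma_{32} & \Sigma_{33}\end{bmatrix},\quad X_{(3)} \text{ being } (p-2)\times 1,$$

where $\sigma_{11}, \sigma_{12}, \sigma_{21}, \sigma_{22}$ are $1\times 1$, $\Sigma_{13}$ and $\Sigma_{23}$ are $1\times(p-2)$, $\Sigma_{31} = \Sigma_{13}'$, $\Sigma_{32} = \Sigma_{23}'$
and $\Sigma_{33}$ is $(p-2)\times(p-2)$. Let $E[X] = O$ without any loss of generality. Consider
the problem of predicting $x_1$ by using a linear function of $X_{(3)}$. Then, the regression of $x_1$
on $X_{(3)}$ is $E[x_1|X_{(3)}] = \Sigma_{13}\Sigma_{33}^{-1}X_{(3)}$ from (5.6.10), and the residual part, after removing
this regression from $x_1$, is $e_1 = x_1 - \Sigma_{13}\Sigma_{33}^{-1}X_{(3)}$. Similarly, the linear regression of $x_2$
on $X_{(3)}$ is $E[x_2|X_{(3)}] = \Sigma_{23}\Sigma_{33}^{-1}X_{(3)}$ and the residual in $x_2$ after removing the effect of
$X_{(3)}$ is $e_2 = x_2 - \Sigma_{23}\Sigma_{33}^{-1}X_{(3)}$. What are then the variances of $e_1$ and $e_2$, the covariance
between $e_1$ and $e_2$, and the scale-free covariance, namely the correlation between $e_1$ and
$e_2$? Since $e_1$ and $e_2$ are all linear functions of the variables involved, we can utilize the
expressions for variances of linear functions and covariance between linear functions, a
basic discussion of such results being given in Mathai and Haubold (2017b). Thus,

$$\begin{aligned}
\mathrm{Var}(e_1) &= \mathrm{Var}(x_1) + \mathrm{Var}(\Sigma_{13}\Sigma_{33}^{-1}X_{(3)}) - 2\,\mathrm{Cov}(x_1, \Sigma_{13}\Sigma_{33}^{-1}X_{(3)})\\
&= \sigma_{11} + \Sigma_{13}\Sigma_{33}^{-1}\mathrm{Cov}(X_{(3)})\Sigma_{33}^{-1}\Sigma_{31} - 2\,\mathrm{Cov}(x_1, X_{(3)})\Sigma_{33}^{-1}\Sigma_{31}\\
&= \sigma_{11} + \Sigma_{13}\Sigma_{33}^{-1}\Sigma_{31} - 2\,\Sigma_{13}\Sigma_{33}^{-1}\Sigma_{31} = \sigma_{11} - \Sigma_{13}\Sigma_{33}^{-1}\Sigma_{31}.
\end{aligned} \tag{i}$$

It can be similarly shown that

$$\mathrm{Var}(e_2) = \sigma_{22} - \Sigma_{23}\Sigma_{33}^{-1}\Sigma_{32} \tag{ii}$$

$$\mathrm{Cov}(e_1, e_2) = \sigma_{12} - \Sigma_{13}\Sigma_{33}^{-1}\Sigma_{32}. \tag{iii}$$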
Then, the correlation between the residuals $e_1$ and $e_2$, which is called the partial correlation
between $x_1$ and $x_2$ after removing the effects of linear regression on $X_{(3)}$ and is denoted
by $\rho_{12.(3\ldots p)}$, is such that

$$\rho^2_{12.(3\ldots p)} = \frac{[\sigma_{12} - \Sigma_{13}\Sigma_{33}^{-1}\Sigma_{32}]^2}{[\sigma_{11} - \Sigma_{13}\Sigma_{33}^{-1}\Sigma_{31}][\sigma_{22} - \Sigma_{23}\Sigma_{33}^{-1}\Sigma_{32}]}. \tag{5.6.17}$$

In the above simplifications, we have for instance used the fact that $\Sigma_{13}\Sigma_{33}^{-1}\Sigma_{32} =
\Sigma_{23}\Sigma_{33}^{-1}\Sigma_{31}$ since both are real $1\times 1$ and one is the transpose of the other.
The corresponding sample partial correlation coefficient between $x_1$ and $x_2$ after removing
the effects of linear regression on $X_{(3)}$, denoted by $r_{12.(3\ldots p)}$, is such that:

$$r^2_{12.(3\ldots p)} = \frac{[s_{12} - S_{13}S_{33}^{-1}S_{32}]^2}{[s_{11} - S_{13}S_{33}^{-1}S_{31}][s_{22} - S_{23}S_{33}^{-1}S_{32}]} \tag{5.6.18}$$

where the sample sum of products matrix $S$ is partitioned correspondingly, that is,

$$S = \begin{bmatrix}s_{11} & s_{12} & S_{13}\\ s_{21} & s_{22} & S_{23}\\ S_{31} & S_{32} & S_{33}\end{bmatrix},\quad S_{33} \text{ being } (p-2)\times(p-2), \tag{5.6.19}$$

and $s_{11}, s_{12}, s_{21}, s_{22}$ being $1\times 1$. In all the above derivations, we did not use any assumption
of an underlying Gaussian population. The results hold for any general population
as long as product moments up to second order exist. However, if we assume a $p$-variate
nonsingular Gaussian population, then we can obtain some interesting results on the
distributional aspects of the sample partial correlation, as was done in the case of the sample
multiple correlation. Such results will not be herein considered.
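
For concreteness, the residual covariance structure in (i) to (iii) and the resulting partial correlation can be computed as in the sketch below. The function name and the test matrix are hypothetical; the same code applies to a sample sum of products matrix $S$ in place of $\Sigma$, giving the signed counterpart of (5.6.18).

```python
import numpy as np

def partial_correlation(Sigma):
    """Partial correlation of x1 and x2 after removing linear regression on x3..xp.

    Sigma is a p x p covariance (or sum of products) matrix with p >= 3.
    """
    Sigma = np.asarray(Sigma, dtype=float)
    top = Sigma[:2, :2]                 # [[sigma_11, sigma_12], [sigma_21, sigma_22]]
    cross = Sigma[:2, 2:]               # rows: Sigma_13 and Sigma_23
    Sigma33 = Sigma[2:, 2:]
    # Covariance matrix of the residuals (e1, e2), combining (i), (ii) and (iii).
    resid_cov = top - cross @ np.linalg.solve(Sigma33, cross.T)
    return float(resid_cov[0, 1] / np.sqrt(resid_cov[0, 0] * resid_cov[1, 1]))

rng = np.random.default_rng(4)
G = rng.standard_normal((5, 5))
Sigma = G @ G.T + 5 * np.eye(5)         # arbitrary positive definite covariance matrix
rho_12_rest = partial_correlation(Sigma)
print(rho_12_rest, rho_12_rest ** 2)    # the square corresponds to (5.6.17)
```

The signed value carries the direction of association between the residuals, whereas (5.6.17) gives its square.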


Exercises 5.6 


5.6.1. Let the $p\times p$ real positive definite matrix $W$ be distributed as $W \sim W_p(m,\Sigma)$
with $\Sigma = I$. Consider the partitioning $W = \begin{bmatrix}W_{11} & W_{12}\\ W_{21} & W_{22}\end{bmatrix}$ where $W_{11}$ is $r\times r$, $r < p$.
Evaluate explicitly the normalizing constant in the density of $W$ by first integrating out
(1): $W_{11}$, (2): $W_{22}$, (3): $W_{12}$.


5.6.2. Repeat Exercise 5.6.1 for the complex case.


5.6.3. Let the $p\times p$ real positive definite matrix $W$ have a real Wishart density with
degrees of freedom $m > p$ and parameter matrix $\Sigma > O$. Consider the transformation
$W = TT'$ where $T$ is lower triangular with positive diagonal elements. Evaluate the densities
of the $t_{jj}$'s and the $t_{ij}$'s, $i > j$, if (1): $\Sigma = \mathrm{diag}(\sigma_{11},\ldots,\sigma_{pp})$, (2): $\Sigma > O$ is a
general matrix.


5.6.4. Repeat Exercise 5.6.3 for the complex case. In the complex case, the diagonal
elements in $T$ are real and positive.


5.6.5. Let $S \sim W_p(m,\Sigma)$, $\Sigma > O$. Compute the density of $S^{-1}$ in the real case, and
repeat for the complex case.


5.7. Distributions of Products and Ratios of Matrix-variate Random Variables 


In the real scalar case, one can easily interpret products and ratios of real scalar vari- 
ables, whether these are random or mathematical variables. However, when it comes to 
matrices, products and ratios are to be carefully defined. Let $X_1$ and $X_2$ be independently
distributed $p\times p$ real symmetric and positive definite matrix-variate random variables
with density functions $f_1(X_1)$ and $f_2(X_2)$, respectively. By definition, $f_1$ and $f_2$ are
respectively real-valued scalar functions of the matrices $X_1$ and $X_2$. Due to statistical
independence of $X_1$ and $X_2$, their joint density, denoted by $f(X_1,X_2)$, is the product of the
marginal densities, that is, $f(X_1,X_2) = f_1(X_1)f_2(X_2)$. Let us define a ratio and a product
of matrices. Let $U_2 = X_2^{\frac{1}{2}}X_1X_2^{\frac{1}{2}}$ and $U_1 = X_2^{\frac{1}{2}}X_1^{-1}X_2^{\frac{1}{2}}$ be called the symmetric product
and symmetric ratio of the matrices $X_1$ and $X_2$, where $X_2^{\frac{1}{2}}$ denotes the positive definite
square root of the positive definite matrix $X_2$. Let us consider the product $U_2$ first. We
could have also defined a product by interchanging $X_1$ and $X_2$. When it comes to ratios,
we could have considered the ratios $X_1$ to $X_2$ as well as $X_2$ to $X_1$. Nonetheless, we will
start with $U_1$ and $U_2$ as defined above.
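
The two definitions can be made concrete numerically. In the sketch below, the positive definite square root is obtained from an eigendecomposition; the helper names and the example matrices are assumptions made for the illustration.

```python
import numpy as np

def pd_sqrt(A):
    """Positive definite square root of a symmetric positive definite matrix."""
    eigvals, eigvecs = np.linalg.eigh(A)
    return (eigvecs * np.sqrt(eigvals)) @ eigvecs.T

def symmetric_product(X1, X2):
    """U2 = X2^{1/2} X1 X2^{1/2}."""
    R = pd_sqrt(X2)
    return R @ X1 @ R

def symmetric_ratio(X1, X2):
    """U1 = X2^{1/2} X1^{-1} X2^{1/2}."""
    R = pd_sqrt(X2)
    return R @ np.linalg.solve(X1, R)

rng = np.random.default_rng(0)
A = rng.standard_normal((3, 3))
B = rng.standard_normal((3, 3))
X1 = A @ A.T + np.eye(3)               # two arbitrary positive definite matrices
X2 = B @ B.T + np.eye(3)
U2 = symmetric_product(X1, X2)
U1 = symmetric_ratio(X1, X2)
print(np.linalg.eigvalsh(U2))          # same eigenvalues as the non-symmetric product X1 X2
```

Unlike the ordinary product $X_1X_2$, the matrix $U_2$ is symmetric and positive definite, which is why densities on the cone of positive definite matrices can be assigned to it.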


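5.7.1. The density of a product of real matrices


Consider the transformation $U_2 = X_2^{\frac{1}{2}}X_1X_2^{\frac{1}{2}}$, $V = X_2$. Then, it follows from
Theorem 1.6.5 that:

$$dX_1\wedge dX_2 = |V|^{-\frac{p+1}{2}}\,dU_2\wedge dV. \tag{5.7.1}$$

Letting the joint density of $U_2$ and $V$ be denoted by $g(U_2,V)$ and the marginal density of
$U_2$, by $g_2(U_2)$, we have

$$f_1(X_1)f_2(X_2)\,dX_1\wedge dX_2 = |V|^{-\frac{p+1}{2}}f_1(V^{-\frac{1}{2}}U_2V^{-\frac{1}{2}})f_2(V)\,dU_2\wedge dV\ \Rightarrow$$

$$g_2(U_2) = \int_V |V|^{-\frac{p+1}{2}}f_1(V^{-\frac{1}{2}}U_2V^{-\frac{1}{2}})f_2(V)\,dV, \tag{5.7.2}$$

$g_2(U_2)$ being referred to as the density of the symmetric product $U_2$ of the matrices $X_1$ and
$X_2$. For example, letting $X_1$ and $X_2$ be independently distributed two-parameter
matrix-variate gamma random variables with the densities

$$f_j(X_j) = \frac{|B_j|^{\alpha_j}}{\Gamma_p(\alpha_j)}\,|X_j|^{\alpha_j-\frac{p+1}{2}}\,e^{-\mathrm{tr}(B_jX_j)},\quad j = 1,2, \tag{i}$$

for $B_j > O$, $X_j > O$, $\Re(\alpha_j) > \frac{p-1}{2}$, $j = 1,2$, and zero elsewhere, we have

$$g_2(U_2) = c\,|U_2|^{\alpha_1-\frac{p+1}{2}}\int_{V>O}|V|^{\alpha_2-\alpha_1-\frac{p+1}{2}}\,e^{-\mathrm{tr}(B_2V)-\mathrm{tr}(B_1V^{-\frac{1}{2}}U_2V^{-\frac{1}{2}})}\,dV \tag{5.7.3}$$

where $c$ is the product of the normalizing constants of the densities specified in (i). On
comparing (5.7.3) with the Krätzel integral defined in the real scalar case in Chap. 2,
as well as in Mathai (2012) and Mathai and Haubold (1988, 2011a, 2017a), it is seen
that (5.7.3) can be regarded as a real matrix-variate analogue of Krätzel's integral. One
could also obtain the real matrix-variate version of the inverse Gaussian density from the
integrand.

As another example, let $f_1(X_1)$ be a real matrix-variate type-1 beta density as previously
defined in this chapter, whose parameters are $(\gamma+\frac{p+1}{2}, \alpha)$ with $\Re(\alpha) > \frac{p-1}{2}$,
$\Re(\gamma) > -1$, its density being given by

$$f_1(X_1) = \frac{\Gamma_p(\gamma+\frac{p+1}{2}+\alpha)}{\Gamma_p(\gamma+\frac{p+1}{2})\,\Gamma_p(\alpha)}\,|X_1|^{\gamma}\,|I-X_1|^{\alpha-\frac{p+1}{2}} \tag{ii}$$

for $O < X_1 < I$, $\Re(\gamma) > -1$, $\Re(\alpha) > \frac{p-1}{2}$, and zero elsewhere. Letting $f_2(X_2) =
f(X_2)$ be any other density, the density of $U_2$ is then

$$\begin{aligned}
g_2(U_2) &= \int_V |V|^{-\frac{p+1}{2}}f_1(V^{-\frac{1}{2}}U_2V^{-\frac{1}{2}})f_2(V)\,dV\\
&= \frac{\Gamma_p(\gamma+\frac{p+1}{2}+\alpha)}{\Gamma_p(\gamma+\frac{p+1}{2})\,\Gamma_p(\alpha)}\int_V |V|^{-\frac{p+1}{2}}\,|V^{-\frac{1}{2}}U_2V^{-\frac{1}{2}}|^{\gamma}\,|I-V^{-\frac{1}{2}}U_2V^{-\frac{1}{2}}|^{\alpha-\frac{p+1}{2}}f(V)\,dV\\
&= \frac{\Gamma_p(\gamma+\frac{p+1}{2}+\alpha)}{\Gamma_p(\gamma+\frac{p+1}{2})}\,\frac{|U_2|^{\gamma}}{\Gamma_p(\alpha)}\int_{V>U_2>O}|V|^{-\alpha-\gamma}\,|V-U_2|^{\alpha-\frac{p+1}{2}}f(V)\,dV\\
&= \frac{\Gamma_p(\gamma+\frac{p+1}{2}+\alpha)}{\Gamma_p(\gamma+\frac{p+1}{2})}\,K^{-\alpha}_{2,U_2,\gamma}f
\end{aligned} \tag{5.7.4}$$

where

$$K^{-\alpha}_{2,U_2,\gamma}f = \frac{|U_2|^{\gamma}}{\Gamma_p(\alpha)}\int_{V>U_2>O}|V|^{-\alpha-\gamma}\,|V-U_2|^{\alpha-\frac{p+1}{2}}f(V)\,dV,\quad \Re(\alpha) > \frac{p-1}{2}, \tag{5.7.5}$$

is called the real matrix-variate Erdélyi-Kober right-sided or second kind fractional
integral of order $\alpha$ and parameter $\gamma$ as for $p = 1$, that is, in the real scalar case, (5.7.5)
corresponds to the Erdélyi-Kober fractional integral of the second kind of order $\alpha$ and
parameter $\gamma$. This connection of the density of a symmetric product of matrices to a fractional
integral of the second kind was established by Mathai (2009, 2010) and further papers.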
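
Returning to the gamma example, formula (5.7.2) can be checked in the real scalar case $p = 1$, where it reduces to $g_2(u) = \int_0^\infty v^{-1}f_1(u/v)f_2(v)\,dv$. The sketch below evaluates this integral on a grid for two gamma densities with $B_1 = B_2 = 1$ and verifies that the result has total mass close to 1 and mean close to $\alpha_1\alpha_2$. The grid limits, the parameter values and the helper names are assumptions chosen for the illustration, and a plain trapezoidal rule is used.

```python
import math
import numpy as np

def gamma_density(x, alpha):
    """Scalar gamma density with shape alpha and scale 1 (p = 1 case of (i) with B = 1)."""
    return x ** (alpha - 1) * np.exp(-x) / math.gamma(alpha)

def trapezoid(y, x):
    """Plain trapezoidal rule along the last axis."""
    return np.sum((y[..., 1:] + y[..., :-1]) * np.diff(x) / 2.0, axis=-1)

def product_density(u, alpha1, alpha2, v):
    """g2(u) = int v^{-1} f1(u/v) f2(v) dv, the p = 1 case of (5.7.2)."""
    U, V = np.meshgrid(u, v, indexing="ij")
    integrand = gamma_density(U / V, alpha1) * gamma_density(V, alpha2) / V
    return trapezoid(integrand, v)

u = np.linspace(0.0, 400.0, 801)       # truncated range for the product u = x1 * x2
v = np.linspace(0.02, 80.0, 2000)      # truncated range for v = x2
g2 = product_density(u, 3.0, 4.0, v)
total_mass = trapezoid(g2, u)
mean_u = trapezoid(u * g2, u)
print(total_mass, mean_u)              # compare with 1 and alpha1 * alpha2 = 12
```

Because the ranges are truncated and the quadrature is crude, agreement is only approximate, but it confirms that the M-convolution structure produces a proper density in this case.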


5.7.2. M-convolution and fractional integral of the second kind 


Mathai (1997) referred to the structure in (5.7.2) as the M-convolution of a product
where $f_1$ and $f_2$ need not be statistical densities. Actually, they could be any function
provided the integral exists. However, if $f_1$ and $f_2$ are statistical densities, this M-convolution
of a product can be interpreted as the density of a symmetric product. Thus, a physical
interpretation to an M-convolution of a product is provided in terms of statistical
densities. We have seen that (5.7.2) is connected to a fractional integral when $f_1$ is a real
matrix-variate type-1 beta density and $f_2$ is an arbitrary density. From this observation,
one can introduce a general definition for a fractional integral of the second kind in the
real matrix-variate case. Let

$$f_1(X_1) = \phi_1(X_1)\,\frac{1}{\Gamma_p(\alpha)}\,|I-X_1|^{\alpha-\frac{p+1}{2}},\quad \Re(\alpha) > \frac{p-1}{2}, \tag{iii}$$

and $f_2(X_2) = \phi_2(X_2)f(X_2)$ where $\phi_1$ and $\phi_2$ are specified functions and $f$ is an arbitrary
function. Then, consider the M-convolution of a product, again denoted by $g_2(U_2)$:

$$g_2(U_2) = \int_V |V|^{-\frac{p+1}{2}}\,\phi_1(V^{-\frac{1}{2}}U_2V^{-\frac{1}{2}})\,\frac{1}{\Gamma_p(\alpha)}\,|I-V^{-\frac{1}{2}}U_2V^{-\frac{1}{2}}|^{\alpha-\frac{p+1}{2}}\,\phi_2(V)f(V)\,dV,\quad \Re(\alpha) > \frac{p-1}{2}. \tag{5.7.6}$$


The right-hand side of (5.7.6) will be called a fractional integral of the second kind of order $\alpha$
in the real matrix-variate case. By letting $p = 1$ and specifying $\phi_1$ and $\phi_2$, one can obtain
all the fractional integrals of the second kind of order $\alpha$ that have previously been defined
by various authors. Hence, for a general $p$, one has the corresponding real matrix-variate
cases. For example, on letting $\phi_1(X_1) = |X_1|^{\gamma}$ and $\phi_2(X_2) = 1$, one has the Erdélyi-Kober
fractional integral of the second kind of (5.7.5) in the real matrix-variate case as for $p = 1$,
it is the Erdélyi-Kober fractional integral of the second kind of order $\alpha$. Letting $\phi_1(X_1) = 1$
and $\phi_2(X_2) = |X_2|^{\alpha}$, (5.7.6) simplifies to the following integral, again denoted by $g_2(U_2)$:

$$g_2(U_2) = \frac{1}{\Gamma_p(\alpha)}\int_{V>U_2}|V-U_2|^{\alpha-\frac{p+1}{2}}f(V)\,dV. \tag{5.7.7}$$


For $p = 1$, (5.7.7) is the Weyl fractional integral of the second kind of order $\alpha$. Accordingly,
(5.7.7) is the Weyl fractional integral of the second kind in the real matrix-variate
case. For $p = 1$, (5.7.7) is also the Riemann-Liouville fractional integral of the second
kind of order $\alpha$ in the real scalar case, if there exists a finite upper bound for $V$. If $V$ is
bounded above by a real positive definite constant matrix $B > O$ in the integral in (5.7.7),
then (5.7.7) is the Riemann-Liouville fractional integral of the second kind of order $\alpha$ for the
real matrix-variate case. Connections to other fractional integrals of the second kind can
be established by referring to Mathai and Haubold (2017).

The appeal of fractional integrals of the second kind resides in the fact that they can
be given physical interpretations as the density of a symmetric product when $f_1$ and $f_2$
are densities or as an M-convolution of products, whether in the scalar variable case or the
matrix-variate case, and that in both the real and complex domains.
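
In the scalar case $p = 1$, (5.7.7) reads $\frac{1}{\Gamma(\alpha)}\int_{v>u}(v-u)^{\alpha-1}f(v)\,dv$. For $f(v) = e^{-v}$ this integral equals $e^{-u}$ for every admissible $\alpha$, which gives a convenient numerical check. The sketch below uses a truncated trapezoidal rule; the function name, truncation point and grid size are assumptions, and $\alpha \ge 1$ is assumed so that the integrand stays finite at $v = u$.

```python
import math
import numpy as np

def weyl_second_kind(f, u, alpha, upper=60.0, num=60001):
    """(1/Gamma(alpha)) * int_{v>u} (v-u)^(alpha-1) f(v) dv, truncated at v = u + upper.

    Assumes alpha >= 1 so that the integrand is finite at v = u.
    """
    t = np.linspace(0.0, upper, num)            # substitution t = v - u
    y = t ** (alpha - 1.0) * f(u + t)
    integral = np.sum((y[1:] + y[:-1]) * np.diff(t) / 2.0)
    return integral / math.gamma(alpha)

value = weyl_second_kind(lambda v: np.exp(-v), u=1.3, alpha=2.5)
print(value, math.exp(-1.3))
```

The two printed numbers should agree to several decimal places.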


5.7.3. A pathway extension of fractional integrals 


Consider the following modification to the general definition of a fractional integral of
the second kind of order $\alpha$ in the real matrix-variate case given in (5.7.6). Let

$$f_1(X_1) = \phi_1(X_1)\,\frac{1}{\Gamma_p(\alpha)}\,|I-a(1-q)X_1|^{\frac{\eta}{1-q}},\qquad f_2(X_2) = \phi_2(X_2)f(X_2), \tag{iv}$$

where $\Re(\alpha) > \frac{p-1}{2}$ and $q < 1$, and $a > 0$, $\eta > 0$ are real scalar constants. For all $q < 1$,
$g_2(U_2)$ corresponding to $f_1(X_1)$ and $f_2(X_2)$ of (iv) will define a family of fractional
integrals of the second kind. Observe that when $X_1 > O$ and $I - a(1-q)X_1 > O$, then
$O < X_1 < \frac{1}{a(1-q)}I$. However, by writing $(1-q) = -(q-1)$ for $q > 1$, one can switch
into a type-2 beta form, namely, $I + a(q-1)X_1 > O$ for $q > 1$, which implies that
$X_1 > O$ and the fractional nature is lost. As well, when $q \to 1$,

$$|I + a(q-1)X_1|^{-\frac{\eta}{q-1}} \to e^{-a\,\eta\,\mathrm{tr}(X_1)}$$

which is the exponential form or gamma density form. In this case too, the fractional nature
is lost. Thus, through $q$, one can obtain matrix-variate type-1 and type-2 beta families and
a gamma family of functions from (iv). Then $q$ is called the pathway parameter which
generates three families of functions. However, the fractional nature of the integrals is lost
for the cases $q > 1$ and $q \to 1$. In the real scalar case, $x_1$ may have an exponent and
making use of $[1-(1-q)x_1^{\delta}]^{\alpha-1}$ can lead to interesting fractional integrals for $q < 1$.
However, raising $X_1$ to an exponent $\delta$ in the matrix-variate case will fail to produce results
of interest as Jacobians will then take inconvenient forms that cannot be expressed in terms
of the original matrices; this is for example explained in detail in Mathai (1997) for the
case of a squared real symmetric matrix.
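
The limiting behaviour as $q \to 1$ can be observed numerically. The sketch below evaluates $|I + a(q-1)X_1|^{-\frac{\eta}{q-1}}$ for values of $q$ approaching 1 from above and compares them with $e^{-a\eta\,\mathrm{tr}(X_1)}$. The matrix and the constants are arbitrary choices, and the function name is hypothetical.

```python
import numpy as np

def pathway_factor(X, a, eta, q):
    """|I + a(q-1)X|^(-eta/(q-1)) for q > 1, computed through the log-determinant."""
    p = X.shape[0]
    sign, logdet = np.linalg.slogdet(np.eye(p) + a * (q - 1.0) * X)
    return float(np.exp(-eta / (q - 1.0) * logdet))

rng = np.random.default_rng(5)
G = rng.standard_normal((3, 3))
X = G @ G.T + np.eye(3)                      # an arbitrary positive definite X1
a, eta = 0.5, 2.0
limit = float(np.exp(-a * eta * np.trace(X)))
values = {q: pathway_factor(X, a, eta, q) for q in (1.5, 1.1, 1.01, 1.0001)}
for q, val in values.items():
    print(q, val, limit)
```

The printed values decrease toward the exponential limit as $q$ approaches 1, in line with the type-2 beta form passing into the gamma form.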


5.7.4. The density of a ratio of real matrices


One can define a symmetric ratio in four different ways: $X_2^{\frac{1}{2}}X_1^{-1}X_2^{\frac{1}{2}}$ with $V = X_2$ or
$V = X_1$, and $X_1^{\frac{1}{2}}X_2^{-1}X_1^{\frac{1}{2}}$ with $V = X_2$ or $V = X_1$. All these four forms will produce
different structures on $f_1(X_1)f_2(X_2)$. Since the form $U_1$ that was specified in Sect. 5.7
in terms of $X_2^{\frac{1}{2}}X_1^{-1}X_2^{\frac{1}{2}}$ with $V = X_2$ provides connections to fractional integrals of the
first kind, we will consider this one whose density, denoted by $g_1(U_1)$, is the following,
observing that $dX_1\wedge dX_2 = |V|^{\frac{p+1}{2}}|U_1|^{-(p+1)}\,dU_1\wedge dV$:

$$g_1(U_1) = \int_V |V|^{\frac{p+1}{2}}\,|U_1|^{-(p+1)}\,f_1(V^{\frac{1}{2}}U_1^{-1}V^{\frac{1}{2}})f_2(V)\,dV \tag{5.7.8}$$

provided the integral exists. As in the fractional integral of the second kind in the real
matrix-variate case, we can give a general definition for a fractional integral of the first kind in
the real matrix-variate case as follows: Let $f_1(X_1)$ and $f_2(X_2)$ be taken as in the case of the
fractional integral of the second kind with $\phi_1$ and $\phi_2$ as preassigned functions. Then

$$g_1(U_1) = \int_V |V|^{\frac{p+1}{2}}\,|U_1|^{-(p+1)}\,\phi_1(V^{\frac{1}{2}}U_1^{-1}V^{\frac{1}{2}})\,\frac{1}{\Gamma_p(\alpha)}\,|I-V^{\frac{1}{2}}U_1^{-1}V^{\frac{1}{2}}|^{\alpha-\frac{p+1}{2}}\,\phi_2(V)f(V)\,dV,\quad \Re(\alpha) > \frac{p-1}{2}. \tag{5.7.9}$$

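As an example, letting

$$\phi_1(X_1) = \frac{\Gamma_p(\gamma+\alpha)}{\Gamma_p(\gamma)}\,|X_1|^{\gamma-\frac{p+1}{2}},\quad \phi_2(X_2) = 1,\quad \Re(\gamma) > \frac{p-1}{2},$$

we have

$$g_1(U_1) = \frac{\Gamma_p(\gamma+\alpha)}{\Gamma_p(\gamma)}\,\frac{|U_1|^{-\alpha-\gamma}}{\Gamma_p(\alpha)}\int_{V<U_1}|V|^{\gamma}\,|U_1-V|^{\alpha-\frac{p+1}{2}}f(V)\,dV = \frac{\Gamma_p(\gamma+\alpha)}{\Gamma_p(\gamma)}\,K^{-\alpha}_{1,U_1,\gamma}f \tag{5.7.10}$$

for $\Re(\alpha) > \frac{p-1}{2}$, $\Re(\gamma) > \frac{p-1}{2}$, where

$$K^{-\alpha}_{1,U_1,\gamma}f = \frac{|U_1|^{-\alpha-\gamma}}{\Gamma_p(\alpha)}\int_{V<U_1}|V|^{\gamma}\,|U_1-V|^{\alpha-\frac{p+1}{2}}f(V)\,dV \tag{5.7.11}$$

for $\Re(\alpha) > \frac{p-1}{2}$, $\Re(\gamma) > \frac{p-1}{2}$, is the Erdélyi-Kober fractional integral of the first kind of
order $\alpha$ and parameter $\gamma$ in the real matrix-variate case. Since for $p = 1$ or in the real
scalar case, $K^{-\alpha}_{1,u_1,\gamma}f$ is the Erdélyi-Kober fractional integral of order $\alpha$ and parameter $\gamma$, the
first author referred to $K^{-\alpha}_{1,U_1,\gamma}f$ in (5.7.11) as the Erdélyi-Kober fractional integral of the first
kind of order $\alpha$ in the real matrix-variate case.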

By specializing $\phi_1$ and $\phi_2$ in the real scalar case, that is, for $p = 1$, one can obtain
all the fractional integrals of the first kind of order $\alpha$ that have been previously introduced
in the literature by various authors. One can similarly derive the corresponding results on
fractional integrals of the first kind in the real matrix-variate case. Before concluding this
section, we will consider one more special case. Let

$$\phi_1(X_1) = |X_1|^{-\alpha-\frac{p+1}{2}}\ \text{ and }\ \phi_2(X_2) = |X_2|^{\alpha}.$$

In this case, $g_1(U_1)$ is not a statistical density but it is the M-convolution of a ratio. Under
the above substitutions, $g_1(U_1)$ of (5.7.9) becomes

$$g_1(U_1) = \frac{1}{\Gamma_p(\alpha)}\int_{V<U_1}|U_1-V|^{\alpha-\frac{p+1}{2}}f(V)\,dV,\quad \Re(\alpha) > \frac{p-1}{2}. \tag{5.7.12}$$

For $p = 1$, (5.7.12) is the Weyl fractional integral of the first kind of order $\alpha$; accordingly
the first author refers to (5.7.12) as the Weyl fractional integral of the first kind of order $\alpha$ in
the real matrix-variate case. Since we are considering only real positive definite matrices
here, there is a natural lower bound for the integral or the integral is over $O < V < U_1$.
When there is a specific lower bound, such as $O < V$, then for $p = 1$, (5.7.12) is called
the Riemann-Liouville fractional integral of the first kind of order $\alpha$. Hence (5.7.12) will
be referred to as the Riemann-Liouville fractional integral of the first kind of order $\alpha$ in
the real matrix-variate case.


Example 5.7.1. Let $X_1$ and $X_2$ be independently distributed $p\times p$ real positive definite
gamma matrix-variate random variables whose densities are

$$f_j(X_j) = \frac{1}{\Gamma_p(\alpha_j)}\,|X_j|^{\alpha_j-\frac{p+1}{2}}\,e^{-\mathrm{tr}(X_j)},\quad X_j > O,\ \Re(\alpha_j) > \frac{p-1}{2},\ j = 1,2,$$

and zero elsewhere. Show that the densities of the symmetric ratios of matrices $U_1 = X_1^{-\frac{1}{2}}X_2X_1^{-\frac{1}{2}}$
and $U_2 = X_2^{\frac{1}{2}}X_1^{-1}X_2^{\frac{1}{2}}$ are identical.


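Solution 5.7.1. Observe that for $p = 1$, that is, in the real scalar case, both $U_1$ and
$U_2$ are the ratio of real scalar variables $\frac{x_2}{x_1}$, but in the matrix-variate case $U_1$ and $U_2$ are
different matrices. Hence, we cannot expect the densities of $U_1$ and $U_2$ to be the same.
They will happen to be identical because of a property called functional symmetry of
the gamma densities. Consider $U_1$ and let $V = X_1$. Then, $X_2 = V^{\frac{1}{2}}U_1V^{\frac{1}{2}}$ and
$dX_1\wedge dX_2 = |V|^{\frac{p+1}{2}}\,dV\wedge dU_1$. Due to the statistical independence of $X_1$ and $X_2$, their joint
density is $f_1(X_1)f_2(X_2)$ and the joint density of $U_1$ and $V$ is $|V|^{\frac{p+1}{2}}f_1(V)f_2(V^{\frac{1}{2}}U_1V^{\frac{1}{2}})$,
the marginal density of $U_1$, denoted by $g_1(U_1)$, being the following:

$$g_1(U_1) = \frac{|U_1|^{\alpha_2-\frac{p+1}{2}}}{\Gamma_p(\alpha_1)\Gamma_p(\alpha_2)}\int_{V>O}|V|^{\alpha_1+\alpha_2-\frac{p+1}{2}}\,e^{-\mathrm{tr}(V+V^{\frac{1}{2}}U_1V^{\frac{1}{2}})}\,dV.$$

The exponent of $e$ can be written as follows:

$$-\mathrm{tr}(V) - \mathrm{tr}(V^{\frac{1}{2}}U_1V^{\frac{1}{2}}) = -\mathrm{tr}(V) - \mathrm{tr}(VU_1) = -\mathrm{tr}(V(I+U_1)) = -\mathrm{tr}[(I+U_1)^{\frac{1}{2}}V(I+U_1)^{\frac{1}{2}}].$$

Letting $Y = (I+U_1)^{\frac{1}{2}}V(I+U_1)^{\frac{1}{2}} \Rightarrow dY = |I+U_1|^{\frac{p+1}{2}}\,dV$. Then carrying out the
integration in $g_1(U_1)$, we obtain the following density function:

$$g_1(U_1) = \frac{\Gamma_p(\alpha_1+\alpha_2)}{\Gamma_p(\alpha_1)\Gamma_p(\alpha_2)}\,|U_1|^{\alpha_2-\frac{p+1}{2}}\,|I+U_1|^{-(\alpha_1+\alpha_2)}, \tag{i}$$

which is a real matrix-variate type-2 beta density with the parameters $(\alpha_2, \alpha_1)$. The original
conditions $\Re(\alpha_j) > \frac{p-1}{2}$, $j = 1,2$, remain the same, no additional conditions being
needed. Now, consider $U_2$ and let $V = X_2$ so that $X_1 = V^{\frac{1}{2}}U_2^{-1}V^{\frac{1}{2}} \Rightarrow dX_1\wedge dX_2 =
|V|^{\frac{p+1}{2}}|U_2|^{-(p+1)}\,dV\wedge dU_2$. The marginal density of $U_2$ is then:

$$g_2(U_2) = \frac{|U_2|^{-\alpha_1+\frac{p+1}{2}}\,|U_2|^{-(p+1)}}{\Gamma_p(\alpha_1)\Gamma_p(\alpha_2)}\int_{V>O}|V|^{\alpha_1+\alpha_2-\frac{p+1}{2}}\,e^{-\mathrm{tr}[V+V^{\frac{1}{2}}U_2^{-1}V^{\frac{1}{2}}]}\,dV. \tag{ii}$$

As previously explained, the exponent in (ii) can be simplified to $-\mathrm{tr}[(I+U_2^{-1})^{\frac{1}{2}}V(I+U_2^{-1})^{\frac{1}{2}}]$,
which once integrated out yields $\Gamma_p(\alpha_1+\alpha_2)\,|I+U_2^{-1}|^{-(\alpha_1+\alpha_2)}$. Then,

$$|U_2|^{-\alpha_1+\frac{p+1}{2}}\,|U_2|^{-(p+1)}\,|I+U_2^{-1}|^{-(\alpha_1+\alpha_2)} = |U_2|^{\alpha_2-\frac{p+1}{2}}\,|I+U_2|^{-(\alpha_1+\alpha_2)}. \tag{iii}$$

It follows from (i), (ii) and (iii) that $g_1(U_1) = g_2(U_2)$. Thus, the densities of $U_1$ and $U_2$ are
indeed one and the same, as had to be proved.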
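
A simulation offers a rough sanity check of this example. The sketch below assumes that a real matrix-variate gamma variable with shape $\alpha$ and $B = I$ can be generated as $ZZ'/2$, where $Z$ is a $p\times 2\alpha$ matrix of independent standard normal entries (this requires $2\alpha$ to be an integer at least $p$). It also uses the value $\frac{\alpha_2}{\alpha_1-\frac{p+1}{2}}I$ for the common mean matrix of $U_1$ and $U_2$, which is obtained here from standard Wishart moments and is not stated in the text. All names are hypothetical.

```python
import numpy as np

def gamma_matrices(rng, size, p, alpha):
    """Draw `size` real p x p gamma matrices (shape alpha, B = I) as Z Z'/2."""
    m = int(round(2 * alpha))               # assumes 2*alpha is an integer >= p
    Z = rng.standard_normal((size, p, m))
    return Z @ np.swapaxes(Z, 1, 2) / 2.0

def matrix_power_pd(X, power):
    """Batched symmetric power of positive definite matrices via eigendecomposition."""
    w, Q = np.linalg.eigh(X)
    return (Q * w[:, None, :] ** power) @ np.swapaxes(Q, 1, 2)

rng = np.random.default_rng(2)
p, alpha1, alpha2, size = 2, 4.0, 3.0, 20000
X1 = gamma_matrices(rng, size, p, alpha1)
X2 = gamma_matrices(rng, size, p, alpha2)
X1_inv_half = matrix_power_pd(X1, -0.5)
X2_half = matrix_power_pd(X2, 0.5)
U1 = X1_inv_half @ X2 @ X1_inv_half         # X1^{-1/2} X2 X1^{-1/2}
U2 = X2_half @ np.linalg.inv(X1) @ X2_half  # X2^{1/2} X1^{-1} X2^{1/2}
mean_U1 = U1.mean(axis=0)
mean_U2 = U2.mean(axis=0)
expected_diag = alpha2 / (alpha1 - (p + 1) / 2)   # assumed common mean of the diagonal entries
print(mean_U1, mean_U2, expected_diag)
```

For each draw, $U_1$ and $U_2$ are different matrices with the same eigenvalues; the simulation only shows that their entry-wise averages agree within Monte Carlo error, which is consistent with, but much weaker than, the equality of densities proved above.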


5.7.5. A pathway extension of first kind integrals, real matrix-variate case 


As in the case of fractional integral of the second kind, we can also construct a pathway 
extension of the first kind integrals in the real matrix-variate case. Let 


X — 1 
_ Pil Du -al -qX Ë, a= 1, Ra) > = (5.7.13) 


X 
Ri (OG) T (o) lea 


and $f_2(X_2) = \phi_2(X_2) f(X_2)$ for the scalar parameters $a > 0$, $\eta > 0$, $q < 1$. When
$q < 1$, (5.7.13) remains in the generalized type-1 beta family of functions. However,
when $q > 1$, $f_1$ switches to the generalized type-2 beta family of functions and when
$q \to 1$, (5.7.13) goes into a gamma family of functions. Since $X_1 > O$ for $q > 1$ and
$q \to 1$, the fractional nature is lost in those instances. Hence, only the case $q < 1$ is
relevant in this subsection.

For various values of $q < 1$, one has a family of fractional integrals of the first kind coming
from (5.7.13). For details on the concept of pathway, the reader may refer to Mathai
(2005) and later papers. With the function $f_1(X_1)$ as specified in (5.7.13) and the corresponding
$f_2(X_2) = \phi_2(X_2)f(X_2)$, one can write down the M-convolution of a ratio,
$g_1(U_1)$, corresponding to (5.7.8). Thus, we have the pathway extended form of $g_1(U_1)$.


5.7a. Density of a Product and Integrals of the Second Kind 


The discussion in this section parallels that in the real matrix-variate case. Hence, only
a summarized treatment will be provided. With respect to the density of a product when
$\tilde{f}_1$ and $\tilde{f}_2$ are matrix-variate gamma densities in the complex domain, the results are parallel
to those obtained in the real matrix-variate case. Hence, we will consider an extension
of fractional integrals to the complex matrix-variate cases. Matrices in the complex domain
will be denoted with a tilde. Let $\tilde{X}_1$ and $\tilde{X}_2$ be independently distributed Hermitian
positive definite complex matrix-variate random variables whose densities are $\tilde{f}_1(\tilde{X}_1)$ and
$\tilde{f}_2(\tilde{X}_2)$, respectively. Let $\tilde{U}_2 = \tilde{X}_2^{\frac{1}{2}}\tilde{X}_1\tilde{X}_2^{\frac{1}{2}}$ and $\tilde{U}_1 = \tilde{X}_2^{\frac{1}{2}}\tilde{X}_1^{-1}\tilde{X}_2^{\frac{1}{2}}$ where $\tilde{X}_2^{\frac{1}{2}}$ denotes the
Hermitian positive definite square root of the Hermitian positive definite matrix $\tilde{X}_2$. Statistical
densities are real-valued scalar functions whether the argument matrix is in the real
or complex domain.


5.7a.1. Density of a product and fractional integral of the second kind, complex case 


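Let us consider the transformations $(\tilde{X}_1,\tilde{X}_2)\to(\tilde{U}_2,\tilde{V})$ and $(\tilde{X}_1,\tilde{X}_2)\to(\tilde{U}_1,\tilde{V})$,
the Jacobians being available from Chap. 1 or Mathai (1997). Then,

$$\mathrm{d}\tilde{X}_1\wedge\mathrm{d}\tilde{X}_2 = \begin{cases}|\det(\tilde{V})|^{-p}\,\mathrm{d}\tilde{U}_2\wedge\mathrm{d}\tilde{V}\\[4pt] |\det(\tilde{V})|^{p}\,|\det(\tilde{U}_1)|^{-2p}\,\mathrm{d}\tilde{U}_1\wedge\mathrm{d}\tilde{V}.\end{cases} \tag{5.7a.1}$$

When $\tilde{f}_1$ and $\tilde{f}_2$ are statistical densities, the density of the product, denoted by $\tilde{g}_2(\tilde{U}_2)$, is
the following:

$$\tilde{g}_2(\tilde{U}_2) = \int_{\tilde{V}}|\det(\tilde{V})|^{-p}\,\tilde{f}_1(\tilde{V}^{-\frac{1}{2}}\tilde{U}_2\tilde{V}^{-\frac{1}{2}})\,\tilde{f}_2(\tilde{V})\,\mathrm{d}\tilde{V} \tag{5.7a.2}$$

where $|\det(\cdot)|$ is the absolute value of the determinant of $(\cdot)$. If $\tilde{f}_1$ and $\tilde{f}_2$ are not statistical
densities, (5.7a.2) will be called the M-convolution of the product. As in the real matrix-variate
case, we will give a general definition of a fractional integral of order $\alpha$ of the
second kind in the complex matrix-variate case. Let

$$\tilde{f}_1(\tilde{X}_1) = \tilde{\phi}_1(\tilde{X}_1)\,\frac{1}{\tilde{\Gamma}_p(\alpha)}\,|\det(I-\tilde{X}_1)|^{\alpha-p},\quad \Re(\alpha) > p-1,$$

and $\tilde{f}_2(\tilde{X}_2) = \tilde{\phi}_2(\tilde{X}_2)\tilde{f}(\tilde{X}_2)$ where $\tilde{\phi}_1$ and $\tilde{\phi}_2$ are specified functions and $\tilde{f}$ is an arbitrary
function. Then, (5.7a.2) becomes

$$\tilde{g}_2(\tilde{U}_2) = \int_{\tilde{V}}|\det(\tilde{V})|^{-p}\,\tilde{\phi}_1(\tilde{V}^{-\frac{1}{2}}\tilde{U}_2\tilde{V}^{-\frac{1}{2}})\,\frac{1}{\tilde{\Gamma}_p(\alpha)}\,|\det(I-\tilde{V}^{-\frac{1}{2}}\tilde{U}_2\tilde{V}^{-\frac{1}{2}})|^{\alpha-p}\,\tilde{\phi}_2(\tilde{V})\,\tilde{f}(\tilde{V})\,\mathrm{d}\tilde{V} \tag{5.7a.3}$$

for $\Re(\alpha) > p-1$. As an example, let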


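$$\tilde{\phi}_1(\tilde{X}_1) = \frac{\tilde{\Gamma}_p(\gamma+p+\alpha)}{\tilde{\Gamma}_p(\gamma+p)}\,|\det(\tilde{X}_1)|^{\gamma}\ \text{ and }\ \tilde{\phi}_2(\tilde{X}_2) = 1.$$

Observe that $\tilde{f}_1(\tilde{X}_1)$ has now become a complex matrix-variate type-1 beta density with
the parameters $(\gamma+p,\ \alpha)$ so that (5.7a.3) can be expressed as follows:

$$\tilde{g}_2(\tilde{U}_2) = \frac{\tilde{\Gamma}_p(\gamma+p+\alpha)}{\tilde{\Gamma}_p(\gamma+p)}\,\frac{|\det(\tilde{U}_2)|^{\gamma}}{\tilde{\Gamma}_p(\alpha)}\int_{\tilde{V}>\tilde{U}_2>O}|\det(\tilde{V})|^{-\alpha-\gamma}\,|\det(\tilde{V}-\tilde{U}_2)|^{\alpha-p}\,\tilde{f}(\tilde{V})\,\mathrm{d}\tilde{V} = \frac{\tilde{\Gamma}_p(\gamma+p+\alpha)}{\tilde{\Gamma}_p(\gamma+p)}\,\tilde{K}^{-\alpha}_{2,\tilde{U}_2,\gamma}\tilde{f} \tag{5.7a.4}$$

where

$$\tilde{K}^{-\alpha}_{2,\tilde{U}_2,\gamma}\tilde{f} = \frac{|\det(\tilde{U}_2)|^{\gamma}}{\tilde{\Gamma}_p(\alpha)}\int_{\tilde{V}>\tilde{U}_2>O}|\det(\tilde{V})|^{-\alpha-\gamma}\,|\det(\tilde{V}-\tilde{U}_2)|^{\alpha-p}\,\tilde{f}(\tilde{V})\,\mathrm{d}\tilde{V} \tag{5.7a.5}$$

is the Erdélyi-Kober fractional integral of the second kind of order $\alpha$ in the complex matrix-variate
case, which is defined for $\Re(\alpha) > p-1$, $\Re(\gamma) > -1$. The extension of fractional
integrals to complex matrix-variate cases was introduced in Mathai (2013). As a second
example, let

$$\tilde{\phi}_1(\tilde{X}_1) = 1\ \text{ and }\ \tilde{\phi}_2(\tilde{X}_2) = |\det(\tilde{X}_2)|^{\alpha}.$$

In that case, (5.7a.3) becomes

$$\tilde{g}_2(\tilde{U}_2) = \frac{1}{\tilde{\Gamma}_p(\alpha)}\int_{\tilde{V}>\tilde{U}_2>O}|\det(\tilde{V}-\tilde{U}_2)|^{\alpha-p}\,\tilde{f}(\tilde{V})\,\mathrm{d}\tilde{V},\quad \Re(\alpha) > p-1. \tag{5.7a.6}$$

The integral (5.7a.6) is the Weyl fractional integral of the second kind of order $\alpha$ in the complex
matrix-variate case. If $\tilde{V}$ is bounded above by a Hermitian positive definite constant
matrix $B > O$, then (5.7a.6) is a Riemann-Liouville fractional integral of the second kind
of order $\alpha$ in the complex matrix-variate case.

A pathway extension parallel to that developed in the real matrix-variate case can be
similarly obtained. Accordingly, the details of the derivation are omitted.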
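
For $p = 1$, a Hermitian positive definite matrix is a positive real scalar, $\tilde{\Gamma}_1(\alpha) = \Gamma(\alpha)$, and the Weyl integral (5.7a.6) reduces to $\frac{1}{\Gamma(\alpha)}\int_{v>u}(v-u)^{\alpha-1}f(v)\,\mathrm{d}v$. The sketch below, which assumes real $\alpha \ge 1$ and the illustrative choice $f(v) = e^{-v}$ (not a choice made in the text), approximates this integral by a midpoint rule on a truncated range and compares it with the value $e^{-u}$ obtained from the gamma integral. The function name and the truncation point are assumptions.

```python
import math

def weyl_second_kind(f, u, alpha, upper=60.0, n=120_000):
    """Midpoint-rule value of (1/Gamma(alpha)) * int_{v>u} (v-u)^(alpha-1) f(v) dv.

    The substitution t = v - u is used and the range of t is truncated at `upper`.
    """
    h = upper / n
    total = 0.0
    for i in range(n):
        t = (i + 0.5) * h
        total += t ** (alpha - 1.0) * f(u + t)
    return total * h / math.gamma(alpha)

u, alpha = 0.7, 2.5
approx = weyl_second_kind(lambda v: math.exp(-v), u, alpha)
exact = math.exp(-u)  # int_0^inf t^(alpha-1) e^(-t) dt = Gamma(alpha)
print(approx, exact)
```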


5.7a.2. Density of a ratio and fractional integrals of the first kind, complex case 
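We will now derive the density of the symmetric ratio $\tilde{U}_1$ defined in Sect. 5.7a. If $\tilde{f}_1$
and $\tilde{f}_2$ are statistical densities, then the density of $\tilde{U}_1$, denoted by $\tilde{g}_1(\tilde{U}_1)$, is given by

$$\tilde{g}_1(\tilde{U}_1) = \int_{\tilde{V}}|\det(\tilde{V})|^{p}\,|\det(\tilde{U}_1)|^{-2p}\,\tilde{f}_1(\tilde{V}^{\frac{1}{2}}\tilde{U}_1^{-1}\tilde{V}^{\frac{1}{2}})\,\tilde{f}_2(\tilde{V})\,\mathrm{d}\tilde{V}, \tag{5.7a.7}$$

provided the integral is convergent. For the general definition, let us take

$$\tilde{f}_1(\tilde{X}_1) = \tilde{\phi}_1(\tilde{X}_1)\,\frac{1}{\tilde{\Gamma}_p(\alpha)}\,|\det(I-\tilde{X}_1)|^{\alpha-p},\quad \Re(\alpha) > p-1,$$

and $\tilde{f}_2(\tilde{X}_2) = \tilde{\phi}_2(\tilde{X}_2)\tilde{f}(\tilde{X}_2)$ where $\tilde{\phi}_1$ and $\tilde{\phi}_2$ are specified functions and $\tilde{f}$ is an arbitrary
function. Then $\tilde{g}_1(\tilde{U}_1)$ is the following:

$$\tilde{g}_1(\tilde{U}_1) = \int_{\tilde{V}}|\det(\tilde{V})|^{p}\,|\det(\tilde{U}_1)|^{-2p}\,\tilde{\phi}_1(\tilde{V}^{\frac{1}{2}}\tilde{U}_1^{-1}\tilde{V}^{\frac{1}{2}})\,\frac{1}{\tilde{\Gamma}_p(\alpha)}\,|\det(I-\tilde{V}^{\frac{1}{2}}\tilde{U}_1^{-1}\tilde{V}^{\frac{1}{2}})|^{\alpha-p}\,\tilde{\phi}_2(\tilde{V})\,\tilde{f}(\tilde{V})\,\mathrm{d}\tilde{V}. \tag{5.7a.8}$$

As an example, let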
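$$\tilde{\phi}_1(\tilde{X}_1) = \frac{\tilde{\Gamma}_p(\gamma+\alpha)}{\tilde{\Gamma}_p(\gamma)}\,|\det(\tilde{X}_1)|^{\gamma-p}$$

and $\tilde{\phi}_2 = 1$. Then,

$$\tilde{g}_1(\tilde{U}_1) = \frac{\tilde{\Gamma}_p(\gamma+\alpha)}{\tilde{\Gamma}_p(\gamma)}\,\frac{|\det(\tilde{U}_1)|^{-\alpha-\gamma}}{\tilde{\Gamma}_p(\alpha)}\int_{\tilde{V}<\tilde{U}_1}|\det(\tilde{V})|^{\gamma}\,|\det(\tilde{U}_1-\tilde{V})|^{\alpha-p}\,\tilde{f}(\tilde{V})\,\mathrm{d}\tilde{V} = \frac{\tilde{\Gamma}_p(\gamma+\alpha)}{\tilde{\Gamma}_p(\gamma)}\,\tilde{K}^{-\alpha}_{1,\tilde{U}_1,\gamma}\tilde{f},\quad \Re(\alpha) > p-1, \tag{5.7a.9}$$

where

$$\tilde{K}^{-\alpha}_{1,\tilde{U}_1,\gamma}\tilde{f} = \frac{|\det(\tilde{U}_1)|^{-\alpha-\gamma}}{\tilde{\Gamma}_p(\alpha)}\int_{\tilde{V}<\tilde{U}_1}|\det(\tilde{V})|^{\gamma}\,|\det(\tilde{U}_1-\tilde{V})|^{\alpha-p}\,\tilde{f}(\tilde{V})\,\mathrm{d}\tilde{V} \tag{5.7a.10}$$

for $\Re(\alpha) > p-1$, is the Erdélyi-Kober fractional integral of the first kind of order $\alpha$ and
parameter $\gamma$ in the complex matrix-variate case. We now consider a second example. On
letting $\tilde{\phi}_1(\tilde{X}_1) = |\det(\tilde{X}_1)|^{-\alpha-p}$ and $\tilde{\phi}_2(\tilde{X}_2) = |\det(\tilde{X}_2)|^{\alpha}$, the density of $\tilde{U}_1$ is

$$\tilde{g}_1(\tilde{U}_1) = \frac{1}{\tilde{\Gamma}_p(\alpha)}\int_{\tilde{V}<\tilde{U}_1}|\det(\tilde{U}_1-\tilde{V})|^{\alpha-p}\,\tilde{f}(\tilde{V})\,\mathrm{d}\tilde{V},\quad \Re(\alpha) > p-1. \tag{5.7a.11}$$

The integral in (5.7a.11) is Weyl's fractional integral of the first kind of order $\alpha$ in the
complex matrix-variate case, denoted by $\tilde{W}^{-\alpha}_{1,\tilde{U}_1}\tilde{f}$. Observe that we are considering only
Hermitian positive definite matrices. Thus, $O$ is a lower bound, the integral being over
$O < \tilde{V} < \tilde{U}_1$. Hence (5.7a.11) can also represent a Riemann-Liouville fractional integral
of the first kind of order $\alpha$ in the complex matrix-variate case with a null matrix as its
lower bound. For fractional integrals involving several matrices and fractional differential
operators for functions of matrix argument, refer to Mathai (2014a, 2015); for pathway
extensions, see Mathai and Haubold (2008, 2011).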


Exercises 5.7 


All the matrices appearing herein are $p \times p$ real positive definite, when real, and Hermitian
positive definite, when in the complex domain. The M-transform of a real-valued
scalar function $f(X)$ of the $p \times p$ real matrix $X$, with the M-transform parameter $\rho$, is
defined as

$$M_f(\rho) = \int_{X>O}|X|^{\rho-\frac{p+1}{2}}f(X)\,\mathrm{d}X,\quad \Re(\rho) > \frac{p-1}{2},$$

whenever the integral is convergent. In the real case, the M-convolution of a product $U_2 =
X_2^{\frac{1}{2}}X_1X_2^{\frac{1}{2}}$ with the corresponding functions $f_1(X_1)$ and $f_2(X_2)$, respectively, is

$$g_2(U_2) = \int_{V}|V|^{-\frac{p+1}{2}}f_1(V^{-\frac{1}{2}}U_2V^{-\frac{1}{2}})f_2(V)\,\mathrm{d}V$$

whenever the integral is convergent. The M-convolution of a ratio in the real case is
$g_1(U_1)$. The M-convolution of a product and a ratio in the complex case are $\tilde{g}_2(\tilde{U}_2)$ and
$\tilde{g}_1(\tilde{U}_1)$, respectively, as defined earlier in this section. If $\alpha$ is the order of a fractional integral
operator operating on $f$, denoted by $A^{-\alpha}f$, then the semigroup property is that
$A^{-\alpha}A^{-\beta}f = A^{-(\alpha+\beta)}f = A^{-\beta}A^{-\alpha}f$.

5.7.1. Show that the M-transform of the M-convolution of a product is the product of the
M-transforms of the individual functions $f_1$ and $f_2$, both in the real and complex cases.


5.7.2. What are the M-transforms of the M-convolution of a ratio in the real and complex 
cases? Establish your assertions. 


5.7.3. Show that the semigroup property holds for Weyl’s fractional integral of the (1): 
first kind, (2): second kind, in the real matrix-variate case. 


5.7.4. Do (1) and (2) of Exercise 5.7.3 hold in the complex matrix-variate case? Prove 
your assertion. 


5.7.5. Evaluate the M-transforms of the Erdélyi-Kober fractional integral of order $\alpha$ of
(1): the first kind, (2): the second kind and state the conditions for their existence.

5.7.6. Repeat Exercise 5.7.5 for (1) Weyl's fractional integral of order $\alpha$, (2) the Riemann-Liouville
fractional integral of order $\alpha$.

5.7.7. Evaluate the Weyl fractional integral of order $\alpha$ of (a): the first kind, (b): the second
kind, in the real matrix-variate case, if possible, if the arbitrary function is (1): $e^{-\mathrm{tr}(X)}$, (2):
$e^{\mathrm{tr}(X)}$ and write down the conditions wherever it is evaluated.

5.7.8. Repeat Exercise 5.7.7 for the complex matrix-variate case.

5.7.9. Evaluate the Erdélyi-Kober fractional integral of order $\alpha$ and parameter $\gamma$ of the
(a): first kind, (b): second kind, in the real matrix-variate case, if the arbitrary function is
(1): $|X|^{\delta}$, (2): $|X|^{-\delta}$, wherever possible, and write down the necessary conditions.

5.7.10. Repeat Exercise 5.7.9 for the complex case. In the complex case, $|X|$, the determinant
of $X$, is to be replaced by $|\det(\tilde{X})|$, the absolute value of the determinant of $\tilde{X}$.


5.8. Densities Involving Several Matrix-variate Random Variables, Real Case 


We will start with real scalar variables. The most popular multivariate distribution, 
apart from the normal distribution, is the Dirichlet distribution, which is a generalization 
of the type-1 and type-2 beta distributions. 


5.8.1. The type-1 Dirichlet density, real scalar case 


Let $x_1,\ldots,x_k$ be real scalar random variables having a joint density of the form

$$f_1(x_1,\ldots,x_k) = c_k\,x_1^{\alpha_1-1}\cdots x_k^{\alpha_k-1}(1-x_1-\cdots-x_k)^{\alpha_{k+1}-1} \tag{5.8.1}$$

for $\omega = \{(x_1,\ldots,x_k)\,|\,0 \le x_j \le 1,\ j=1,\ldots,k,\ 0 \le x_1+\cdots+x_k \le 1\}$, $\Re(\alpha_j) > 0$,
$j = 1,\ldots,k+1$, and $f_1 = 0$ elsewhere. This is the type-1 Dirichlet density where $c_k$ is
the normalizing constant. Note that $\omega$ describes a simplex and hence the support of $f_1$ is
the simplex $\omega$. Evaluation of the normalizing constant can be achieved in differing ways.
One method relies on the direct integration of the variables, one at a time. For example,
integration over $x_1$ involves two factors, $x_1^{\alpha_1-1}$ and $(1-x_1-\cdots-x_k)^{\alpha_{k+1}-1}$. Let $I_1$ be the
integral over $x_1$. Observe that $x_1$ varies from $0$ to $1-x_2-\cdots-x_k$. Then

$$I_1 = \int_{x_1=0}^{1-x_2-\cdots-x_k}x_1^{\alpha_1-1}(1-x_1-\cdots-x_k)^{\alpha_{k+1}-1}\,\mathrm{d}x_1.$$

But

$$(1-x_1-\cdots-x_k)^{\alpha_{k+1}-1} = (1-x_2-\cdots-x_k)^{\alpha_{k+1}-1}\Big[1-\frac{x_1}{1-x_2-\cdots-x_k}\Big]^{\alpha_{k+1}-1}.$$

Make the substitution $y = \frac{x_1}{1-x_2-\cdots-x_k} \Rightarrow \mathrm{d}x_1 = (1-x_2-\cdots-x_k)\,\mathrm{d}y$, which enables one
to integrate out $y$ by making use of a real scalar type-1 beta integral giving

$$\int_0^1 y^{\alpha_1-1}(1-y)^{\alpha_{k+1}-1}\,\mathrm{d}y = \frac{\Gamma(\alpha_1)\Gamma(\alpha_{k+1})}{\Gamma(\alpha_1+\alpha_{k+1})}$$

for $\Re(\alpha_1) > 0$, $\Re(\alpha_{k+1}) > 0$. Now, proceed similarly by integrating out $x_2$ from
$x_2^{\alpha_2-1}(1-x_2-\cdots-x_k)^{\alpha_1+\alpha_{k+1}-1}$, and continue in this manner until $x_k$ is reached. Finally,
after canceling out all common gamma factors, one has $\Gamma(\alpha_1)\cdots\Gamma(\alpha_{k+1})/\Gamma(\alpha_1+\cdots+\alpha_{k+1})$
for $\Re(\alpha_j) > 0$, $j = 1,\ldots,k+1$. Thus, the normalizing constant is given by

$$c_k = \frac{\Gamma(\alpha_1+\cdots+\alpha_{k+1})}{\Gamma(\alpha_1)\cdots\Gamma(\alpha_{k+1})},\quad \Re(\alpha_j) > 0,\ j = 1,\ldots,k+1. \tag{5.8.2}$$

Another method for evaluating the normalizing constant $c_k$ consists of making the following
transformation:
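
The constant in (5.8.2) is easy to evaluate numerically. The following sketch computes $c_k$ through log-gamma functions and, for $k = 2$, checks that $c_k$ times a midpoint-rule approximation of the integral of $x_1^{\alpha_1-1}x_2^{\alpha_2-1}(1-x_1-x_2)^{\alpha_3-1}$ over the simplex is close to 1. Real parameters are assumed, and the function names and grid size are illustrative.

```python
import math

def dirichlet_const(alphas):
    """c_k of (5.8.2) for real alphas = (alpha_1, ..., alpha_{k+1})."""
    log_c = math.lgamma(sum(alphas)) - sum(math.lgamma(a) for a in alphas)
    return math.exp(log_c)

def simplex_integral_k2(a1, a2, a3, n=400):
    """Midpoint-rule integral of x1^(a1-1) x2^(a2-1) (1-x1-x2)^(a3-1) over the simplex."""
    h = 1.0 / n
    total = 0.0
    for i in range(n):
        x1 = (i + 0.5) * h
        for j in range(n):
            x2 = (j + 0.5) * h
            rest = 1.0 - x1 - x2
            if rest > 0.0:  # keep only grid points inside the simplex
                total += x1 ** (a1 - 1) * x2 ** (a2 - 1) * rest ** (a3 - 1)
    return total * h * h

c = dirichlet_const([2.0, 3.0, 4.0])   # Gamma(9) / (Gamma(2) Gamma(3) Gamma(4))
check = c * simplex_integral_k2(2.0, 3.0, 4.0)
print(c, check)
```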


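$$\begin{aligned}x_1 &= y_1\\ x_2 &= y_2(1-y_1)\\ x_j &= y_j(1-y_1)(1-y_2)\cdots(1-y_{j-1}),\quad j = 2,\ldots,k.\end{aligned} \tag{5.8.3}$$

It is then easily seen that

$$\mathrm{d}x_1\wedge\ldots\wedge\mathrm{d}x_k = (1-y_1)^{k-1}(1-y_2)^{k-2}\cdots(1-y_{k-1})\,\mathrm{d}y_1\wedge\ldots\wedge\mathrm{d}y_k. \tag{5.8.4}$$

Under this transformation, one has

$$\begin{aligned}1-x_1 &= 1-y_1\\ 1-x_1-x_2 &= (1-y_1)(1-y_2)\\ 1-x_1-\cdots-x_k &= (1-y_1)(1-y_2)\cdots(1-y_k).\end{aligned}$$

Then, we have

$$\begin{aligned}&x_1^{\alpha_1-1}\cdots x_k^{\alpha_k-1}(1-x_1-\cdots-x_k)^{\alpha_{k+1}-1}\,\mathrm{d}x_1\wedge\ldots\wedge\mathrm{d}x_k\\ &\quad= y_1^{\alpha_1-1}\cdots y_k^{\alpha_k-1}(1-y_1)^{\alpha_2+\cdots+\alpha_{k+1}-1}\\ &\qquad\times(1-y_2)^{\alpha_3+\cdots+\alpha_{k+1}-1}\cdots(1-y_k)^{\alpha_{k+1}-1}\,\mathrm{d}y_1\wedge\ldots\wedge\mathrm{d}y_k.\end{aligned}$$

Now, all the $y_j$'s are separated and each one can be integrated out by making use of a
type-1 beta integral. For example, the integrals over $y_1, y_2, \ldots, y_k$ give the following:

$$\begin{aligned}\int_0^1 y_1^{\alpha_1-1}(1-y_1)^{\alpha_2+\cdots+\alpha_{k+1}-1}\,\mathrm{d}y_1 &= \frac{\Gamma(\alpha_1)\Gamma(\alpha_2+\cdots+\alpha_{k+1})}{\Gamma(\alpha_1+\cdots+\alpha_{k+1})}\\ \int_0^1 y_2^{\alpha_2-1}(1-y_2)^{\alpha_3+\cdots+\alpha_{k+1}-1}\,\mathrm{d}y_2 &= \frac{\Gamma(\alpha_2)\Gamma(\alpha_3+\cdots+\alpha_{k+1})}{\Gamma(\alpha_2+\cdots+\alpha_{k+1})}\\ &\ \ \vdots\\ \int_0^1 y_k^{\alpha_k-1}(1-y_k)^{\alpha_{k+1}-1}\,\mathrm{d}y_k &= \frac{\Gamma(\alpha_k)\Gamma(\alpha_{k+1})}{\Gamma(\alpha_k+\alpha_{k+1})}\end{aligned}$$

for $\Re(\alpha_j) > 0$, $j = 1,\ldots,k+1$. Taking the product produces $c_k^{-1}$.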
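
The separation of the $y_j$'s also suggests a way of generating type-1 Dirichlet variables: draw independent type-1 beta variables $y_j$ with the parameters $(\alpha_j,\ \alpha_{j+1}+\cdots+\alpha_{k+1})$ and apply (5.8.3). The sketch below, with illustrative function names and real parameters assumed, checks the identity $1-x_1-\cdots-x_k = (1-y_1)\cdots(1-y_k)$ at one point and compares the sample mean of $x_1 = y_1$ with the type-1 beta mean $\alpha_1/(\alpha_1+\cdots+\alpha_{k+1})$.

```python
import math
import random

def y_to_x(y):
    """Apply (5.8.3): x_j = y_j (1 - y_1) ... (1 - y_{j-1})."""
    x, remaining = [], 1.0
    for yj in y:
        x.append(yj * remaining)
        remaining *= 1.0 - yj
    return x

def sample_type1_dirichlet(alphas, rng):
    """Draw (x_1, ..., x_k) from independent type-1 beta y_j mapped through (5.8.3)."""
    k = len(alphas) - 1
    y = [rng.betavariate(alphas[j], sum(alphas[j + 1:])) for j in range(k)]
    return y_to_x(y)

# Deterministic check of 1 - x_1 - ... - x_k = (1 - y_1) ... (1 - y_k)
y = [0.2, 0.5, 0.1]
x = y_to_x(y)
lhs = 1.0 - sum(x)
rhs = math.prod(1.0 - yj for yj in y)

# Monte Carlo check of E[x_1] = alpha_1 / (alpha_1 + ... + alpha_{k+1})
rng = random.Random(1)
alphas = [2.0, 3.0, 4.0]
draws = [sample_type1_dirichlet(alphas, rng) for _ in range(50_000)]
mean_x1 = sum(d[0] for d in draws) / len(draws)
print(lhs, rhs, mean_x1)
```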


5.8.2. The type-2 Dirichlet density, real scalar case 


Let $x_1 > 0,\ldots,x_k > 0$ be real scalar random variables having the joint density

$$f_2(x_1,\ldots,x_k) = c_k\,x_1^{\alpha_1-1}\cdots x_k^{\alpha_k-1}(1+x_1+\cdots+x_k)^{-(\alpha_1+\cdots+\alpha_{k+1})} \tag{5.8.5}$$

for $x_j > 0$, $j = 1,\ldots,k$, $\Re(\alpha_j) > 0$, $j = 1,\ldots,k+1$, and $f_2 = 0$ elsewhere, where $c_k$
is the normalizing constant. This density is known as a type-2 Dirichlet density. It can be
shown that the normalizing constant $c_k$ is the same as the one obtained in (5.8.2) for the
type-1 Dirichlet distribution. This can be established by integrating out the variables one
at a time, starting with $x_k$ or $x_1$. This constant can also be evaluated with the help of the
following transformation:

$$\begin{aligned}x_1 &= y_1\\ x_2 &= y_2(1+y_1)\\ x_j &= y_j(1+y_1)(1+y_2)\cdots(1+y_{j-1}),\quad j = 2,\ldots,k,\end{aligned} \tag{5.8.6}$$

whose Jacobian is given by


$$\mathrm{d}x_1\wedge\ldots\wedge\mathrm{d}x_k = (1+y_1)^{k-1}(1+y_2)^{k-2}\cdots(1+y_{k-1})\,\mathrm{d}y_1\wedge\ldots\wedge\mathrm{d}y_k. \tag{5.8.7}$$

5.8.3. Some properties of Dirichlet densities in the real scalar case

Let us determine the $h$-th moment of $(1-x_1-\cdots-x_k)$ in a type-1 Dirichlet density:

$$E[1-x_1-\cdots-x_k]^h = \int_{\omega}(1-x_1-\cdots-x_k)^h f_1(x_1,\ldots,x_k)\,\mathrm{d}x_1\wedge\ldots\wedge\mathrm{d}x_k.$$

In comparison with the total integral, the only change is that the parameter $\alpha_{k+1}$ is replaced
by $\alpha_{k+1}+h$; thus the result is available from the normalizing constant. That is,

$$E[1-x_1-\cdots-x_k]^h = \frac{\Gamma(\alpha_{k+1}+h)}{\Gamma(\alpha_{k+1})}\,\frac{\Gamma(\alpha_1+\cdots+\alpha_{k+1})}{\Gamma(\alpha_1+\cdots+\alpha_{k+1}+h)}. \tag{5.8.8}$$

The additional condition needed is $\Re(\alpha_{k+1}+h) > 0$. Considering the structure of the
moment in (5.8.8), $u = 1-x_1-\cdots-x_k$ is manifestly a real scalar type-1 beta variable
with the parameters $(\alpha_{k+1},\ \alpha_1+\cdots+\alpha_k)$. This is stated in the following result:

Theorem 5.8.1. Let $x_1,\ldots,x_k$ have a real scalar type-1 Dirichlet density with the parameters
$(\alpha_1,\ldots,\alpha_k;\ \alpha_{k+1})$. Then, $u = 1-x_1-\cdots-x_k$ has a real scalar type-1 beta
distribution with the parameters $(\alpha_{k+1},\ \alpha_1+\cdots+\alpha_k)$, and $1-u = x_1+\cdots+x_k$ has a
real scalar type-1 beta distribution with the parameters $(\alpha_1+\cdots+\alpha_k,\ \alpha_{k+1})$.
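
The moment identification behind Theorem 5.8.1 can be checked numerically. In the sketch below (illustrative names, real parameters assumed), the closed form (5.8.8) is compared with the $h$-th moment of a type-1 beta density with the parameters $(\alpha_{k+1},\ \alpha_1+\cdots+\alpha_k)$, the latter being evaluated by a one-dimensional midpoint rule.

```python
import math

def moment_formula(alphas, h):
    """E[(1 - x_1 - ... - x_k)^h] from (5.8.8); alphas = (alpha_1, ..., alpha_{k+1})."""
    a_last, total = alphas[-1], sum(alphas)
    log_m = (math.lgamma(a_last + h) - math.lgamma(a_last)
             + math.lgamma(total) - math.lgamma(total + h))
    return math.exp(log_m)

def beta_moment_numeric(a, b, h, n=20_000):
    """Midpoint-rule h-th moment of a type-1 beta density with parameters (a, b)."""
    log_norm = math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b)
    step, total = 1.0 / n, 0.0
    for i in range(n):
        u = (i + 0.5) * step
        total += u ** (h + a - 1.0) * (1.0 - u) ** (b - 1.0)
    return math.exp(log_norm) * total * step

alphas, h = [2.0, 3.0, 4.0], 1.5
exact = moment_formula(alphas, h)
approx = beta_moment_numeric(alphas[-1], sum(alphas[:-1]), h)
print(exact, approx)
```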


Some parallel results can also be obtained for type-2 Dirichlet variables. Consider
a real scalar type-2 Dirichlet density with the parameters $(\alpha_1,\ldots,\alpha_k;\ \alpha_{k+1})$. Let $v =
(1+x_1+\cdots+x_k)^{-1}$. Then, when taking the $h$-th moment of $v$, that is $E[v^h]$, we see that
the only change is that $\alpha_{k+1}$ becomes $\alpha_{k+1}+h$. Accordingly, $v$ has a real scalar type-1
beta distribution with the parameters $(\alpha_{k+1},\ \alpha_1+\cdots+\alpha_k)$. Thus, $1-v = \frac{x_1+\cdots+x_k}{1+x_1+\cdots+x_k}$
is a type-1 beta random variable with the parameters interchanged. Hence the following
result:

Theorem 5.8.2. Let $x_1,\ldots,x_k$ have a real scalar type-2 Dirichlet density with the parameters
$(\alpha_1,\ldots,\alpha_k;\ \alpha_{k+1})$. Then $v = (1+x_1+\cdots+x_k)^{-1}$ has a real scalar type-1 beta
distribution with the parameters $(\alpha_{k+1},\ \alpha_1+\cdots+\alpha_k)$ and $1-v = \frac{x_1+\cdots+x_k}{1+x_1+\cdots+x_k}$ has a real
scalar type-1 beta distribution with the parameters $(\alpha_1+\cdots+\alpha_k,\ \alpha_{k+1})$.


Observe that the joint product moments $E[x_1^{h_1}\cdots x_k^{h_k}]$ can be determined both in the
cases of real scalar type-1 Dirichlet and type-2 Dirichlet densities. This can be achieved by
considering the corresponding normalizing constants. Since an arbitrary product moment
will uniquely determine the corresponding distribution, one can show that all subsets of
variables from the set $\{x_1,\ldots,x_k\}$ are again real scalar type-1 Dirichlet and real scalar
type-2 Dirichlet distributed, respectively; to identify the marginal joint density of a subset
under consideration, it suffices to set the complementary set of $h_j$'s equal to zero. Type-1
and type-2 Dirichlet densities enjoy many properties, some of which are mentioned in the
exercises. As well, there exist several types of generalizations of the type-1 and type-2
Dirichlet models. The first author and his coworkers have developed several such models,
one of which was introduced in connection with certain reliability problems.


5.8.4. Some generalizations of the Dirichlet models 


Let the real scalar variables $x_1,\ldots,x_k$ have a joint density of the following type, which
is a generalization of the type-1 Dirichlet density:

$$\begin{aligned}g_1(x_1,\ldots,x_k) = {}& b_k\,x_1^{\alpha_1-1}(1-x_1)^{\beta_1}x_2^{\alpha_2-1}(1-x_1-x_2)^{\beta_2}\cdots\\ &\times x_k^{\alpha_k-1}(1-x_1-\cdots-x_k)^{\beta_k+\alpha_{k+1}-1}\end{aligned} \tag{5.8.9}$$

for $(x_1,\ldots,x_k) \in \omega$, $\Re(\alpha_j) > 0$, $j = 1,\ldots,k+1$, as well as other necessary conditions
to be stated later, and $g_1 = 0$ elsewhere, where $b_k$ denotes the normalizing constant.
This normalizing constant can be evaluated by integrating out the variables one
at a time or by making the transformation (5.8.3) and taking into account its associated
Jacobian as specified in (5.8.4). Under the transformation (5.8.3), $y_1,\ldots,y_k$ will
be independently distributed with $y_j$ having a type-1 beta density with the parameters
$(\alpha_j,\gamma_j)$, $\gamma_j = \alpha_{j+1}+\cdots+\alpha_{k+1}+\beta_j+\cdots+\beta_k$, $j = 1,\ldots,k$, which yields the
normalizing constant

$$b_k = \prod_{j=1}^{k}\frac{\Gamma(\alpha_j+\gamma_j)}{\Gamma(\alpha_j)\Gamma(\gamma_j)} \tag{5.8.10}$$

for $\Re(\alpha_j) > 0$, $j = 1,\ldots,k+1$, $\Re(\gamma_j) > 0$, $j = 1,\ldots,k$, where

$$\gamma_j = \alpha_{j+1}+\cdots+\alpha_{k+1}+\beta_j+\cdots+\beta_k,\quad j = 1,\ldots,k. \tag{5.8.11}$$

Arbitrary moments $E[x_1^{h_1}\cdots x_k^{h_k}]$ are available from the normalizing constant $b_k$ by replacing
$\alpha_j$ by $\alpha_j+h_j$ for $j = 1,\ldots,k$ and then taking the ratio. It can be observed from
this arbitrary moment that all subsets of the type $(x_1,\ldots,x_j)$ have a density of the type
specified in (5.8.9). For other types of subsets, one has initially to rearrange the variables
and the corresponding parameters by bringing them to the first $j$ positions and then utilize
the previous result on subsets.
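
A small sketch of (5.8.10)-(5.8.11) follows, assuming real parameters; the function names are illustrative. When all the $\beta_j$'s are zero, $\alpha_j+\gamma_j = \gamma_{j-1}$ for $j \ge 2$, so the product telescopes and $b_k$ should reduce to the type-1 Dirichlet constant $c_k$ of (5.8.2), which the code checks for one parameter set.

```python
import math

def generalized_type1_const(alphas, betas):
    """b_k of (5.8.10); alphas = (alpha_1..alpha_{k+1}), betas = (beta_1..beta_k)."""
    k = len(betas)
    log_b = 0.0
    for j in range(k):  # 0-based j: alphas[j] is alpha_{j+1} in the text
        gamma_j = sum(alphas[j + 1:]) + sum(betas[j:])  # (5.8.11)
        log_b += (math.lgamma(alphas[j] + gamma_j)
                  - math.lgamma(alphas[j]) - math.lgamma(gamma_j))
    return math.exp(log_b)

def type1_const(alphas):
    """c_k of (5.8.2)."""
    return math.exp(math.lgamma(sum(alphas)) - sum(math.lgamma(a) for a in alphas))

alphas = [2.0, 3.0, 4.0]
b_zero = generalized_type1_const(alphas, [0.0, 0.0])  # all beta_j = 0
c_k = type1_const(alphas)
print(b_zero, c_k)
```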

The following model corresponding to (5.8.9) for the type-2 Dirichlet model was introduced
by the first author:

$$\begin{aligned}g_2(x_1,\ldots,x_k) = {}& a_k\,x_1^{\alpha_1-1}(1+x_1)^{-\beta_1}x_2^{\alpha_2-1}(1+x_1+x_2)^{-\beta_2}\cdots\\ &\times x_k^{\alpha_k-1}(1+x_1+\cdots+x_k)^{-(\alpha_1+\cdots+\alpha_{k+1})-\beta_k}\end{aligned} \tag{5.8.12}$$

for $x_j > 0$, $j = 1,\ldots,k$, $\Re(\alpha_j) > 0$, $j = 1,\ldots,k+1$, as well as other necessary
conditions to be stated later, and $g_2 = 0$ elsewhere. In order to evaluate the normalizing
constant $a_k$, one can use the transformation (5.8.6) and its associated Jacobian given
in (5.8.7). Then, $y_1,\ldots,y_k$ become independently distributed real scalar type-2 beta variables
with the parameters $(\alpha_j,\delta_j)$, where

$$\delta_j = \alpha_1+\cdots+\alpha_{j-1}+\alpha_{k+1}+\beta_j+\cdots+\beta_k \tag{5.8.13}$$

for $\Re(\alpha_j) > 0$, $j = 1,\ldots,k+1$, $\Re(\delta_j) > 0$, $j = 1,\ldots,k$. Other generalizations are
available in the literature.


5.8.5. A pseudo Dirichlet model 


In the type-1 Dirichlet model, the support is the previously described simplex $\omega$. We
will now consider a model, which was recently introduced by the first author, wherein the
variables can vary freely in a hypercube. Let us begin with the case $k = 2$. Consider the
model

$$g_{12}(x_1,x_2) = c_{12}\,x_2^{\alpha_1}(1-x_1)^{\alpha_1-1}(1-x_2)^{\alpha_2-1}(1-x_1x_2)^{-(\alpha_1+\alpha_2-1)},\quad 0 \le x_j \le 1, \tag{5.8.14}$$

for $\Re(\alpha_j) > 0$, $j = 1,2$, and $g_{12} = 0$ elsewhere. In this case, the variables are free to
vary within the unit square. Let us evaluate the normalizing constant $c_{12}$. For this purpose,
let us expand the last factor by making use of the binomial expansion since $0 < x_1x_2 < 1$.
Then,

$$(1-x_1x_2)^{-(\alpha_1+\alpha_2-1)} = \sum_{k=0}^{\infty}\frac{(\alpha_1+\alpha_2-1)_k}{k!}\,x_1^k x_2^k \tag{i}$$

where for example the Pochhammer symbol $(a)_k$ stands for

$$(a)_k = a(a+1)\cdots(a+k-1),\quad a \ne 0,\quad (a)_0 = 1.$$

The integral over $x_1$ gives

$$\int_0^1 x_1^k(1-x_1)^{\alpha_1-1}\,\mathrm{d}x_1 = \frac{\Gamma(k+1)\Gamma(\alpha_1)}{\Gamma(\alpha_1+k+1)},\quad \Re(\alpha_1) > 0, \tag{ii}$$


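and the integral over $x_2$ yields

$$\int_0^1 x_2^{\alpha_1+k}(1-x_2)^{\alpha_2-1}\,\mathrm{d}x_2 = \frac{\Gamma(\alpha_1+k+1)\Gamma(\alpha_2)}{\Gamma(\alpha_1+\alpha_2+k+1)}. \tag{iii}$$

Taking the product of the right-hand side expressions in (ii) and (iii) and observing that
$\Gamma(\alpha_1+\alpha_2+k+1) = \Gamma(\alpha_1+\alpha_2+1)(\alpha_1+\alpha_2+1)_k$ and $\Gamma(k+1) = (1)_k$, we obtain
the following total integral:

$$\begin{aligned}\frac{\Gamma(\alpha_1)\Gamma(\alpha_2)}{\Gamma(\alpha_1+\alpha_2+1)}\sum_{k=0}^{\infty}\frac{(1)_k(\alpha_1+\alpha_2-1)_k}{k!\,(\alpha_1+\alpha_2+1)_k} &= \frac{\Gamma(\alpha_1)\Gamma(\alpha_2)}{\Gamma(\alpha_1+\alpha_2+1)}\,{}_2F_1(1,\alpha_1+\alpha_2-1;\alpha_1+\alpha_2+1;1)\\ &= \frac{\Gamma(\alpha_1)\Gamma(\alpha_2)}{\Gamma(\alpha_1+\alpha_2+1)}\,\frac{\Gamma(\alpha_1+\alpha_2+1)\Gamma(1)}{\Gamma(\alpha_1+\alpha_2)\Gamma(2)}\\ &= \frac{\Gamma(\alpha_1)\Gamma(\alpha_2)}{\Gamma(\alpha_1+\alpha_2)},\quad \Re(\alpha_1) > 0,\ \Re(\alpha_2) > 0,\end{aligned} \tag{5.8.15}$$

where the ${}_2F_1$ hypergeometric function with argument 1 is evaluated with the following
identity:

$${}_2F_1(a,b;c;1) = \frac{\Gamma(c)\Gamma(c-a-b)}{\Gamma(c-a)\Gamma(c-b)} \tag{5.8.16}$$

whenever the gamma functions are defined. Observe that (5.8.15) is a surprising result as
it is the total integral coming from a type-1 real beta density with the parameters $(\alpha_1,\alpha_2)$.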
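
The evaluation of the series in (5.8.15) can be checked numerically. The sketch below (illustrative names, real parameters assumed) sums the first terms of the ${}_2F_1$ series at argument 1 with $a = 1$, $b = \alpha_1+\alpha_2-1$, $c = \alpha_1+\alpha_2+1$ and compares the partial sum with the closed form (5.8.16). Since $c-a-b = 1$, the terms decay only like $1/k^2$ and the partial sums converge slowly, which is why a loose tolerance is appropriate.

```python
import math

def hyp2f1_at_one_partial(a, b, c, terms):
    """Partial sum of sum_k (a)_k (b)_k / ((c)_k k!) built from the term ratio."""
    total, term = 0.0, 1.0
    for k in range(terms):
        total += term
        term *= (a + k) * (b + k) / ((c + k) * (k + 1.0))
    return total

def gauss_value(a, b, c):
    """Closed form (5.8.16): Gamma(c) Gamma(c-a-b) / (Gamma(c-a) Gamma(c-b))."""
    return math.exp(math.lgamma(c) + math.lgamma(c - a - b)
                    - math.lgamma(c - a) - math.lgamma(c - b))

alpha1, alpha2 = 1.5, 2.0
s = alpha1 + alpha2
series = hyp2f1_at_one_partial(1.0, s - 1.0, s + 1.0, 200_000)
closed = gauss_value(1.0, s - 1.0, s + 1.0)  # equals alpha1 + alpha2 here
total_integral = math.gamma(alpha1) * math.gamma(alpha2) / math.gamma(s + 1.0) * closed
print(series, closed, total_integral)
```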
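Now, consider the general model

$$\begin{aligned}g_{1k}(x_1,\ldots,x_k) = {}& c_{1k}\,(1-x_1)^{\alpha_1-1}\cdots(1-x_k)^{\alpha_k-1}\,x_2^{\alpha_1}x_3^{\alpha_1+\alpha_2}\cdots x_k^{\alpha_1+\cdots+\alpha_{k-1}}\\ &\times(1-x_1\cdots x_k)^{-(\alpha_1+\cdots+\alpha_k-1)},\quad 0 \le x_j \le 1,\ j = 1,\ldots,k.\end{aligned} \tag{5.8.17}$$

Proceeding exactly as in the case of $k = 2$, one obtains the total integral as

$$[c_{1k}]^{-1} = \frac{\Gamma(\alpha_1)\cdots\Gamma(\alpha_k)}{\Gamma(\alpha_1+\cdots+\alpha_k)},\quad \Re(\alpha_j) > 0,\ j = 1,\ldots,k. \tag{5.8.18}$$

This is the total integral coming from a $(k-1)$-variate real type-1 Dirichlet model. Some
properties of this distribution are pointed out in some of the assigned problems.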


5.8.6. The type-1 Dirichlet model in real matrix-variate case 


Direct generalizations of the real scalar variable Dirichlet models to real as well as
complex matrix-variate cases are possible. The type-1 model will be considered first. Let
the $p \times p$ real positive definite matrices $X_1,\ldots,X_k$ be such that $X_j > O$, $I-X_j > O$,
that is $X_j$ as well as $I-X_j$ are positive definite, for $j = 1,\ldots,k$, and, in addition,
$I-X_1-\cdots-X_k > O$. Let $\Omega = \{(X_1,\ldots,X_k)\,|\,O < X_j < I,\ j = 1,\ldots,k,\ I-X_1-\cdots-X_k > O\}$.
Consider the model

$$\begin{aligned}G_1(X_1,\ldots,X_k) = {}& C_k\,|X_1|^{\alpha_1-\frac{p+1}{2}}\cdots|X_k|^{\alpha_k-\frac{p+1}{2}}\\ &\times|I-X_1-\cdots-X_k|^{\alpha_{k+1}-\frac{p+1}{2}},\quad (X_1,\ldots,X_k) \in \Omega,\end{aligned} \tag{5.8.19}$$

for $\Re(\alpha_j) > \frac{p-1}{2}$, $j = 1,\ldots,k+1$, and $G_1 = 0$ elsewhere. The normalizing constant
$C_k$ can be determined by using real matrix-variate type-1 beta integrals to integrate the
matrices one at a time. We can also evaluate the total integral by means of the following
transformation:

$$\begin{aligned}X_1 &= Y_1\\ X_2 &= (I-Y_1)^{\frac{1}{2}}Y_2(I-Y_1)^{\frac{1}{2}}\\ X_j &= (I-Y_1)^{\frac{1}{2}}\cdots(I-Y_{j-1})^{\frac{1}{2}}\,Y_j\,(I-Y_{j-1})^{\frac{1}{2}}\cdots(I-Y_1)^{\frac{1}{2}},\quad j = 2,\ldots,k.\end{aligned} \tag{5.8.20}$$

The associated Jacobian can then be determined by making use of results on matrix transformations
that are provided in Sect. 1.6. Then,

$$\mathrm{d}X_1\wedge\ldots\wedge\mathrm{d}X_k = |I-Y_1|^{(k-1)\frac{p+1}{2}}\cdots|I-Y_{k-1}|^{\frac{p+1}{2}}\,\mathrm{d}Y_1\wedge\ldots\wedge\mathrm{d}Y_k. \tag{5.8.21}$$

It can be seen that the $Y_j$'s are independently distributed as real matrix-variate type-1 beta
random variables and the product of the integrals gives the following final result:

$$C_k = \frac{\Gamma_p(\alpha_1+\cdots+\alpha_{k+1})}{\Gamma_p(\alpha_1)\cdots\Gamma_p(\alpha_{k+1})},\quad \Re(\alpha_j) > \frac{p-1}{2},\ j = 1,\ldots,k+1. \tag{5.8.22}$$
By integrating out the variables one at a time, we can show that the marginal densities of
all subsets of $\{X_1,\ldots,X_k\}$ also belong to the same real matrix-variate type-1 Dirichlet
distribution and single matrices are real matrix-variate type-1 beta distributed. By taking
the product moment of the determinants, $E[|X_1|^{h_1}\cdots|X_k|^{h_k}]$, one can anticipate the
results; however, arbitrary moments of determinants need not uniquely determine the densities
of the corresponding matrices. In the real scalar case, one can uniquely identify
the density from arbitrary moments, very often under very mild conditions. That
$I-X_1-\cdots-X_k$ has a real matrix-variate type-1 beta distribution can be anticipated by taking
arbitrary moments of the determinant, that is, $E[|I-X_1-\cdots-X_k|^h]$, but evaluating
the $h$-th moment of a determinant and then identifying it as the $h$-th moment of the determinant
from a real matrix-variate type-1 beta density is not a valid proof in this case. If one makes
a transformation of the type $Y_1 = X_1,\ldots,Y_{k-1} = X_{k-1}$, $Y_k = I-X_1-\cdots-X_k$,
it is seen that $X_k = I-Y_1-\cdots-Y_k$ and that the Jacobian in absolute value is 1.
Hence, we end up with a real matrix-variate type-1 Dirichlet density of the same format but
whose parameters $\alpha_k$ and $\alpha_{k+1}$ are interchanged. Then, integrating out $Y_1,\ldots,Y_{k-1}$, we
obtain a real matrix-variate type-1 beta density with the parameters $(\alpha_{k+1},\ \alpha_1+\cdots+\alpha_k)$.
Hence the result. When $Y_k$ has a real matrix-variate type-1 beta distribution, we have that
$I-Y_k = X_1+\cdots+X_k$ is also a type-1 beta random variable with the parameters interchanged.

The first author and his coworkers have proposed various types of generalizations to
the matrix-variate type-1 and type-2 Dirichlet models in the real and complex cases. One
of those extensions, which is defined in the real domain, is the following:

$$\begin{aligned}G_2(X_1,\ldots,X_k) = {}& C_{1k}\,|X_1|^{\alpha_1-\frac{p+1}{2}}|I-X_1|^{\beta_1}|X_2|^{\alpha_2-\frac{p+1}{2}}\\ &\times|I-X_1-X_2|^{\beta_2}\cdots|X_k|^{\alpha_k-\frac{p+1}{2}}\\ &\times|I-X_1-\cdots-X_k|^{\alpha_{k+1}+\beta_k-\frac{p+1}{2}}\end{aligned} \tag{5.8.23}$$

for $(X_1,\ldots,X_k) \in \Omega$, $\Re(\alpha_j) > \frac{p-1}{2}$, $j = 1,\ldots,k+1$, and $G_2 = 0$ elsewhere. The
normalizing constant $C_{1k}$ can be evaluated by integrating variables one at a time or by
using the transformation (5.8.20). Under this transformation, the real matrices $Y_j$'s are
independently distributed as real matrix-variate type-1 beta variables with the parameters
$(\alpha_j,\gamma_j)$, $\gamma_j = \alpha_{j+1}+\cdots+\alpha_{k+1}+\beta_j+\cdots+\beta_k$. The conditions will then be $\Re(\alpha_j) >
\frac{p-1}{2}$, $j = 1,\ldots,k+1$, and $\Re(\gamma_j) > \frac{p-1}{2}$, $j = 1,\ldots,k$. Hence

$$C_{1k} = \prod_{j=1}^{k}\frac{\Gamma_p(\alpha_j+\gamma_j)}{\Gamma_p(\alpha_j)\Gamma_p(\gamma_j)}. \tag{5.8.24}$$

5.8.7. The type-2 Dirichlet model in the real matrix-variate case


The type-2 Dirichlet density in the real matrix-variate case is the following:

$$\begin{aligned}G_3(X_1,\ldots,X_k) = {}& C_k\,|X_1|^{\alpha_1-\frac{p+1}{2}}\cdots|X_k|^{\alpha_k-\frac{p+1}{2}}\\ &\times|I+X_1+\cdots+X_k|^{-(\alpha_1+\cdots+\alpha_{k+1})},\quad X_j > O,\ j = 1,\ldots,k,\end{aligned} \tag{5.8.25}$$

for $\Re(\alpha_j) > \frac{p-1}{2}$, $j = 1,\ldots,k+1$, and $G_3 = 0$ elsewhere, the normalizing constant $C_k$
being the same as that appearing in the type-1 Dirichlet case. This can be verified, either by
integrating matrices one at a time from (5.8.25) or by making the following transformation:

$$\begin{aligned}X_1 &= Y_1\\ X_2 &= (I+Y_1)^{\frac{1}{2}}Y_2(I+Y_1)^{\frac{1}{2}}\\ X_j &= (I+Y_1)^{\frac{1}{2}}\cdots(I+Y_{j-1})^{\frac{1}{2}}\,Y_j\,(I+Y_{j-1})^{\frac{1}{2}}\cdots(I+Y_1)^{\frac{1}{2}},\quad j = 2,\ldots,k.\end{aligned} \tag{5.8.26}$$

Under this transformation, the Jacobian is as follows:

$$\mathrm{d}X_1\wedge\ldots\wedge\mathrm{d}X_k = |I+Y_1|^{(k-1)\frac{p+1}{2}}\cdots|I+Y_{k-1}|^{\frac{p+1}{2}}\,\mathrm{d}Y_1\wedge\ldots\wedge\mathrm{d}Y_k. \tag{5.8.27}$$

Thus, the $Y_j$'s are independently distributed real matrix-variate type-2 beta variables
and the product of the integrals produces $[C_k]^{-1}$. By integrating matrices one at a time, we
can see that all subsets of matrices belonging to $\{X_1,\ldots,X_k\}$ will have densities of the
type specified in (5.8.25). Several properties can also be established for the model (5.8.25);
some of them are included in the exercises.
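Example 5.8.1. Evaluate the normalizing constant $c$ explicitly if the function $f(X_1,X_2)$
is a statistical density where the $p \times p$ real matrices $X_j > O$, $I-X_j > O$, $j = 1,2$,
and $I-X_1-X_2 > O$ where

$$f(X_1,X_2) = c\,|X_1|^{\alpha_1-\frac{p+1}{2}}|I-X_1|^{\beta_1}|X_2|^{\alpha_2-\frac{p+1}{2}}|I-X_1-X_2|^{\beta_2-\frac{p+1}{2}}.$$

Solution 5.8.1. Note that

$$|I-X_1-X_2|^{\beta_2-\frac{p+1}{2}} = |I-X_1|^{\beta_2-\frac{p+1}{2}}\,|I-(I-X_1)^{-\frac{1}{2}}X_2(I-X_1)^{-\frac{1}{2}}|^{\beta_2-\frac{p+1}{2}}.$$

Now, letting $Y = (I-X_1)^{-\frac{1}{2}}X_2(I-X_1)^{-\frac{1}{2}} \Rightarrow \mathrm{d}Y = |I-X_1|^{-\frac{p+1}{2}}\,\mathrm{d}X_2$, the integral
over $X_2$ gives the following:

$$\begin{aligned}\int_{X_2}|X_2|^{\alpha_2-\frac{p+1}{2}}|I-X_1-X_2|^{\beta_2-\frac{p+1}{2}}\,\mathrm{d}X_2 &= |I-X_1|^{\alpha_2+\beta_2-\frac{p+1}{2}}\int_{O<Y<I}|Y|^{\alpha_2-\frac{p+1}{2}}|I-Y|^{\beta_2-\frac{p+1}{2}}\,\mathrm{d}Y\\ &= |I-X_1|^{\alpha_2+\beta_2-\frac{p+1}{2}}\,\frac{\Gamma_p(\alpha_2)\Gamma_p(\beta_2)}{\Gamma_p(\alpha_2+\beta_2)}\end{aligned}$$

for $\Re(\alpha_2) > \frac{p-1}{2}$, $\Re(\beta_2) > \frac{p-1}{2}$. Then, the integral over $X_1$ can be evaluated as follows: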


$$\int_{O<X_1<I}|X_1|^{\alpha_1-\frac{p+1}{2}}|I-X_1|^{\beta_1+\beta_2+\alpha_2-\frac{p+1}{2}}\,\mathrm{d}X_1 = \frac{\Gamma_p(\alpha_1)\Gamma_p(\beta_1+\beta_2+\alpha_2)}{\Gamma_p(\alpha_1+\alpha_2+\beta_1+\beta_2)}$$

for $\Re(\alpha_1) > \frac{p-1}{2}$, $\Re(\beta_1+\beta_2+\alpha_2) > \frac{p-1}{2}$. Collecting the results from the integrals over
$X_2$ and $X_1$ and using the fact that the total integral is 1, we have

$$c = \frac{\Gamma_p(\alpha_2+\beta_2)\,\Gamma_p(\alpha_1+\alpha_2+\beta_1+\beta_2)}{\Gamma_p(\alpha_2)\Gamma_p(\beta_2)\Gamma_p(\alpha_1)\Gamma_p(\alpha_2+\beta_1+\beta_2)}$$

for $\Re(\alpha_j) > \frac{p-1}{2}$, $j = 1,2$, $\Re(\beta_2) > \frac{p-1}{2}$, and $\Re(\beta_1+\beta_2+\alpha_2) > \frac{p-1}{2}$.

The first author and his coworkers have established several generalizations to the type-2
Dirichlet model in (5.8.25). One such model is the following:

$$\begin{aligned}G_4(X_1,\ldots,X_k) = {}& C_{2k}\,|X_1|^{\alpha_1-\frac{p+1}{2}}|I+X_1|^{-\beta_1}|X_2|^{\alpha_2-\frac{p+1}{2}}\\ &\times|I+X_1+X_2|^{-\beta_2}\cdots|X_k|^{\alpha_k-\frac{p+1}{2}}\\ &\times|I+X_1+\cdots+X_k|^{-(\alpha_1+\cdots+\alpha_{k+1}+\beta_k)}\end{aligned} \tag{5.8.28}$$

for $\Re(\alpha_j) > \frac{p-1}{2}$, $j = 1,\ldots,k+1$, $X_j > O$, $j = 1,\ldots,k$, as well as other necessary
conditions to be stated later, and $G_4 = 0$ elsewhere. The normalizing constant $C_{2k}$ can be
evaluated by integrating matrices one at a time or by making the transformation (5.8.26).
Under this transformation, the $Y_j$'s are independently distributed real matrix-variate type-2
beta variables with the parameters $(\alpha_j,\delta_j)$, where

$$\delta_j = \alpha_1+\cdots+\alpha_{j-1}+\alpha_{k+1}+\beta_j+\cdots+\beta_k. \tag{5.8.29}$$

The normalizing constant is then

$$C_{2k} = \prod_{j=1}^{k}\frac{\Gamma_p(\alpha_j+\delta_j)}{\Gamma_p(\alpha_j)\Gamma_p(\delta_j)} \tag{5.8.30}$$

where the $\delta_j$ is given in (5.8.29). The marginal densities of the subsets, if taken in the
order $X_1$, $\{X_1,X_2\}$, and so on, will belong to the same family of densities as that specified
by (5.8.28). Several properties of the model (5.8.28) are available in the literature.
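
Returning to Example 5.8.1, the constant $c$ can be evaluated numerically once the real matrix-variate gamma function $\Gamma_p(\alpha) = \pi^{\frac{p(p-1)}{4}}\Gamma(\alpha)\Gamma(\alpha-\frac{1}{2})\cdots\Gamma(\alpha-\frac{p-1}{2})$ is coded. The sketch below assumes real parameters and uses illustrative function names. For $p = 1$ it cross-checks $c$ against a midpoint-rule integral of the scalar version of $f$ over the simplex; for $p > 1$ only the closed form is evaluated.

```python
import math

def log_gamma_p(alpha, p):
    """Log of the real matrix-variate gamma Gamma_p(alpha), real alpha > (p-1)/2."""
    return (p * (p - 1) / 4.0) * math.log(math.pi) + sum(
        math.lgamma(alpha - j / 2.0) for j in range(p))

def example_const(a1, a2, b1, b2, p):
    """Constant c of Example 5.8.1 for real parameters."""
    lg = lambda z: log_gamma_p(z, p)
    return math.exp(lg(a2 + b2) + lg(a1 + a2 + b1 + b2)
                    - lg(a2) - lg(b2) - lg(a1) - lg(a2 + b1 + b2))

def scalar_total_integral(a1, a2, b1, b2, n=400):
    """p = 1: midpoint integral of x1^(a1-1)(1-x1)^b1 x2^(a2-1)(1-x1-x2)^(b2-1)."""
    h, total = 1.0 / n, 0.0
    for i in range(n):
        x1 = (i + 0.5) * h
        for j in range(n):
            x2 = (j + 0.5) * h
            rest = 1.0 - x1 - x2
            if rest > 0.0:  # keep only grid points inside the simplex
                total += (x1 ** (a1 - 1) * (1.0 - x1) ** b1
                          * x2 ** (a2 - 1) * rest ** (b2 - 1))
    return total * h * h

c_scalar = example_const(2.0, 2.0, 1.0, 2.0, 1)
print(c_scalar, c_scalar * scalar_total_integral(2.0, 2.0, 1.0, 2.0))
print(example_const(3.0, 3.0, 1.0, 2.0, 2))  # a p = 2 evaluation of the closed form
```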


5.8.8. A pseudo Dirichlet model 


We will now discuss the generalization of the model introduced in Sect. 5.8.5. Consider 
the density 
LBTI 1 
Gik(Xi,..., Xx) = Cik — Xif 2 ++ | — ХА 2 
ОРД ВЫ 
1 1 i 1 p+! 

Сеа" (5.8.31) 
Then, by following steps parallel to those used in the real scalar variable case, one can
show that the normalizing constant is given by

$$C_{1k} = \frac{\Gamma_p(\alpha_1+\cdots+\alpha_k)}{\Gamma_p(\alpha_1)\cdots\Gamma_p(\alpha_k)}\,\frac{\Gamma_p(p+1)}{[\Gamma_p(\frac{p+1}{2})]^2}. \tag{5.8.32}$$

The binomial expansion of the last factor determinant in (5.8.31) is somewhat complicated
as it involves zonal polynomials; this expansion is given in Mathai (1997). Compared to
the real scalar case, the only change is the appearance of the constant $\frac{\Gamma_p(p+1)}{[\Gamma_p(\frac{p+1}{2})]^2}$, which
is 1 when $p = 1$. Apart from this constant, the rest is the normalizing constant in a real
matrix-variate type-1 Dirichlet model in $k-1$ variables instead of $k$ variables.


5.8a. Dirichlet Models in the Complex Domain 


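All the matrices appearing in the remainder of this chapter are $p \times p$ Hermitian positive
definite, that is, $\tilde{X}_j = \tilde{X}_j^{*}$ where an asterisk indicates the conjugate transpose. Complex
matrix-variate random variables will be denoted with a tilde. For a complex matrix $\tilde{X}$,
the determinant will be denoted by $\det(\tilde{X})$ and the absolute value of the determinant, by
$|\det(\tilde{X})|$. For example, if $\det(\tilde{X}) = a+ib$, $a$ and $b$ being real and $i = \sqrt{(-1)}$, the
absolute value is $|\det(\tilde{X})| = +\sqrt{(a^2+b^2)}$. The type-1 Dirichlet model in the complex
domain, denoted by $\tilde{G}_1$, is the following:

$$\begin{aligned}\tilde{G}_1(\tilde{X}_1,\ldots,\tilde{X}_k) = {}& \tilde{C}_k\,|\det(\tilde{X}_1)|^{\alpha_1-p}\cdots|\det(\tilde{X}_k)|^{\alpha_k-p}\\ &\times|\det(I-\tilde{X}_1-\cdots-\tilde{X}_k)|^{\alpha_{k+1}-p}\end{aligned} \tag{5.8a.1}$$

for $(\tilde{X}_1,\ldots,\tilde{X}_k) \in \tilde{\Omega}$, $\tilde{\Omega} = \{(\tilde{X}_1,\ldots,\tilde{X}_k)\,|\,O < \tilde{X}_j < I,\ j = 1,\ldots,k,\ O < \tilde{X}_1+\cdots+\tilde{X}_k < I\}$,
$\Re(\alpha_j) > p-1$, $j = 1,\ldots,k+1$, and $\tilde{G}_1 = 0$ elsewhere. The normalizing
constant $\tilde{C}_k$ can be evaluated by integrating out matrices one at a time with the help of
complex matrix-variate type-1 beta integrals. One can also employ a transformation of the
type given in (5.8.20) where the real matrices are replaced by matrices in the complex
domain and Hermitian positive definite square roots are used. The Jacobian is then as
follows:

$$\mathrm{d}\tilde{X}_1\wedge\ldots\wedge\mathrm{d}\tilde{X}_k = |\det(I-\tilde{Y}_1)|^{(k-1)p}\cdots|\det(I-\tilde{Y}_{k-1})|^{p}\,\mathrm{d}\tilde{Y}_1\wedge\ldots\wedge\mathrm{d}\tilde{Y}_k. \tag{5.8a.2}$$

Then the $\tilde{Y}_j$'s are independently distributed as complex matrix-variate type-1 beta variables.
On taking the product of the total integrals, one can verify that

$$\tilde{C}_k = \frac{\tilde{\Gamma}_p(\alpha_1+\cdots+\alpha_{k+1})}{\tilde{\Gamma}_p(\alpha_1)\cdots\tilde{\Gamma}_p(\alpha_{k+1})} \tag{5.8a.3}$$

where for example $\tilde{\Gamma}_p(\alpha)$ is the complex matrix-variate gamma given by

$$\tilde{\Gamma}_p(\alpha) = \pi^{\frac{p(p-1)}{2}}\Gamma(\alpha)\Gamma(\alpha-1)\cdots\Gamma(\alpha-p+1),\quad \Re(\alpha) > p-1. \tag{5.8a.4}$$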
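
As a minimal sketch of (5.8a.3)-(5.8a.4), the following code evaluates the complex matrix-variate gamma on the log scale and then the constant $\tilde{C}_k$. Real parameters are assumed and the function names are illustrative. For $p = 1$ the result coincides with the scalar constant (5.8.2).

```python
import math

def log_complex_gamma_p(alpha, p):
    """Log of (5.8a.4): pi^{p(p-1)/2} Gamma(alpha) Gamma(alpha-1) ... Gamma(alpha-p+1)."""
    if alpha <= p - 1:
        raise ValueError("requires alpha > p - 1 (real alpha assumed)")
    return (p * (p - 1) / 2.0) * math.log(math.pi) + sum(
        math.lgamma(alpha - j) for j in range(p))

def complex_dirichlet_const(alphas, p):
    """Constant of (5.8a.3) for real alphas = (alpha_1, ..., alpha_{k+1})."""
    return math.exp(log_complex_gamma_p(sum(alphas), p)
                    - sum(log_complex_gamma_p(a, p) for a in alphas))

print(complex_dirichlet_const([2.0, 3.0, 4.0], 1))  # scalar case, equals c_k of (5.8.2)
print(complex_dirichlet_const([2.0, 2.0], 2))       # k = 1, p = 2
```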


The first author and his coworkers have also discussed various types of generalizations to
Dirichlet models in the complex domain.

5.8a.1. A type-2 Dirichlet model in the complex domain

One can have a model parallel to the type-2 Dirichlet model in the real matrix-variate
case. Consider the model

$$\begin{aligned}\tilde{G}_2 = {}& \tilde{C}_k\,|\det(\tilde{X}_1)|^{\alpha_1-p}\cdots|\det(\tilde{X}_k)|^{\alpha_k-p}\\ &\times|\det(I+\tilde{X}_1+\cdots+\tilde{X}_k)|^{-(\alpha_1+\cdots+\alpha_{k+1})}\end{aligned} \tag{5.8a.5}$$

for $\tilde{X}_j > O$, $j = 1,\ldots,k$, $\Re(\alpha_j) > p-1$, $j = 1,\ldots,k+1$, and $\tilde{G}_2 = 0$ elsewhere. By
integrating out matrices one at a time with the help of complex matrix-variate type-2 beta
integrals or by using a transformation parallel to that provided in (5.8.26) and then integrating
out the independently distributed complex type-2 beta variables $\tilde{Y}_j$'s, we can show that
the normalizing constant $\tilde{C}_k$ is the same as that obtained in the complex type-1 Dirichlet
case. The first author and his coworkers have given various types of generalizations to the
complex type-2 Dirichlet density as well.


Exercises 5.8 


5.8.1. By integrating out variables one at a time derive the normalizing constant in the 
real scalar type-2 Dirichlet case. 


5.8.2. By using the transformation (5.8.3), derive the normalizing constant in the real 
scalar type-1 Dirichlet case. 


5.8.3. By using the transformation in (5.8.6), derive the normalizing constant in the real 
scalar type-2 Dirichlet case. 


5.8.4. Derive the normalizing constants for the extended Dirichlet models in (5.8.9) 
and (5.8.12). 


5.8.5. Evaluate $E[x_1^{h_1}\cdots x_k^{h_k}]$ for the model specified in (5.8.12) and state the conditions
for its existence.


5.8.6. Derive the normalizing constant given in (5.8.18). 


5.8.7. With respect to the pseudo Dirichlet model in (5.8.17), show that the product $u=x_1\cdots x_k$
is uniformly distributed.


5.8.8. Derive the marginal distribution of (1): $x_1$; (2): $(x_1,x_2)$; (3): $(x_1,\ldots,x_r)$, $r<k$,
and the conditional distribution of $(x_1,\ldots,x_r)$ given $(x_{r+1},\ldots,x_k)$ in the pseudo Dirichlet
model in (5.8.17).


5.8.9. Derive the normalizing constant in (5.8.22) by completing the steps in (5.8.22) and
then by integrating out matrices one by one.


5.8.10. From the outline given after equation (5.8.22), derive the density of $I-X_1-\cdots-X_k$
and therefrom the density of $X_1+\cdots+X_k$ when $(X_1,\ldots,X_k)$ has a type-1 Dirichlet
distribution.


5.8.11. Complete the derivation of C1, in (5.8.24) and verify it by integrating out matrices 
one at a time from the density given in (5.8.23). 


5.8.12. Show that $(I+X_1+\cdots+X_k)^{-1}$ in the type-2 Dirichlet model in (5.8.25)
has a real matrix-variate type-1 beta distribution. As well, specify its parameters.


5.8.13. Evaluate the normalizing constant $C_k$ in (5.8.25) by using the transformation
provided in (5.8.26) as well as by integrating out matrices one at a time.


5.8.14. Derive the $\delta_j$ in (5.8.29) and thus the normalizing constant $C_{2k}$ in (5.8.28).


5.8.15. For the following model in the complex domain, evaluate C: 
f(X) = C|det(X |" Раец — X) |detQC)|?7?| det — X1 — 2) +- 
x |det(/ — X, — ... — Xp PA 


5.8.16. Evaluate the normalizing constant in the pseudo Dirichlet model in (5.8.31).


5.8.17. In the pseudo Dirichlet model specified in (5.8.31), show that
$U=X_k^{\frac12}\cdots X_2^{\frac12}X_1X_2^{\frac12}\cdots X_k^{\frac12}$ is uniformly distributed.


5.8.18. Show that the normalizing constant in the complex type-2 Dirichlet model speci- 
fied in (5.8a.5) is the same as the one in the type-1 Dirichlet case. Establish the result by 
integrating out matrices one by one. 


5.8.19. Show that the normalizing constant in the type-2 Dirichlet case in (5.8a.5) is the 
same as that in the type-1 case. Establish this by using a transformation parallel to (5.8.26) 
in the complex domain. 


5.8.20. Construct a generalized model for the type-2 Dirichlet case for k = 3 parallel to
the case in (5.8.28) in the complex domain.


Chapter 6
Hypothesis Testing and Null Distributions


6.1. Introduction 


It is assumed that the readers are familiar with the concept of testing statistical hypotheses
on the parameters of a real scalar normal density or independent real scalar normal
densities. Those who are not or require a refresher may consult the textbook: Mathai
and Haubold (2017) on basic “Probability and Statistics” [De Gruyter, Germany, 2017, free
download]. Initially, we will only employ the likelihood ratio criterion for testing hypotheses
on the parameters of one or more real multivariate Gaussian (or normal) distributions.
All of our tests will be based on a simple random sample of size $n$ from a $p$-variate nonsingular
Gaussian distribution, that is, the $p\times 1$ vectors $X_1,\ldots,X_n$ constituting the sample
are iid (independently and identically distributed) as $X_j\sim N_p(\mu,\Sigma)$, $\Sigma>O$,
$j=1,\ldots,n$, when a single real Gaussian population is involved. The corresponding test
criterion for the complex Gaussian case will also be mentioned in each section.


In this chapter, we will utilize the following notations. Lower-case letters such as x, y 
will be used to denote real scalar mathematical or random variables. No distinction will be 
made between mathematical and random variables. Capital letters such as X, Y will denote 
real vector/matrix-variate variables, whether mathematical or random. A tilde placed on
a letter as for instance $\tilde{x}$, $\tilde{y}$, $\tilde{X}$ and $\tilde{Y}$ will indicate that the variables are in the complex
domain. No tilde will be used for constant matrices unless the point is to be stressed that 
the matrix concerned is in the complex domain. The other notations will be identical to 
those utilized in the previous chapters. 


First, we consider certain problems related to testing hypotheses on the parameters of
a $p$-variate real Gaussian population. Only the likelihood ratio criterion, also referred to as
the $\lambda$-criterion, will be utilized. Let $L$ denote the joint density of the sample values in a simple
random sample of size $n$, namely, $X_1,\ldots,X_n$, which are iid $N_p(\mu,\Sigma)$, $\Sigma>O$. Then,
as was previously established,

$$L=\prod_{j=1}^{n}\frac{e^{-\frac{1}{2}(X_j-\mu)'\Sigma^{-1}(X_j-\mu)}}{(2\pi)^{\frac{p}{2}}|\Sigma|^{\frac{1}{2}}}=\frac{e^{-\frac{1}{2}{\rm tr}(\Sigma^{-1}S)-\frac{n}{2}(\bar{X}-\mu)'\Sigma^{-1}(\bar{X}-\mu)}}{(2\pi)^{\frac{np}{2}}|\Sigma|^{\frac{n}{2}}} \tag{6.1.1}$$

where $S=\sum_{j=1}^{n}(X_j-\bar{X})(X_j-\bar{X})'$ is the sample sum of products matrix and
$\bar{X}=\frac{1}{n}(X_1+\cdots+X_n)$ is the sample average, $n$ being the sample size. As well, we have already
determined that the maximum likelihood estimators (MLE's) of $\mu$ and $\Sigma$ are $\hat{\mu}=\bar{X}$ and
$\hat{\Sigma}=\frac{1}{n}S$, the sample covariance matrix. Consider the parameter space


$$\Omega=\{(\mu,\Sigma)\,|\,\Sigma>O,\ \mu'=(\mu_1,\ldots,\mu_p),\ -\infty<\mu_j<\infty,\ j=1,\ldots,p\}.$$


The maximum value of $L$ within $\Omega$ is obtained by substituting the MLE's of the parameters
into $L$, and since $(\bar{X}-\hat{\mu})=(\bar{X}-\bar{X})=O$ and ${\rm tr}(\hat{\Sigma}^{-1}S)={\rm tr}(nI_p)=np$,


$$\max_{\Omega}L=\frac{e^{-\frac{np}{2}}}{(2\pi)^{\frac{np}{2}}|\frac{1}{n}S|^{\frac{n}{2}}}=\frac{e^{-\frac{np}{2}}\,n^{\frac{np}{2}}}{(2\pi)^{\frac{np}{2}}|S|^{\frac{n}{2}}}. \tag{6.1.2}$$


Under any given hypothesis on $\mu$ or $\Sigma$, the parameter space is reduced to a subspace $\omega$ in
$\Omega$ or $\omega\subset\Omega$. For example, if $H_o:\mu=\mu_o$ where $\mu_o$ is a given vector, then the parameter
space under this null hypothesis reduces to $\omega=\{(\mu,\Sigma)\,|\,\mu=\mu_o,\ \Sigma>O\}\subset\Omega$,
"null hypothesis" being a technical term used to refer to the hypothesis being tested. The
alternative hypothesis against which the null hypothesis is tested is usually denoted by $H_1$.
If $\mu=\mu_o$ specifies $H_o$, then a natural alternative is $H_1:\mu\ne\mu_o$. One of two things can
happen when considering the maximum of the likelihood function under $H_1$. The overall
maximum may occur in $\omega$ or it may be attained outside of $\omega$ but inside $\Omega$. If the null
hypothesis $H_o$ is actually true, then $\omega$ and $\Omega$ will coincide and the maxima in $\omega$ and in $\Omega$
will agree. If there are several local maxima, then the overall maximum or supremum is
taken. The $\lambda$-criterion is defined as follows:


$$\lambda=\frac{\sup_{\omega}L}{\sup_{\Omega}L},\quad 0<\lambda\le 1. \tag{6.1.3}$$

If the null hypothesis is true, then $\lambda=1$. Accordingly, an observed value of $\lambda$ that is close
to 0 in a testing situation indicates that the null hypothesis $H_o$ is incorrect and should then
be rejected. Hence, the test criterion under the likelihood ratio test is to “reject $H_o$ for
$0<\lambda\le\lambda_o$”, that is, for small values of $\lambda$, so that, under $H_o$, the coverage probability
over this interval is equal to the significance level $\alpha$ or the probability of rejecting $H_o$
when $H_o$ is true, that is, $Pr\{0<\lambda\le\lambda_o\,|\,H_o\}=\alpha$ for a pre-assigned $\alpha$, which is also
known as the size of the critical region or the size of the type-1 error. However, rejecting
$H_o$ when it is not actually true or when the alternative $H_1$ is true is a correct decision
whose probability is known as the power of the test and written as $1-\beta$ where $\beta$ is the
probability of committing a type-2 error or the error of not rejecting $H_o$ when $H_o$ is not
true. Thus we have


$$Pr\{0<\lambda\le\lambda_o\,|\,H_o\}=\alpha\ \mbox{ and }\ Pr\{0<\lambda\le\lambda_o\,|\,H_1\}=1-\beta. \tag{6.1.4}$$


When we preassign $\alpha=0.05$, we are allowing a tolerance of 5% for the probability of
committing the error of rejecting $H_o$ when it is actually true and we say that we have a test
at the 5% significance level. Usually, we set $\alpha$ as 0.05 or 0.01. Alternatively, we can allow
$\alpha$ to vary and calculate what is known as the p-value when carrying out a test. Such is the
principle underlying the likelihood ratio test, the resulting test criterion being referred to
as the $\lambda$-criterion.


In the complex case, a tilde will be placed above $\lambda$ and $L$, (6.1.3) and (6.1.4) remaining
essentially the same:


$$\tilde{\lambda}=\frac{\sup_{\omega}\tilde{L}}{\sup_{\Omega}\tilde{L}},\quad 0<|\tilde{\lambda}|\le 1, \tag{6.1a.1}$$

and

$$Pr\{0<|\tilde{\lambda}|\le\lambda_o\,|\,H_o\}=\alpha,\quad Pr\{0<|\tilde{\lambda}|\le\lambda_o\,|\,H_1\}=1-\beta \tag{6.1a.2}$$


where $\alpha$ is the size or significance level of the test and $1-\beta$, the power of the test.


6.2. Testing $H_o:\mu=\mu_o$ (Given) When $\Sigma$ is Known, the Real $N_p(\mu,\Sigma)$ Case


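When $\Sigma$ is known, the only parameter to estimate is $\mu$, its MLE being $\bar{X}$. Hence, the
maximum in $\Omega$ is the following:

$$\sup_{\Omega}L=\frac{e^{-\frac{1}{2}{\rm tr}(\Sigma^{-1}S)}}{(2\pi)^{\frac{np}{2}}|\Sigma|^{\frac{n}{2}}}. \tag{6.2.1}$$


In this case, $\mu$ is also specified under the null hypothesis $H_o$, so that there is no parameter
to estimate. Accordingly,


$$\sup_{\omega}L=\frac{e^{-\frac{1}{2}\sum_{j=1}^{n}(X_j-\mu_o)'\Sigma^{-1}(X_j-\mu_o)}}{(2\pi)^{\frac{np}{2}}|\Sigma|^{\frac{n}{2}}}=\frac{e^{-\frac{1}{2}{\rm tr}(\Sigma^{-1}S)-\frac{n}{2}(\bar{X}-\mu_o)'\Sigma^{-1}(\bar{X}-\mu_o)}}{(2\pi)^{\frac{np}{2}}|\Sigma|^{\frac{n}{2}}}. \tag{6.2.2}$$

Thus,

$$\lambda=\frac{\sup_{\omega}L}{\sup_{\Omega}L}=e^{-\frac{n}{2}(\bar{X}-\mu_o)'\Sigma^{-1}(\bar{X}-\mu_o)} \tag{6.2.3}$$

and small values of $\lambda$ correspond to large values of $\frac{n}{2}(\bar{X}-\mu_o)'\Sigma^{-1}(\bar{X}-\mu_o)$. When
$X_j\sim N_p(\mu,\Sigma)$, $\Sigma>O$, it has already been established that $\bar{X}\sim N_p(\mu,\frac{1}{n}\Sigma)$, $\Sigma>O$.
As well, $n(\bar{X}-\mu_o)'\Sigma^{-1}(\bar{X}-\mu_o)$ is the exponent in a $p$-variate real normal density under
$H_o$, which has already been shown to have a real chisquare distribution with $p$ degrees of
freedom or

$$n(\bar{X}-\mu_o)'\Sigma^{-1}(\bar{X}-\mu_o)\sim\chi^2_p.$$


Hence, the test criterion is

$$\mbox{Reject } H_o \mbox{ if } n(\bar{X}-\mu_o)'\Sigma^{-1}(\bar{X}-\mu_o)\ge\chi^2_{p,\alpha},\ \mbox{ with } Pr\{\chi^2_p\ge\chi^2_{p,\alpha}\}=\alpha. \tag{6.2.4}$$


Under the alternative hypothesis, the distribution of the test statistic is a noncentral
chisquare with $p$ degrees of freedom and non-centrality parameter
$\lambda=\frac{n}{2}(\mu-\mu_o)'\Sigma^{-1}(\mu-\mu_o)$.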
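The passage from the first to the second form of the exponent in (6.2.2) rests on a standard decomposition of the sum of quadratic forms. The following worked step is a sketch of that identity, using only the quantities $S$, $\bar{X}$ and $\mu_o$ defined above:

```latex
\begin{align*}
\sum_{j=1}^{n}(X_j-\mu_o)'\Sigma^{-1}(X_j-\mu_o)
 &= \sum_{j=1}^{n}\big[(X_j-\bar{X})+(\bar{X}-\mu_o)\big]'\Sigma^{-1}\big[(X_j-\bar{X})+(\bar{X}-\mu_o)\big]\\
 &= \sum_{j=1}^{n}(X_j-\bar{X})'\Sigma^{-1}(X_j-\bar{X})
    + n(\bar{X}-\mu_o)'\Sigma^{-1}(\bar{X}-\mu_o)
    + 2(\bar{X}-\mu_o)'\Sigma^{-1}\sum_{j=1}^{n}(X_j-\bar{X})\\
 &= {\rm tr}(\Sigma^{-1}S) + n(\bar{X}-\mu_o)'\Sigma^{-1}(\bar{X}-\mu_o),
\end{align*}
% since \sum_j (X_j-\bar{X}) = O and
% \sum_j (X_j-\bar{X})'\Sigma^{-1}(X_j-\bar{X})
%   = {\rm tr}\big(\Sigma^{-1}\sum_j (X_j-\bar{X})(X_j-\bar{X})'\big) = {\rm tr}(\Sigma^{-1}S).
```

Dividing the resulting likelihood by (6.2.1) cancels the trace term, which is why only the quadratic form in $\bar{X}-\mu_o$ survives in (6.2.3).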


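Example 6.2.1. For example, suppose that we have a sample of size 5 from a population
that has a trivariate normal distribution and let the significance level $\alpha$ be 0.05. Let $\mu_o$, the
hypothesized mean value vector specified by the null hypothesis, the known covariance
matrix $\Sigma$, and the five observation vectors $X_1,\ldots,X_5$ be the following:

$$\mu_o=\begin{bmatrix}1\\0\\-1\end{bmatrix},\ \Sigma=\begin{bmatrix}2&0&0\\0&1&1\\0&1&2\end{bmatrix}\Rightarrow\Sigma^{-1}=\begin{bmatrix}\frac{1}{2}&0&0\\0&2&-1\\0&-1&1\end{bmatrix},$$

$$X_1=\begin{bmatrix}1\\0\\1\end{bmatrix},\ X_2=\begin{bmatrix}2\\-1\\4\end{bmatrix},\ X_3=\begin{bmatrix}0\\-1\\-2\end{bmatrix},\ X_4=\begin{bmatrix}2\\4\\1\end{bmatrix},\ X_5=\begin{bmatrix}4\\2\\-1\end{bmatrix},$$


the inverse of $\Sigma$ having been evaluated via elementary transformations. The sample average,
$\frac{1}{5}(X_1+\cdots+X_5)$ denoted by $\bar{X}$, is


$$\bar{X}=\frac{1}{5}\left\{\begin{bmatrix}1\\0\\1\end{bmatrix}+\begin{bmatrix}2\\-1\\4\end{bmatrix}+\begin{bmatrix}0\\-1\\-2\end{bmatrix}+\begin{bmatrix}2\\4\\1\end{bmatrix}+\begin{bmatrix}4\\2\\-1\end{bmatrix}\right\}=\frac{1}{5}\begin{bmatrix}9\\4\\3\end{bmatrix},$$

and

$$\bar{X}-\mu_o=\frac{1}{5}\begin{bmatrix}9\\4\\3\end{bmatrix}-\begin{bmatrix}1\\0\\-1\end{bmatrix}=\frac{1}{5}\begin{bmatrix}4\\4\\8\end{bmatrix}.$$

For testing $H_o$, the following test statistic has to be evaluated:


$$n(\bar{X}-\mu_o)'\Sigma^{-1}(\bar{X}-\mu_o)=\frac{5}{5^2}\begin{bmatrix}4&4&8\end{bmatrix}\begin{bmatrix}\frac{1}{2}&0&0\\0&2&-1\\0&-1&1\end{bmatrix}\begin{bmatrix}4\\4\\8\end{bmatrix}=\frac{40}{5}=8.$$


As per our criterion, $H_o$ should be rejected if $8\ge\chi^2_{p,\alpha}$. Since $\chi^2_{p,\alpha}=\chi^2_{3,\,0.05}=7.81$, this
critical value being available from a chisquare table, $H_o:\mu=\mu_o$ should be rejected at the
specified significance level. Moreover, in this case, the p-value is $Pr\{\chi^2_3\ge 8\}\approx 0.046$,
which can be evaluated by interpolation from the percentiles provided in a chi-square table
or by making use of statistical packages such as R.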
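The arithmetic of this example can be reproduced with a few lines of code. The sketch below uses only the Python standard library; the helper names `quad_form` and `chi2_sf` are illustrative, and the chisquare tail probability is computed from the closed-form series that holds for integer degrees of freedom rather than from a statistical package.

```python
import math


def quad_form(v, m):
    """Return v' m v for a vector v and a square matrix m given as nested lists."""
    size = len(v)
    return sum(v[i] * m[i][j] * v[j] for i in range(size) for j in range(size))


def chi2_sf(x, df):
    """Upper-tail probability Pr{chi2_df >= x} for a positive integer df."""
    if x <= 0:
        return 1.0
    half = x / 2.0
    if df % 2 == 0:
        # even df = 2m: exp(-x/2) * sum_{j<m} (x/2)^j / j!
        return math.exp(-half) * sum(half ** j / math.factorial(j) for j in range(df // 2))
    # odd df = 2m + 1: erfc(sqrt(x/2)) + exp(-x/2) * sum_{j=1..m} (x/2)^(j-1/2) / Gamma(j+1/2)
    series = sum(half ** (j - 0.5) / math.gamma(j + 0.5) for j in range(1, df // 2 + 1))
    return math.erfc(math.sqrt(half)) + math.exp(-half) * series


observations = [(1, 0, 1), (2, -1, 4), (0, -1, -2), (2, 4, 1), (4, 2, -1)]
mu0 = (1, 0, -1)
sigma_inv = [[0.5, 0, 0], [0, 2, -1], [0, -1, 1]]

n = len(observations)
x_bar = [sum(column) / n for column in zip(*observations)]
deviation = [xb - m for xb, m in zip(x_bar, mu0)]

statistic = n * quad_form(deviation, sigma_inv)
p_value = chi2_sf(statistic, df=3)
print(round(statistic, 6), round(p_value, 4))
```

Comparing `statistic` with the tabulated critical value 7.81, or `p_value` with the chosen significance level, leads to the same decision.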


6.2.1. Paired variables and linear functions 


Let $Y_1,\ldots,Y_k$ be $p\times 1$ vectors having their own $p$-variate distributions which are
not known. However, suppose that a certain linear function $X=a_1Y_1+\cdots+a_kY_k$ is
known to have a $p$-variate real Gaussian distribution with mean value vector $E[X]=\mu$
and covariance matrix ${\rm Cov}(X)=\Sigma$, $\Sigma>O$, that is, $X=a_1Y_1+\cdots+a_kY_k\sim N_p(\mu,\Sigma)$,
$\Sigma>O$, where $a_1,\ldots,a_k$ are fixed known scalar constants. An example
of this type is $X=Y_1-Y_2$ where $Y_1$ consists of measurements on $p$ attributes before
subjecting those attributes to a certain process, such as administering a drug to a patient,
and $Y_2$ consists of the measurements on the same attributes after the process is completed.
We would like to examine the difference $Y_1-Y_2$ to study the effect of the process on
these characteristics. If it is reasonable to assume that this difference $X=Y_1-Y_2$ is
$N_p(\mu,\Sigma)$, $\Sigma>O$, then we could test hypotheses on $E[X]=\mu$. When $\Sigma$ is known,
the general problem reduces to that discussed in Sect. 6.2. Assuming that we have iid
variables on $Y_1,\ldots,Y_k$, we would evaluate the corresponding values of $X$, which produces
iid variables on $X$, that is, a simple random sample of size $n$ from $X=a_1Y_1+\cdots+a_kY_k$.
Thus, when $\Sigma$ is known, letting $u=n(\bar{X}-\mu_o)'\Sigma^{-1}(\bar{X}-\mu_o)\sim\chi^2_p$ where $\bar{X}$ denotes the
sample average, the test would be carried out as follows at significance level $\alpha$:


$$\mbox{Reject } H_o:\mu=\mu_o \mbox{ (specified) when } u\ge\chi^2_{p,\alpha},\ \mbox{ with } Pr\{\chi^2_p\ge\chi^2_{p,\alpha}\}=\alpha, \tag{6.2.5}$$

the non-null distribution of the test statistic $u$ being a non-central chisquare.


Example 6.2.2. Three variables $x_1$ = systolic pressure, $x_2$ = diastolic pressure and
$x_3$ = weight are monitored after administering a drug for the reduction of all these $p=3$
variables. Suppose that a sample of $n=5$ randomly selected individuals are given the
medication for one week. The following five pairs of observations on each of the three
variables were obtained before and after the administration of the medication:

$$\begin{bmatrix}150,\,140\\90,\,90\\70,\,68\end{bmatrix},\ \begin{bmatrix}180,\,150\\95,\,90\\75,\,70\end{bmatrix},\ \begin{bmatrix}160,\,160\\85,\,80\\70,\,65\end{bmatrix},\ \begin{bmatrix}140,\,138\\85,\,90\\70,\,71\end{bmatrix},\ \begin{bmatrix}130,\,128\\85,\,85\\75,\,74\end{bmatrix}.$$


Let $X$ denote the difference, that is, $X$ is equal to the reading before the medication was
administered minus the reading after the medication could take effect. The observation
vectors on $X$ are then


$$X_1=\begin{bmatrix}150-140\\90-90\\70-68\end{bmatrix}=\begin{bmatrix}10\\0\\2\end{bmatrix},\ X_2=\begin{bmatrix}30\\5\\5\end{bmatrix},\ X_3=\begin{bmatrix}0\\5\\5\end{bmatrix},\ X_4=\begin{bmatrix}2\\-5\\-1\end{bmatrix},\ X_5=\begin{bmatrix}2\\0\\1\end{bmatrix}.$$

In this case, $X_1,\ldots,X_5$ are observations on iid variables. We are going to assume that
these iid variables are coming from a population whose distribution is $N_3(\mu,\Sigma)$, $\Sigma>O$,
where $\Sigma$ is known. Let the sample average $\bar{X}=\frac{1}{5}(X_1+\cdots+X_5)$, the hypothesized mean
value vector specified by the null hypothesis $H_o:\mu=\mu_o$, and the known covariance
matrix $\Sigma$ be as follows:


$$\bar{X}=\frac{1}{5}\begin{bmatrix}44\\5\\12\end{bmatrix},\ \mu_o=\begin{bmatrix}8\\0\\2\end{bmatrix},\ \Sigma=\begin{bmatrix}2&0&0\\0&1&1\\0&1&2\end{bmatrix}\Rightarrow\Sigma^{-1}=\begin{bmatrix}\frac{1}{2}&0&0\\0&2&-1\\0&-1&1\end{bmatrix}.$$


Let us evaluate $\bar{X}-\mu_o$ and $n(\bar{X}-\mu_o)'\Sigma^{-1}(\bar{X}-\mu_o)$ which are needed for testing the
hypothesis $H_o:\mu=\mu_o$:


$$\bar{X}-\mu_o=\frac{1}{5}\begin{bmatrix}44\\5\\12\end{bmatrix}-\begin{bmatrix}8\\0\\2\end{bmatrix}=\frac{1}{5}\begin{bmatrix}4\\5\\2\end{bmatrix},$$

$$n(\bar{X}-\mu_o)'\Sigma^{-1}(\bar{X}-\mu_o)=\frac{5}{5^2}\begin{bmatrix}4&5&2\end{bmatrix}\begin{bmatrix}\frac{1}{2}&0&0\\0&2&-1\\0&-1&1\end{bmatrix}\begin{bmatrix}4\\5\\2\end{bmatrix}=\frac{42}{5}=8.4.$$


Let us test $H_o$ at the significance level $\alpha=0.05$. The critical value which can readily be
found in a chisquare table is $\chi^2_{p,\alpha}=\chi^2_{3,\,0.05}=7.81$. As per our criterion, we reject $H_o$ if
$8.4\ge\chi^2_{p,\alpha}$; since $8.4>7.81$, we reject $H_o$. The p-value in this case is $Pr\{\chi^2_p\ge 8.4\}=
Pr\{\chi^2_3\ge 8.4\}\approx 0.04$.
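The paired-difference computation can be retraced numerically as follows. This is a minimal standard-library sketch; the variable names are illustrative, and the tail probability uses the closed form $Pr\{\chi^2_3\ge x\}={\rm erfc}(\sqrt{x/2})+\sqrt{2x/\pi}\,e^{-x/2}$, which is specific to 3 degrees of freedom.

```python
import math

# (before, after) readings for the five individuals, one row per variable
pairs = [
    [(150, 140), (180, 150), (160, 160), (140, 138), (130, 128)],  # systolic pressure
    [(90, 90), (95, 90), (85, 80), (85, 90), (85, 85)],            # diastolic pressure
    [(70, 68), (75, 70), (70, 65), (70, 71), (75, 74)],            # weight
]
mu0 = [8.0, 0.0, 2.0]
sigma_inv = [[0.5, 0.0, 0.0], [0.0, 2.0, -1.0], [0.0, -1.0, 1.0]]
n = 5

# differences X = before - after, then the sample mean of each component
differences = [[before - after for before, after in row] for row in pairs]
x_bar = [sum(row) / n for row in differences]
deviation = [m - m0 for m, m0 in zip(x_bar, mu0)]

# test statistic n (x_bar - mu0)' Sigma^{-1} (x_bar - mu0)
statistic = n * sum(
    deviation[i] * sigma_inv[i][j] * deviation[j] for i in range(3) for j in range(3)
)

# upper-tail probability of a chisquare with 3 degrees of freedom
p_value = math.erfc(math.sqrt(statistic / 2)) + math.sqrt(2 * statistic / math.pi) * math.exp(
    -statistic / 2
)
print(round(statistic, 4), round(p_value, 4))
```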


6.2.2. Independent Gaussian populations 


Let $Y_j\sim N_p(\mu_{(j)},\Sigma_j)$, $\Sigma_j>O$, $j=1,\ldots,k$, and let these $k$ populations
be independently distributed. Assume that a simple random sample of size $n_j$ from $Y_j$
is available for $j=1,\ldots,k$; then these samples can be represented by the $p$-vectors
$Y_{jq}$, $q=1,\ldots,n_j$, which are iid as $Y_{j1}$, for $j=1,\ldots,k$. Consider a given linear
function $X=a_1Y_1+\cdots+a_kY_k$ where $X$ is $p\times 1$ and the $Y_j$'s are taken in a given
order. Let $U=a_1\bar{Y}_1+\cdots+a_k\bar{Y}_k$ where $\bar{Y}_j=\frac{1}{n_j}\sum_{q=1}^{n_j}Y_{jq}$ for $j=1,\ldots,k$. Then
$E[U]=a_1\mu_{(1)}+\cdots+a_k\mu_{(k)}=\mu$ (say), where $a_1,\ldots,a_k$ are given real scalar constants.
The covariance matrix in $U$ is ${\rm Cov}(U)=\frac{a_1^2}{n_1}\Sigma_1+\cdots+\frac{a_k^2}{n_k}\Sigma_k=\frac{1}{n}\Sigma$ (say), where $n$ is
a symbol. Consider the problem of testing hypotheses on $\mu$ when $\Sigma$ is known or when
$a_j,\ \Sigma_j$, $j=1,\ldots,k$, are known. Let $H_o:\mu=\mu_o$ (specified), in the sense $\mu_{(j)}$ is a
known vector for $j=1,\ldots,k$, when $\Sigma$ is known. Then, under $H_o$, all the parameters are
known and the standardized $U$ is observable, the test statistic being


$$\sum_{j=1}^{k}a_j\,n_j(\bar{Y}_j-\mu_{(j)})'\Sigma_j^{-1}(\bar{Y}_j-\mu_{(j)})\sim\sum_{j=1}^{k}a_j\,\chi^{2\,(j)}_p \tag{6.2.6}$$

where $\chi^{2\,(j)}_p$, $j=1,\ldots,k$, denote independent chisquare random variables, each having
$p$ degrees of freedom. However, since this is a linear function of independent chisquare
variables, even the null distribution is complicated. Thus, only the case of two independent
populations will be examined.

Consider the problem of testing the hypothesis $\mu_{(1)}-\mu_{(2)}=\delta$ (a given vector) when
there are two independent normal populations sharing a common covariance matrix $\Sigma$
(known). Then $U$ is $U=\bar{Y}_1-\bar{Y}_2$ with $E[U]=\mu_{(1)}-\mu_{(2)}=\delta$ (given) under $H_o$ and
${\rm Cov}(U)=\big(\frac{1}{n_1}+\frac{1}{n_2}\big)\Sigma=\frac{n_1+n_2}{n_1n_2}\Sigma$, the test statistic, denoted by $v$, being


$$v=\frac{n_1n_2}{n_1+n_2}(U-\delta)'\Sigma^{-1}(U-\delta)=\frac{n_1n_2}{n_1+n_2}(\bar{Y}_1-\bar{Y}_2-\delta)'\Sigma^{-1}(\bar{Y}_1-\bar{Y}_2-\delta)\sim\chi^2_p. \tag{6.2.7}$$

The resulting test criterion is

$$\mbox{Reject } H_o \mbox{ if the observed value of } v\ge\chi^2_{p,\alpha},\ \mbox{ with } Pr\{\chi^2_p\ge\chi^2_{p,\alpha}\}=\alpha. \tag{6.2.8}$$


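Example 6.2.3. Let $Y_1\sim N_3(\mu_{(1)},\Sigma)$ and $Y_2\sim N_3(\mu_{(2)},\Sigma)$ represent independently
distributed normal populations having a known common covariance matrix $\Sigma$. The null
hypothesis is $H_o:\mu_{(1)}-\mu_{(2)}=\delta$ where $\delta$ is specified. Denote the observation vectors on
$Y_1$ and $Y_2$ by $Y_{1j}$, $j=1,\ldots,n_1$, and $Y_{2j}$, $j=1,\ldots,n_2$, respectively, and let the sample
sizes be $n_1=4$ and $n_2=5$. Let those observation vectors be

$$Y_{11}=\begin{bmatrix}2\\1\\5\end{bmatrix},\ Y_{12}=\begin{bmatrix}5\\5\\3\end{bmatrix},\ Y_{13}=\begin{bmatrix}7\\8\\7\end{bmatrix},\ Y_{14}=\begin{bmatrix}8\\10\\12\end{bmatrix}\ \mbox{ and}$$

$$Y_{21}=\begin{bmatrix}2\\1\\3\end{bmatrix},\ Y_{22}=\begin{bmatrix}4\\3\\2\end{bmatrix},\ Y_{23}=\begin{bmatrix}7\\10\\8\end{bmatrix},\ Y_{24}=\begin{bmatrix}6\\5\\6\end{bmatrix},\ Y_{25}=\begin{bmatrix}1\\1\\2\end{bmatrix},$$

and the common covariance matrix $\Sigma$ be


$$\Sigma=\begin{bmatrix}2&0&0\\0&1&1\\0&1&2\end{bmatrix}\Rightarrow\Sigma^{-1}=\begin{bmatrix}\frac{1}{2}&0&0\\0&2&-1\\0&-1&1\end{bmatrix}.$$


Let the hypothesized vector under $H_o:\mu_{(1)}-\mu_{(2)}=\delta$ be $\delta'=(1,1,2)$. In order to test
this null hypothesis, the following quantities must be evaluated:


$$\bar{Y}_1=\frac{1}{n_1}(Y_{11}+\cdots+Y_{1n_1})=\frac{1}{4}(Y_{11}+\cdots+Y_{14}),\quad \bar{Y}_2=\frac{1}{n_2}(Y_{21}+\cdots+Y_{2n_2})=\frac{1}{5}(Y_{21}+\cdots+Y_{25}),$$

$$U=\bar{Y}_1-\bar{Y}_2,\quad v=\frac{n_1n_2}{n_1+n_2}(U-\delta)'\Sigma^{-1}(U-\delta).$$

They are

$$\bar{Y}_1=\frac{1}{4}\left\{\begin{bmatrix}2\\1\\5\end{bmatrix}+\begin{bmatrix}5\\5\\3\end{bmatrix}+\begin{bmatrix}7\\8\\7\end{bmatrix}+\begin{bmatrix}8\\10\\12\end{bmatrix}\right\}=\frac{1}{4}\begin{bmatrix}22\\24\\27\end{bmatrix},$$

$$\bar{Y}_2=\frac{1}{5}\left\{\begin{bmatrix}2\\1\\3\end{bmatrix}+\begin{bmatrix}4\\3\\2\end{bmatrix}+\begin{bmatrix}7\\10\\8\end{bmatrix}+\begin{bmatrix}6\\5\\6\end{bmatrix}+\begin{bmatrix}1\\1\\2\end{bmatrix}\right\}=\frac{1}{5}\begin{bmatrix}20\\20\\21\end{bmatrix},$$

$$U=\bar{Y}_1-\bar{Y}_2=\frac{1}{4}\begin{bmatrix}22\\24\\27\end{bmatrix}-\frac{1}{5}\begin{bmatrix}20\\20\\21\end{bmatrix}=\begin{bmatrix}1.50\\2.00\\2.55\end{bmatrix}.$$

Then,

$$U-\delta=\begin{bmatrix}1.50\\2.00\\2.55\end{bmatrix}-\begin{bmatrix}1\\1\\2\end{bmatrix}=\begin{bmatrix}0.50\\1.00\\0.55\end{bmatrix},$$

$$v=\frac{n_1n_2}{n_1+n_2}(U-\delta)'\Sigma^{-1}(U-\delta)=\frac{(4)(5)}{9}\begin{bmatrix}0.50&1.00&0.55\end{bmatrix}\begin{bmatrix}\frac{1}{2}&0&0\\0&2&-1\\0&-1&1\end{bmatrix}\begin{bmatrix}0.50\\1.00\\0.55\end{bmatrix}=1.3275\times\frac{20}{9}=2.95.$$

Let us test $H_o$ at the significance level $\alpha=0.05$. The critical value which is available
from a chisquare table is $\chi^2_{p,\alpha}=\chi^2_{3,\,0.05}=7.81$. As per our criterion, we reject $H_o$ if
$2.95\ge\chi^2_{p,\alpha}$; however, since $2.95<7.81$, we cannot reject $H_o$. The p-value in this case
is $Pr\{\chi^2_p\ge 2.95\}=Pr\{\chi^2_3\ge 2.95\}\approx 0.40$, which can be determined by interpolation.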
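The two-sample statistic in (6.2.7) can be evaluated for these data as in the sketch below, which relies only on the Python standard library. The helper `mean_vector` and the variable names are illustrative; the tail probability again uses the closed form valid for 3 degrees of freedom.

```python
import math

sample_1 = [(2, 1, 5), (5, 5, 3), (7, 8, 7), (8, 10, 12)]
sample_2 = [(2, 1, 3), (4, 3, 2), (7, 10, 8), (6, 5, 6), (1, 1, 2)]
delta = (1, 1, 2)
sigma_inv = [[0.5, 0.0, 0.0], [0.0, 2.0, -1.0], [0.0, -1.0, 1.0]]


def mean_vector(sample):
    """Component-wise average of a list of observation vectors."""
    return [sum(column) / len(sample) for column in zip(*sample)]


n1, n2 = len(sample_1), len(sample_2)
u = [a - b for a, b in zip(mean_vector(sample_1), mean_vector(sample_2))]
deviation = [ui - di for ui, di in zip(u, delta)]

quadratic = sum(
    deviation[i] * sigma_inv[i][j] * deviation[j] for i in range(3) for j in range(3)
)
v = n1 * n2 / (n1 + n2) * quadratic

# Pr{chi2_3 >= v} = erfc(sqrt(v/2)) + sqrt(2v/pi) * exp(-v/2)
p_value = math.erfc(math.sqrt(v / 2)) + math.sqrt(2 * v / math.pi) * math.exp(-v / 2)
print(round(quadratic, 4), round(v, 4), round(p_value, 3))
```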


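6.2a. Testing $H_o:\tilde{\mu}=\tilde{\mu}_o$ (given) When $\tilde{\Sigma}$ is Known, Complex Gaussian Case


The derivation of the $\tilde{\lambda}$-criterion in the complex domain is parallel to that provided for
the real case. In the parameter space,


$$\sup_{\Omega}\tilde{L}=\frac{e^{-{\rm tr}(\tilde{\Sigma}^{-1}\tilde{S})}}{\pi^{np}|\det(\tilde{\Sigma})|^{n}} \tag{6.2a.1}$$

and under $H_o:\tilde{\mu}=\tilde{\mu}_o$, a given vector,


$$\sup_{\omega}\tilde{L}=\frac{e^{-{\rm tr}(\tilde{\Sigma}^{-1}\tilde{S})-n(\bar{\tilde{X}}-\tilde{\mu}_o)^{*}\tilde{\Sigma}^{-1}(\bar{\tilde{X}}-\tilde{\mu}_o)}}{\pi^{np}|\det(\tilde{\Sigma})|^{n}}. \tag{6.2a.2}$$

Accordingly,


$$\tilde{\lambda}=\frac{\sup_{\omega}\tilde{L}}{\sup_{\Omega}\tilde{L}}=e^{-n(\bar{\tilde{X}}-\tilde{\mu}_o)^{*}\tilde{\Sigma}^{-1}(\bar{\tilde{X}}-\tilde{\mu}_o)}. \tag{6.2a.3}$$


Here as well, small values of $\tilde{\lambda}$ correspond to large values of $\tilde{y}\equiv n(\bar{\tilde{X}}-\tilde{\mu}_o)^{*}\tilde{\Sigma}^{-1}(\bar{\tilde{X}}-\tilde{\mu}_o)$,
which has a real gamma distribution with the parameters $(\alpha=p,\ \beta=1)$ or a chisquare
distribution with $p$ degrees of freedom in the complex domain as described earlier so that
$2\tilde{y}$ has a real chisquare distribution having $2p$ degrees of freedom. Thus, a real chisquare
table can be utilized for testing the null hypothesis $H_o$, the criterion being


$$\mbox{Reject } H_o \mbox{ if } 2n(\bar{\tilde{X}}-\tilde{\mu}_o)^{*}\tilde{\Sigma}^{-1}(\bar{\tilde{X}}-\tilde{\mu}_o)\ge\chi^2_{2p,\alpha},\ \mbox{ with } Pr\{\chi^2_{2p}\ge\chi^2_{2p,\alpha}\}=\alpha. \tag{6.2a.4}$$


The test criteria as well as the decisions are parallel to those obtained for the real case in
the situations of paired values and in the case of independent populations. Accordingly,
such test criteria and associated decisions will not be further discussed.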


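Example 6.2a.1. Let $p=2$ and the $2\times 1$ complex vector $\tilde{X}\sim\tilde{N}_2(\tilde{\mu},\tilde{\Sigma})$, $\tilde{\Sigma}=\tilde{\Sigma}^{*}>O$,
with $\tilde{\Sigma}$ assumed to be known. Consider the null hypothesis $H_o:\tilde{\mu}=\tilde{\mu}_o$ where $\tilde{\mu}_o$ is
specified. Let the known $\tilde{\Sigma}$ and the specified $\tilde{\mu}_o$ be the following where $i=\sqrt{(-1)}$: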


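$$\tilde{\Sigma}=\begin{bmatrix}2&1+i\\1-i&3\end{bmatrix}\Rightarrow\tilde{\Sigma}^{-1}=\frac{1}{4}\begin{bmatrix}3&-(1+i)\\-(1-i)&2\end{bmatrix}.$$

Let the general $\tilde{\mu}$ and general $\tilde{X}$ be represented as follows for $p=2$:

$$\tilde{\mu}=\begin{bmatrix}\mu_1+i\nu_1\\ \mu_2+i\nu_2\end{bmatrix},\ \tilde{X}=\begin{bmatrix}x_1+iy_1\\ x_2+iy_2\end{bmatrix}$$

so that, for the given $\tilde{\Sigma}$,


$$\det(\tilde{\Sigma})=(2)(3)-(1+i)(1-i)=6-(1^2+1^2)=4=\det(\tilde{\Sigma}^{*})=|\det(\tilde{\Sigma})|.$$


The exponent of the general density for $p=2$, excluding $-1$, is of the form
$(\tilde{X}-\tilde{\mu})^{*}\tilde{\Sigma}^{-1}(\tilde{X}-\tilde{\mu})$. Further,


$$[(\tilde{X}-\tilde{\mu})^{*}\tilde{\Sigma}^{-1}(\tilde{X}-\tilde{\mu})]^{*}=(\tilde{X}-\tilde{\mu})^{*}\tilde{\Sigma}^{-1}(\tilde{X}-\tilde{\mu})$$


since both $\tilde{\Sigma}$ and $\tilde{\Sigma}^{-1}$ are Hermitian. Thus, the exponent, which is $1\times 1$, is real and
negative definite. The explicit form, excluding $-1$, for $p=2$ and the given covariance
matrix $\tilde{\Sigma}$, is the following:


$$Q=\frac{1}{4}\big\{3[(x_1-\mu_1)^2+(y_1-\nu_1)^2]+2[(x_2-\mu_2)^2+(y_2-\nu_2)^2]-2[(x_1-\mu_1)(x_2-\mu_2)+(y_1-\nu_1)(y_2-\nu_2)]+2[(x_1-\mu_1)(y_2-\nu_2)-(x_2-\mu_2)(y_1-\nu_1)]\big\},$$

and the general density for $p=2$ and this $\tilde{\Sigma}$ is of the following form:

$$f(\tilde{X})=\frac{1}{4\pi^2}e^{-Q}$$


where the $Q$ is as previously given. Let the following be an observed sample of size $n=4$
from a $\tilde{N}_2(\tilde{\mu},\tilde{\Sigma})$ population whose associated covariance matrix $\tilde{\Sigma}$ is as previously
specified: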


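Then, with $\bar{\tilde{X}}-\tilde{\mu}_o=\frac{1}{4}\begin{bmatrix}1-5i\\2+9i\end{bmatrix}$,

$$2n(\bar{\tilde{X}}-\tilde{\mu}_o)^{*}\tilde{\Sigma}^{-1}(\bar{\tilde{X}}-\tilde{\mu}_o)=(2)(4)\frac{1}{4^2}\,\frac{1}{4}\begin{bmatrix}1+5i&2-9i\end{bmatrix}\begin{bmatrix}3&-(1+i)\\-(1-i)&2\end{bmatrix}\begin{bmatrix}1-5i\\2+9i\end{bmatrix}$$

$$=\frac{1}{8}\{3(1+5i)(1-5i)+2(2-9i)(2+9i)-(1+i)(1+5i)(2+9i)-(1-i)(2-9i)(1-5i)\}$$

$$=\frac{1}{8}\{3\times 26+2\times 85+2\times 62\}=46.5.$$


Let us test the stated null hypothesis at the significance level $\alpha=0.05$. Since
$\chi^2_{2p,\alpha}=\chi^2_{4,\,0.05}=9.49$ and $46.5>9.49$, we reject $H_o$. In this case, the p-value is
$Pr\{\chi^2_{2p}\ge 46.5\}=Pr\{\chi^2_4\ge 46.5\}\approx 0$.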
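The Hermitian form entering the complex criterion (6.2a.4) can be checked with Python's built-in complex arithmetic. The sketch below takes as inputs the deviation $\frac14(1-5i,\ 2+9i)'$ and the inverse covariance matrix that appear in the computation of this example; the helper name `hermitian_form` is illustrative.

```python
import math

n, p = 4, 2
# deviation of the sample mean from the hypothesized mean, as used in the example
deviation = [(1 - 5j) / 4, (2 + 9j) / 4]
# inverse of the known Hermitian covariance matrix [[2, 1+i], [1-i, 3]]
sigma_inv = [[3 / 4, -(1 + 1j) / 4], [-(1 - 1j) / 4, 2 / 4]]


def hermitian_form(z, m):
    """Return z* m z, where z* denotes the conjugate transpose of z."""
    size = len(z)
    return sum(z[i].conjugate() * m[i][j] * z[j] for i in range(size) for j in range(size))


form = hermitian_form(deviation, sigma_inv)
statistic = 2 * n * form.real  # the imaginary part vanishes for a Hermitian matrix

# For 2p = 4 degrees of freedom, Pr{chi2_4 >= x} = exp(-x/2) * (1 + x/2)
p_value = math.exp(-statistic / 2) * (1 + statistic / 2)
print(statistic, abs(form.imag) < 1e-12, p_value)
```

The statistic is compared with a real chisquare percentile having $2p$ degrees of freedom, which is why the tail formula for 4 degrees of freedom is used here.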


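6.2.3. Test involving a subvector of a mean value vector when $\Sigma$ is known


Let the $p\times 1$ vector $X_j\sim N_p(\mu,\Sigma)$, $\Sigma>O$, $j=1,\ldots,n$, and the $X_j$'s be
independently distributed. Let the joint density of $X_j$, $j=1,\ldots,n$, be denoted by $L$.
Then, as was previously established,


$$L=\prod_{j=1}^{n}\frac{e^{-\frac{1}{2}(X_j-\mu)'\Sigma^{-1}(X_j-\mu)}}{(2\pi)^{\frac{p}{2}}|\Sigma|^{\frac{1}{2}}}=\frac{e^{-\frac{1}{2}{\rm tr}(\Sigma^{-1}S)-\frac{n}{2}(\bar{X}-\mu)'\Sigma^{-1}(\bar{X}-\mu)}}{(2\pi)^{\frac{np}{2}}|\Sigma|^{\frac{n}{2}}} \tag{i}$$


where $\bar{X}=\frac{1}{n}(X_1+\cdots+X_n)$ and, letting $\mathbf{X}=(X_1,\ldots,X_n)$ of dimension $p\times n$ and
$\bar{\mathbf{X}}=(\bar{X},\ldots,\bar{X})$, $S=(\mathbf{X}-\bar{\mathbf{X}})(\mathbf{X}-\bar{\mathbf{X}})'$. Let $\bar{X}$, $\Sigma^{-1}$ and $\mu$ be partitioned as follows:


$$\bar{X}=\begin{bmatrix}\bar{X}^{(1)}\\ \bar{X}^{(2)}\end{bmatrix},\ \Sigma^{-1}=\begin{bmatrix}\Sigma^{11}&\Sigma^{12}\\ \Sigma^{21}&\Sigma^{22}\end{bmatrix},\ \mu=\begin{bmatrix}\mu^{(1)}\\ \mu^{(2)}\end{bmatrix}$$

where $\bar{X}^{(1)}$ and $\mu^{(1)}$ are $r\times 1$, $r<p$, and $\Sigma^{11}$ is $r\times r$. Consider the hypothesis
$\mu^{(1)}=\mu_o^{(1)}$ (specified) with $\Sigma$ known. Thus, this hypothesis concerns only a subvector of the
mean value vector, the population covariance matrix being assumed known. In the entire
parameter space $\Omega$, $\mu$ is estimated by $\bar{X}$ where $\bar{X}$ is the maximum likelihood estimator
(MLE) of $\mu$. The maximum of the likelihood function in the entire parameter space is then


$$\max_{\Omega}L=\frac{e^{-\frac{1}{2}{\rm tr}(\Sigma^{-1}S)}}{(2\pi)^{\frac{np}{2}}|\Sigma|^{\frac{n}{2}}}. \tag{ii}$$

Let us now determine the MLE of $\mu^{(2)}$, which is the only unknown quantity under the null
hypothesis. To this end, we consider the following expansion:


$$(\bar{X}-\mu)'\Sigma^{-1}(\bar{X}-\mu)=[(\bar{X}^{(1)}-\mu^{(1)})',\ (\bar{X}^{(2)}-\mu^{(2)})']\begin{bmatrix}\Sigma^{11}&\Sigma^{12}\\ \Sigma^{21}&\Sigma^{22}\end{bmatrix}\begin{bmatrix}\bar{X}^{(1)}-\mu^{(1)}\\ \bar{X}^{(2)}-\mu^{(2)}\end{bmatrix}$$

$$=(\bar{X}^{(1)}-\mu^{(1)})'\Sigma^{11}(\bar{X}^{(1)}-\mu^{(1)})+(\bar{X}^{(2)}-\mu^{(2)})'\Sigma^{22}(\bar{X}^{(2)}-\mu^{(2)})+2(\bar{X}^{(2)}-\mu^{(2)})'\Sigma^{21}(\bar{X}^{(1)}-\mu^{(1)}). \tag{iii}$$


Noting that there are only two terms involving $\mu^{(2)}$ in (iii), we have

$$\frac{\partial}{\partial\mu^{(2)}}\ln L=O\Rightarrow O-2\Sigma^{22}(\bar{X}^{(2)}-\mu^{(2)})-2\Sigma^{21}(\bar{X}^{(1)}-\mu_o^{(1)})=O$$

$$\Rightarrow\hat{\mu}^{(2)}=\bar{X}^{(2)}+(\Sigma^{22})^{-1}\Sigma^{21}(\bar{X}^{(1)}-\mu_o^{(1)}).$$


Then, substituting this MLE $\hat{\mu}^{(2)}$ in the various terms in (iii), we have the following:


$$(\bar{X}^{(2)}-\hat{\mu}^{(2)})'\Sigma^{22}(\bar{X}^{(2)}-\hat{\mu}^{(2)})=(\bar{X}^{(1)}-\mu_o^{(1)})'\Sigma^{12}(\Sigma^{22})^{-1}\Sigma^{21}(\bar{X}^{(1)}-\mu_o^{(1)}),$$

$$2(\bar{X}^{(2)}-\hat{\mu}^{(2)})'\Sigma^{21}(\bar{X}^{(1)}-\mu_o^{(1)})=-2(\bar{X}^{(1)}-\mu_o^{(1)})'\Sigma^{12}(\Sigma^{22})^{-1}\Sigma^{21}(\bar{X}^{(1)}-\mu_o^{(1)})\Rightarrow$$

$$(\bar{X}-\hat{\mu})'\Sigma^{-1}(\bar{X}-\hat{\mu})=(\bar{X}^{(1)}-\mu_o^{(1)})'[\Sigma^{11}-\Sigma^{12}(\Sigma^{22})^{-1}\Sigma^{21}](\bar{X}^{(1)}-\mu_o^{(1)})=(\bar{X}^{(1)}-\mu_o^{(1)})'\Sigma_{11}^{-1}(\bar{X}^{(1)}-\mu_o^{(1)}),$$

since, as established in Sect. 1.3, $\Sigma_{11}^{-1}=\Sigma^{11}-\Sigma^{12}(\Sigma^{22})^{-1}\Sigma^{21}$. Thus, the maximum of
$L$ under the null hypothesis is given by

$$\max_{H_o}L=\frac{e^{-\frac{1}{2}{\rm tr}(\Sigma^{-1}S)-\frac{n}{2}(\bar{X}^{(1)}-\mu_o^{(1)})'\Sigma_{11}^{-1}(\bar{X}^{(1)}-\mu_o^{(1)})}}{(2\pi)^{\frac{np}{2}}|\Sigma|^{\frac{n}{2}}},$$


and the $\lambda$-criterion is then


$$\lambda=\frac{\max_{H_o}L}{\max_{\Omega}L}=e^{-\frac{n}{2}(\bar{X}^{(1)}-\mu_o^{(1)})'\Sigma_{11}^{-1}(\bar{X}^{(1)}-\mu_o^{(1)})}. \tag{6.2.1}$$


Hence, we reject $H_o$ for small values of $\lambda$ or for large values of
$n(\bar{X}^{(1)}-\mu_o^{(1)})'\Sigma_{11}^{-1}(\bar{X}^{(1)}-\mu_o^{(1)})\sim\chi^2_r$ since the expected value and covariance matrix of $\bar{X}^{(1)}$
are respectively $\mu_o^{(1)}$ and $\Sigma_{11}/n$. Accordingly, the criterion can be enunciated as follows:

$$\mbox{Reject } H_o:\mu^{(1)}=\mu_o^{(1)} \mbox{ (given) if } u\equiv n(\bar{X}^{(1)}-\mu_o^{(1)})'\Sigma_{11}^{-1}(\bar{X}^{(1)}-\mu_o^{(1)})\ge\chi^2_{r,\alpha} \tag{6.2.2}$$


with $Pr\{\chi^2_r\ge\chi^2_{r,\alpha}\}=\alpha$. In the complex Gaussian case, the corresponding $2\tilde{u}$ will
be distributed as a real chisquare random variable having $2r$ degrees of freedom; thus, the
criterion will consist of rejecting the corresponding null hypothesis whenever the observed
value of $2\tilde{u}\ge\chi^2_{2r,\alpha}$.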


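Example 6.2.4. Let the $4\times 1$ vector $X$ have a real normal distribution $N_4(\mu,\Sigma)$, $\Sigma>O$.
Consider the hypothesis that part of $\mu$ is specified. For example, let the hypothesis $H_o$
and $\Sigma$ be the following:


$$H_o:\mu^{(1)}=\mu_o^{(1)}=\begin{bmatrix}1\\-1\end{bmatrix},\ \Sigma=\begin{bmatrix}2&1&0&0\\1&2&0&1\\0&0&3&1\\0&1&1&2\end{bmatrix}>O,\ X=\begin{bmatrix}x_1\\x_2\\x_3\\x_4\end{bmatrix}=\begin{bmatrix}X^{(1)}\\X^{(2)}\end{bmatrix}.$$


Since we are specifying the first two parameters in $\mu$, the hypothesis can be tested by
computing the distribution of $X^{(1)}=\begin{bmatrix}x_1\\x_2\end{bmatrix}$. Observe that $X^{(1)}\sim N_2(\mu^{(1)},\Sigma_{11})$, $\Sigma_{11}>O$,
where

$$\mu^{(1)}=\begin{bmatrix}\mu_1\\ \mu_2\end{bmatrix},\ \mu_o^{(1)}=\begin{bmatrix}1\\-1\end{bmatrix},\ \Sigma_{11}=\begin{bmatrix}2&1\\1&2\end{bmatrix}\Rightarrow\Sigma_{11}^{-1}=\frac{1}{3}\begin{bmatrix}2&-1\\-1&2\end{bmatrix}.$$


Let the observed vectors from the original $N_4(\mu,\Sigma)$ population be


$$X_1=\begin{bmatrix}1\\0\\2\\4\end{bmatrix},\ X_2=\begin{bmatrix}-1\\1\\1\\2\end{bmatrix},\ X_3=\begin{bmatrix}0\\2\\3\\4\end{bmatrix},\ X_4=\begin{bmatrix}2\\1\\-1\\3\end{bmatrix},\ X_5=\begin{bmatrix}2\\-1\\0\\4\end{bmatrix}.$$


Then the observations corresponding to the subvector $X^{(1)}$, denoted by $X_j^{(1)}$, are the following:

$$X_1^{(1)}=\begin{bmatrix}1\\0\end{bmatrix},\ X_2^{(1)}=\begin{bmatrix}-1\\1\end{bmatrix},\ X_3^{(1)}=\begin{bmatrix}0\\2\end{bmatrix},\ X_4^{(1)}=\begin{bmatrix}2\\1\end{bmatrix},\ X_5^{(1)}=\begin{bmatrix}2\\-1\end{bmatrix}.$$


In this case, the sample size $n=5$ and the sample mean, denoted by $\bar{X}^{(1)}$, is


$$\bar{X}^{(1)}=\frac{1}{5}\left\{\begin{bmatrix}1\\0\end{bmatrix}+\begin{bmatrix}-1\\1\end{bmatrix}+\begin{bmatrix}0\\2\end{bmatrix}+\begin{bmatrix}2\\1\end{bmatrix}+\begin{bmatrix}2\\-1\end{bmatrix}\right\}=\frac{1}{5}\begin{bmatrix}4\\3\end{bmatrix}\Rightarrow\bar{X}^{(1)}-\mu_o^{(1)}=\frac{1}{5}\begin{bmatrix}4\\3\end{bmatrix}-\begin{bmatrix}1\\-1\end{bmatrix}=\frac{1}{5}\begin{bmatrix}-1\\8\end{bmatrix}.$$

Therefore


$$n(\bar{X}^{(1)}-\mu_o^{(1)})'\Sigma_{11}^{-1}(\bar{X}^{(1)}-\mu_o^{(1)})=\frac{5}{5^2}\,\frac{1}{3}\begin{bmatrix}-1&8\end{bmatrix}\begin{bmatrix}2&-1\\-1&2\end{bmatrix}\begin{bmatrix}-1\\8\end{bmatrix}=\frac{1}{15}(146)=9.73.$$


If $9.73\ge\chi^2_{2,\alpha}$, then we would reject $H_o^{(1)}:\mu^{(1)}=\mu_o^{(1)}$. Let us test this hypothesis at
the significance level $\alpha=0.01$. Since $\chi^2_{2,\,0.01}=9.21$, we reject the null hypothesis. In
this instance, the p-value, which can be determined from a chisquare table, is
$Pr\{\chi^2_2\ge 9.73\}\approx 0.008$.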
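The identity $\Sigma_{11}^{-1}=\Sigma^{11}-\Sigma^{12}(\Sigma^{22})^{-1}\Sigma^{21}$ and the value of the test statistic can both be verified in exact rational arithmetic. The following sketch uses only the standard library; `inverse` and `matmul` are illustrative helpers (a plain Gauss-Jordan inversion), not routines referred to in the text.

```python
from fractions import Fraction


def inverse(m):
    """Invert a square matrix of Fractions by Gauss-Jordan elimination."""
    size = len(m)
    aug = [list(row) + [Fraction(int(i == j)) for j in range(size)] for i, row in enumerate(m)]
    for col in range(size):
        pivot = next(r for r in range(col, size) if aug[r][col] != 0)
        aug[col], aug[pivot] = aug[pivot], aug[col]
        aug[col] = [x / aug[col][col] for x in aug[col]]
        for r in range(size):
            if r != col:
                factor = aug[r][col]
                aug[r] = [a - factor * b for a, b in zip(aug[r], aug[col])]
    return [row[size:] for row in aug]


def matmul(a, b):
    """Product of two matrices given as nested lists."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


sigma = [[Fraction(v) for v in row]
         for row in [[2, 1, 0, 0], [1, 2, 0, 1], [0, 0, 3, 1], [0, 1, 1, 2]]]
r = 2

# blocks of Sigma^{-1}, written with superscripts in the text
sigma_inv = inverse(sigma)
up11 = [row[:r] for row in sigma_inv[:r]]
up12 = [row[r:] for row in sigma_inv[:r]]
up21 = [row[:r] for row in sigma_inv[r:]]
up22 = [row[r:] for row in sigma_inv[r:]]

correction = matmul(matmul(up12, inverse(up22)), up21)
schur = [[up11[i][j] - correction[i][j] for j in range(r)] for i in range(r)]
sigma11_inv = inverse([row[:r] for row in sigma[:r]])
assert schur == sigma11_inv  # the identity quoted from Sect. 1.3

# test statistic for the subvector hypothesis
observations = [(1, 0), (-1, 1), (0, 2), (2, 1), (2, -1)]
mu0 = (1, -1)
n = len(observations)
deviation = [Fraction(sum(col), n) - m for col, m in zip(zip(*observations), mu0)]
statistic = n * sum(
    deviation[i] * sigma11_inv[i][j] * deviation[j] for i in range(r) for j in range(r)
)
print(statistic, float(statistic))  # 146/15, about 9.73
```

Working with `Fraction` avoids rounding, so the equality check on the two $2\times 2$ matrices is exact rather than approximate.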


6.2.4. Testing $\mu_1=\cdots=\mu_p$, with $\Sigma$ known, real Gaussian case


Let $X_j\sim N_p(\mu,\Sigma)$, $\Sigma>O$, $j=1,\ldots,n$, and the $X_j$ be independently distributed.
Letting $\mu'=(\mu_1,\ldots,\mu_p)$, consider the hypothesis


$$H_o:\mu_1=\mu_2=\cdots=\mu_p=\nu,$$


where $\nu$, the common $\mu_j$, is unknown. This implies that $\mu_i-\mu_j=0$ for all $i$ and $j$. Consider
the $p\times 1$ vector $J$ of unities, $J'=(1,\ldots,1)$ and then take any non-null vector that
is orthogonal to $J$. Let $A$ be such a vector so that $A'J=0$. Actually, $p-1$ linearly independent
such vectors are available. For example, if $p$ is even, then take $1,-1,\ldots,1,-1$
as the elements of $A$ and, when $p$ is odd, one can start with $1,-1,\ldots,1,-1$ and take the
last three elements as $1,-2,1$, or the last element as 0, that is,


$$A=\begin{bmatrix}1\\-1\\ \vdots\\1\\-1\end{bmatrix}\mbox{ for } p \mbox{ even and } A=\begin{bmatrix}1\\-1\\ \vdots\\1\\-2\\1\end{bmatrix}\mbox{ or }\begin{bmatrix}1\\-1\\ \vdots\\1\\-1\\0\end{bmatrix}\mbox{ for } p \mbox{ odd.}$$
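The construction of such a vector $A$ can be sketched in code. The function name `contrast_vector` and its `zero_last` option are illustrative choices; the function returns the alternating pattern for even $p$ and, for odd $p$, either the variant ending in $1,-2,1$ or the variant whose last element is 0.

```python
def contrast_vector(p: int, zero_last: bool = False) -> list:
    """Return a non-null vector A with A'J = 0, where J is the p x 1 vector of ones."""
    if p < 2:
        raise ValueError("p must be at least 2")
    if p % 2 == 0:
        # 1, -1, ..., 1, -1
        return [1 if i % 2 == 0 else -1 for i in range(p)]
    if zero_last:
        # alternate over the first p - 1 positions and ignore the last component
        return [1 if i % 2 == 0 else -1 for i in range(p - 1)] + [0]
    # alternate over the first p - 3 positions and finish with 1, -2, 1
    return [1 if i % 2 == 0 else -1 for i in range(p - 3)] + [1, -2, 1]


for p in (4, 5, 7):
    a = contrast_vector(p)
    print(p, a, sum(a))  # the sum of the elements is A'J
print(contrast_vector(5, zero_last=True))
```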


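When the last element of the vector $A$ is zero, we are simply ignoring the last element in
$X_j$. Let the $p\times 1$ vector $X_j\sim N_p(\mu,\Sigma)$, $\Sigma>O$, $j=1,\ldots,n$, and the $X_j$'s be
independently distributed. Let the scalar $y_j=A'X_j$ and the $1\times n$ vector $Y=(y_1,\ldots,y_n)=
(A'X_1,\ldots,A'X_n)=A'(X_1,\ldots,X_n)=A'\mathbf{X}$, where the $p\times n$ matrix $\mathbf{X}=(X_1,\ldots,X_n)$.
Let $\bar{y}=\frac{1}{n}(y_1+\cdots+y_n)=A'\frac{1}{n}(X_1+\cdots+X_n)=A'\bar{X}$. Then,
$\sum_{j=1}^{n}(y_j-\bar{y})^2=A'\sum_{j=1}^{n}(X_j-\bar{X})(X_j-\bar{X})'A$ where
$\sum_{j=1}^{n}(X_j-\bar{X})(X_j-\bar{X})'=(\mathbf{X}-\bar{\mathbf{X}})(\mathbf{X}-\bar{\mathbf{X}})'=S=$
the sample sum of products matrix in the $X_j$'s, where $\bar{\mathbf{X}}=(\bar{X},\ldots,\bar{X})$ is the $p\times n$ matrix
whose columns are all equal to $\bar{X}$. Thus, one has $\sum_{j=1}^{n}(y_j-\bar{y})^2=A'SA$. Consider
the hypothesis $\mu_1=\cdots=\mu_p=\nu$. Then, $A'\mu=\nu A'J=\nu\,0=0$ under $H_o$.


Since $X_j\sim N_p(\mu,\Sigma)$, $\Sigma>O$, we have $y_j\sim N_1(A'\mu,A'\Sigma A)$, $A'\Sigma A>0$. Under
$H_o$, $y_j\sim N_1(0,A'\Sigma A)$, $j=1,\ldots,n$, the $y_j$'s being independently distributed.
Consider the joint density of $y_1,\ldots,y_n$, denoted by $L$:

$$L=\prod_{j=1}^{n}\frac{e^{-\frac{1}{2A'\Sigma A}(y_j-A'\mu)^2}}{(2\pi)^{\frac{1}{2}}[A'\Sigma A]^{\frac{1}{2}}}. \tag{i}$$


Since $\Sigma$ is known, the only unknown quantity in $L$ is $\mu$. Differentiating $\ln L$ with respect
to $\mu$ and equating the result to a null vector, we have


$$\sum_{j=1}^{n}(y_j-A'\hat{\mu})=0\Rightarrow\sum_{j=1}^{n}y_j-nA'\hat{\mu}=0\Rightarrow\bar{y}-A'\hat{\mu}=0\Rightarrow A'(\bar{X}-\hat{\mu})=0.$$


However, since $A$ is a fixed known vector and the equation holds for arbitrary $\bar{X}$, $\hat{\mu}=\bar{X}$.
Hence the maximum of $L$, in the entire parameter space $\Omega=\mu$, is the following:


$$\max_{\Omega}L=\frac{e^{-\frac{1}{2A'\Sigma A}\sum_{j=1}^{n}(y_j-\bar{y})^2}}{(2\pi)^{\frac{n}{2}}[A'\Sigma A]^{\frac{n}{2}}}=\frac{e^{-\frac{1}{2A'\Sigma A}A'SA}}{(2\pi)^{\frac{n}{2}}[A'\Sigma A]^{\frac{n}{2}}}. \tag{ii}$$

Now, noting that under $H_o$, $A'\mu=0$, we have

$$\max_{H_o}L=\frac{e^{-\frac{1}{2A'\Sigma A}\sum_{j=1}^{n}y_j^2}}{(2\pi)^{\frac{n}{2}}[A'\Sigma A]^{\frac{n}{2}}}=\frac{e^{-\frac{1}{2A'\Sigma A}A'(\sum_{j=1}^{n}X_jX_j')A}}{(2\pi)^{\frac{n}{2}}[A'\Sigma A]^{\frac{n}{2}}}. \tag{iii}$$


From (i) to (iii), the $\lambda$-criterion is as follows, observing that
$A'(\sum_{j=1}^{n}X_jX_j')A=\sum_{j=1}^{n}A'(X_j-\bar{X})(X_j-\bar{X})'A+nA'(\bar{X}\bar{X}')A=A'SA+nA'\bar{X}\bar{X}'A$:

$$\lambda=e^{-\frac{n}{2A'\Sigma A}A'\bar{X}\bar{X}'A}. \tag{6.2.3}$$

But since $\sqrt{\frac{n}{A'\Sigma A}}\,A'\bar{X}\sim N_1(0,1)$ under $H_o$, we may test this null hypothesis either by
using the standard normal variable or a chisquare variable as $\frac{n}{A'\Sigma A}A'\bar{X}\bar{X}'A\sim\chi^2_1$ under
$H_o$. Accordingly, the criterion consists of rejecting $H_o$


$$\mbox{when }\left|\sqrt{\frac{n}{A'\Sigma A}}\,A'\bar{X}\right|\ge z_{\frac{\alpha}{2}},\ \mbox{ with } Pr\{z\ge z_{\beta}\}=\beta,\ z\sim N_1(0,1),$$

or

$$\mbox{when } u\equiv\frac{n}{A'\Sigma A}(A'\bar{X}\bar{X}'A)\ge\chi^2_{1,\alpha},\ \mbox{ with } Pr\{\chi^2_1\ge\chi^2_{1,\alpha}\}=\alpha,\ u\sim\chi^2_1. \tag{6.2.4}$$


Example 6.2.5. Consider a 4-variate real Gaussian vector $X\sim N_4(\mu,\Sigma)$, $\Sigma>O$, with
$\Sigma$ as specified in Example 6.2.4 and the null hypothesis that the individual components of
the mean value vector $\mu$ are all equal, that is,


$$\Sigma=\begin{bmatrix}2&1&0&0\\1&2&0&1\\0&0&3&1\\0&1&1&2\end{bmatrix},\ H_o:\mu_1=\mu_2=\mu_3=\mu_4,\ \mu=\begin{bmatrix}\mu_1\\ \mu_2\\ \mu_3\\ \mu_4\end{bmatrix}.$$


Let $L$ be a $4\times 1$ constant vector such that $L'=(1,-1,1,-1)$. Then, under $H_o$, $L'\mu=0$
and $u=L'X$ is univariate normal; more specifically, $u\sim N_1(0,L'\Sigma L)$ where


$$L'\Sigma L=\begin{bmatrix}1&-1&1&-1\end{bmatrix}\begin{bmatrix}2&1&0&0\\1&2&0&1\\0&0&3&1\\0&1&1&2\end{bmatrix}\begin{bmatrix}1\\-1\\1\\-1\end{bmatrix}=7\Rightarrow u\sim N_1(0,7).$$

Let the observation vectors be the same as those used in Example 6.2.4 and let $u_j=L'X_j$,
$j=1,\ldots,5$. Then, the five independent observations from $u\sim N_1(0,7)$ are the
following:

$$u_1=L'X_1=\begin{bmatrix}1&-1&1&-1\end{bmatrix}\begin{bmatrix}1\\0\\2\\4\end{bmatrix}=-1,\ u_2=L'\begin{bmatrix}-1\\1\\1\\2\end{bmatrix}=-3,\ u_3=L'\begin{bmatrix}0\\2\\3\\4\end{bmatrix}=-3,$$

$$u_4=L'\begin{bmatrix}2\\1\\-1\\3\end{bmatrix}=-3,\ u_5=L'\begin{bmatrix}2\\-1\\0\\4\end{bmatrix}=-1,$$


the average $\bar{u}=\frac{1}{5}(u_1+\cdots+u_5)=\frac{1}{5}(-1-3-3-3-1)$ being equal to $-\frac{11}{5}$. Then,
the standardized sample mean $z=\frac{\sqrt{n}}{\sigma_u}(\bar{u}-0)\sim N_1(0,1)$ with $\sigma_u=\sqrt{7}$. Let us test the null hypothesis
at the significance level $\alpha=0.05$. Referring to a $N_1(0,1)$ table, the required critical
value, denoted by $z_{\frac{\alpha}{2}}=z_{0.025}$, is 1.96. Therefore, we reject $H_o$ in favor of the alternative
hypothesis that at least two components of $\mu$ are unequal at significance level $\alpha$ if the
observed value of

$$|z|=\left|\frac{\sqrt{n}}{\sigma_u}(\bar{u}-0)\right|\ge 1.96.$$

Since the observed value of $|z|$, namely $\left|\frac{\sqrt{5}}{\sqrt{7}}\big(-\frac{11}{5}-0\big)\right|=\frac{11}{\sqrt{35}}\approx 1.86$, is less than 1.96, we do
not reject $H_o$ at the 5% significance level. Letting $z\sim N_1(0,1)$, the p-value in this case
is $Pr\{|z|\ge 1.86\}\approx 0.063$, this quantile being available from a standard normal table.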
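The quantities of this example can be recomputed directly from the covariance matrix and the five observation vectors of Example 6.2.4. The sketch below is standard-library Python with illustrative variable names; the two-sided normal tail probability is obtained from the complementary error function, $Pr\{|z|\ge c\}={\rm erfc}(c/\sqrt{2})$.

```python
import math

sigma = [[2, 1, 0, 0], [1, 2, 0, 1], [0, 0, 3, 1], [0, 1, 1, 2]]
contrast = [1, -1, 1, -1]
observations = [(1, 0, 2, 4), (-1, 1, 1, 2), (0, 2, 3, 4), (2, 1, -1, 3), (2, -1, 0, 4)]
n = len(observations)

# variance of u = L'X, that is, L' Sigma L
variance = sum(contrast[i] * sigma[i][j] * contrast[j] for i in range(4) for j in range(4))

# observations u_j = L'X_j and their average
u = [sum(c * x for c, x in zip(contrast, obs)) for obs in observations]
u_bar = sum(u) / n

# standardized sample mean and two-sided p-value under the null hypothesis
z = math.sqrt(n / variance) * u_bar
p_value = math.erfc(abs(z) / math.sqrt(2))
print(variance, u, u_bar, round(abs(z), 4), round(p_value, 4))
```

Squaring `z` gives the equivalent chisquare statistic with one degree of freedom from (6.2.4), so either form of the criterion leads to the same decision.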


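In the complex case, proceeding in a parallel manner to the real case, the lambda criterion
will be the following:


$$\tilde{\lambda}=e^{-\frac{n}{A^{*}\tilde{\Sigma}A}A^{*}\bar{\tilde{X}}\bar{\tilde{X}}^{*}A} \tag{6.2a.5}$$


where an asterisk indicates the conjugate transpose. Letting $\tilde{u}=\frac{2n}{A^{*}\tilde{\Sigma}A}(A^{*}\bar{\tilde{X}}\bar{\tilde{X}}^{*}A)$, it can
be shown that under $H_o$, $\tilde{u}$ is distributed as a real chisquare random variable having 2
degrees of freedom. Accordingly, the criterion will be as follows:


$$\mbox{Reject } H_o \mbox{ if the observed } \tilde{u}\ge\chi^2_{2,\alpha},\ \mbox{ with } Pr\{\chi^2_2\ge\chi^2_{2,\alpha}\}=\alpha. \tag{6.2a.6}$$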


Example 6.2a.2. When $p>2$, the computations become quite involved in the complex
case. Thus, we will let $p=2$ and consider the bivariate complex $\tilde{N}_2(\tilde{\mu},\tilde{\Sigma})$ distribution
that was specified in Example 6.2a.1, assuming that $\tilde{\Sigma}$ is as given therein, the same set of
observations being utilized as well. In this case, the null hypothesis is $H_o:\tilde{\mu}_1=\tilde{\mu}_2$, the
parameters and sample average being


- Ш a 2 1+ї = 1]5—1 
ra eb ТЕ] 


Letting $L'=(1,-1)$, $L'\tilde{\mu}=0$ under $H_o$ and


rare i 1] 53-il laran; Ў) = =a 2042) = 2 
ac bi ao? | ишт, 
» 2 qaii 1 2n =, 8 5 5 
Lp =A | nd qut qM жш, 

[ ll 4 |1 v= т) = у х те = С 


The criterion consists of rejecting $H_o$ if the observed value of $v\ge\chi^2_{2,\alpha}$. Letting the
significance level of the test be $\alpha=0.05$, the critical value is $\chi^2_{2,\,0.05}=5.99$, which is
readily available from a chisquare table. The observed value of $v$ being $2<5.99$, we do
not reject $H_o$. In this case, the p-value is $Pr\{\chi^2_2\ge 2\}\approx 0.368$.


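6.2.5. Likelihood ratio criterion for testing $H_o:\mu_1=\cdots=\mu_p$, $\Sigma$ known


Consider again, $X_j\sim N_p(\mu,\Sigma)$, $\Sigma>O$, $j=1,\ldots,n$, with the $X_j$'s being
independently distributed and $\Sigma$, assumed known. Letting the joint density of $X_1,\ldots,X_n$
be denoted by $L$, then, as determined earlier,

$$L=\frac{e^{-\frac{1}{2}{\rm tr}(\Sigma^{-1}S)-\frac{n}{2}(\bar{X}-\mu)'\Sigma^{-1}(\bar{X}-\mu)}}{(2\pi)^{\frac{np}{2}}|\Sigma|^{\frac{n}{2}}} \tag{i}$$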



where n is the sample size and S is the sample sum of products matrix. In the entire 
parameter space 


О = {(u, X)| X > О known, pw’ = (и1,..., Up)}, 
the MLE of u is X = the sample average. Then 


e7 ztr(27!S) 
max L = 


= n n- * Qi) 
Q (2л)? ||? 


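Consider the following hypothesis on $\mu' = (\mu_1,\ldots,\mu_p)$:

$$H_o : \mu_1 = \cdots = \mu_p = \nu,\ \nu \text{ is unknown}.$$

Then, the MLE of $\mu$ under $H_o$ is $\hat\mu = J\hat\nu = J\frac1p J'\bar X$, $J' = (1,\ldots,1)$. This $\hat\nu$ is in fact the sum of all observations on all components of $X_j$, $j = 1,\ldots,n$, divided by $np$, which is identical to the sum of all the coordinates of $\bar X$ divided by $p$, or $\hat\mu = \frac1p JJ'\bar X$. In order to evaluate the maximum of $L$ under $H_o$, it suffices to substitute $\hat\mu$ for $\mu$ in (i). Accordingly, the $\lambda$-criterion is

$$\lambda = \frac{\max_{H_o} L}{\max_{\Omega} L} = e^{-\frac n2 (\bar X - \hat\mu)'\Sigma^{-1}(\bar X - \hat\mu)}. \tag{6.2.5}$$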

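Thus, we reject $H_o$ for small values of $\lambda$ or for large values of $w \equiv n(\bar X - \hat\mu)'\Sigma^{-1}(\bar X - \hat\mu)$. Let us determine the distribution of $w$. First, note that

$$\bar X - \hat\mu = \bar X - \frac1p JJ'\bar X = \Big(I - \frac1p JJ'\Big)\bar X,$$

and let

$$w = n(\bar X - \hat\mu)'\Sigma^{-1}(\bar X - \hat\mu) = n\bar X'\Big(I - \frac1p JJ'\Big)\Sigma^{-1}\Big(I - \frac1p JJ'\Big)\bar X \tag{iii}$$
$$\phantom{w} = n(\bar X - \mu)'\Big(I - \frac1p JJ'\Big)\Sigma^{-1}\Big(I - \frac1p JJ'\Big)(\bar X - \mu)$$

since $J'(I - \frac1p JJ') = O$, $\mu = \nu J$ being the true mean value of the $N_p(\mu, \Sigma)$ distribution. Observe that $\sqrt n(\bar X - \mu) \sim N_p(O, \Sigma)$, $\Sigma > O$, and that $\frac1p JJ'$ is idempotent. Since $I - \frac1p JJ'$ is also idempotent and its rank is $p - 1$, there exists an orthonormal matrix $P$, $PP' = I$, $P'P = I$, such that

$$I - \frac1p JJ' = P'\begin{bmatrix} I_{p-1} & O\\ O' & 0\end{bmatrix}P.$$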
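Letting $U = P\sqrt n(\bar X - \mu)$, with $U' = (u_1,\ldots,u_{p-1},u_p)$, $U \sim N_p(O, P\Sigma P')$. Now, on noting that

$$U'\begin{bmatrix} I_{p-1} & O\\ O' & 0\end{bmatrix} = (u_1,\ldots,u_{p-1},0),$$

we have

$$n(\bar X - \hat\mu)'\Sigma^{-1}(\bar X - \hat\mu) = [U_1', 0]\,P\Sigma^{-1}P'\begin{bmatrix} U_1\\ 0\end{bmatrix} = U_1'B^{-1}U_1,\quad U_1' = (u_1,\ldots,u_{p-1}),$$

$B$ being the covariance matrix associated with $U_1$, so that $U_1 \sim N_{p-1}(O, B)$, $B > O$. Thus, $U_1'B^{-1}U_1 \sim \chi^2_{p-1}$, a real scalar chisquare random variable having $p - 1$ degrees of freedom. Hence, upon evaluating

$$w = \bar X'\Big(I - \frac1p JJ'\Big)\Sigma^{-1}\Big(I - \frac1p JJ'\Big)\bar X,$$

one would reject $H_o : \mu_1 = \cdots = \mu_p = \nu$, $\nu$ unknown, whenever the observed value of

$$w \ge \chi^2_{p-1,\,\alpha},\ \text{ with } Pr\{\chi^2_{p-1} \ge \chi^2_{p-1,\,\alpha}\} = \alpha. \tag{6.2.6}$$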
Observe that the degrees of freedom of this chisquare variable, that is, p — 1, coincides 
with the number of parameters being restricted by Ho. 


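Example 6.2.6. Consider the trivariate real Gaussian population $X \sim N_3(\mu, \Sigma)$, $\Sigma > O$, as already specified in Example 6.2.1 with the same $\Sigma$ and the same observed sample vectors for testing $H_o : \mu' = (\nu, \nu, \nu)$, namely,

$$\mu = \begin{bmatrix}\mu_1\\ \mu_2\\ \mu_3\end{bmatrix},\quad \bar X = \frac15\begin{bmatrix}9\\ 4\\ 3\end{bmatrix},\quad \Sigma = \begin{bmatrix}2 & 0 & 0\\ 0 & 1 & 1\\ 0 & 1 & 2\end{bmatrix} \Rightarrow \Sigma^{-1} = \begin{bmatrix}\frac12 & 0 & 0\\ 0 & 2 & -1\\ 0 & -1 & 1\end{bmatrix}.$$

The following test statistic has to be evaluated for $p = 3$:

$$w = \bar X'\Big(I - \frac1p JJ'\Big)\Sigma^{-1}\Big(I - \frac1p JJ'\Big)\bar X,\quad J' = (1, 1, 1).$$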
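We have to evaluate the following quantities in order to determine the value of $w$:

$$\frac13 JJ'\Sigma^{-1} = \frac13\begin{bmatrix}1&1&1\\1&1&1\\1&1&1\end{bmatrix}\begin{bmatrix}\frac12&0&0\\0&2&-1\\0&-1&1\end{bmatrix} = \frac13\begin{bmatrix}\frac12&1&0\\ \frac12&1&0\\ \frac12&1&0\end{bmatrix},\qquad \Sigma^{-1}\frac13 JJ' = \frac13\begin{bmatrix}\frac12&\frac12&\frac12\\1&1&1\\0&0&0\end{bmatrix},$$

$$\frac13 JJ'\Sigma^{-1}\frac13 JJ' = \frac16\begin{bmatrix}1&1&1\\1&1&1\\1&1&1\end{bmatrix},$$

so that

$$\Big(I - \frac13 JJ'\Big)\Sigma^{-1}\Big(I - \frac13 JJ'\Big) = \Sigma^{-1} - \frac13 JJ'\Sigma^{-1} - \Sigma^{-1}\frac13 JJ' + \frac13 JJ'\Sigma^{-1}\frac13 JJ' = \begin{bmatrix}\frac13&-\frac13&0\\ -\frac13&\frac32&-\frac76\\ 0&-\frac76&\frac76\end{bmatrix}.$$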

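Thus,

$$w = \bar X'\Big(I - \frac13 JJ'\Big)\Sigma^{-1}\Big(I - \frac13 JJ'\Big)\bar X,\quad J' = (1, 1, 1),$$
$$\phantom{w} = \frac{1}{5^2}\,[9\ \ 4\ \ 3]\begin{bmatrix}\frac13&-\frac13&0\\ -\frac13&\frac32&-\frac76\\ 0&-\frac76&\frac76\end{bmatrix}\begin{bmatrix}9\\4\\3\end{bmatrix}$$
$$\phantom{w} = \frac{1}{25}\Big[9^2\Big(\frac13\Big) + 4^2\Big(\frac32\Big) + 3^2\Big(\frac76\Big) - 2(9)(4)\Big(\frac13\Big) - 2(4)(3)\Big(\frac76\Big)\Big] = \frac{9.5}{25}$$
$$\phantom{w} = 0.38.$$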


We reject $H_o$ whenever $w \ge \chi^2_{p-1,\,\alpha}$. Letting the significance level be $\alpha = 0.05$, the tabulated critical value is $\chi^2_{p-1,\,\alpha} = \chi^2_{2,\,0.05} = 5.99$, and since $0.38 < 5.99$, we do not reject the null hypothesis. In this instance, the p-value is $Pr\{\chi^2_2 \ge 0.38\} = e^{-0.19} \approx 0.83$.
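The matrix arithmetic of Example 6.2.6 can be reproduced with exact fractions. The sketch below is an illustration only: the helper `matmul` and the variable names are ours, and the statistic is computed exactly as it is displayed in the example, from $\bar X = \frac15(9, 4, 3)'$ and the given $\Sigma^{-1}$.

```python
from fractions import Fraction as Fr
import math

def matmul(A, B):
    """Product of two matrices stored as lists of rows."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]

p = 3
sigma_inv = [[Fr(1, 2), Fr(0), Fr(0)],
             [Fr(0), Fr(2), Fr(-1)],
             [Fr(0), Fr(-1), Fr(1)]]
# Centering matrix I - (1/p) J J', where J is the vector of ones.
M = [[(Fr(1) if i == j else Fr(0)) - Fr(1, p) for j in range(p)] for i in range(p)]
xbar = [Fr(9, 5), Fr(4, 5), Fr(3, 5)]

Q = matmul(matmul(M, sigma_inv), M)      # (I - JJ'/p) Sigma^{-1} (I - JJ'/p)
w = sum(xbar[i] * Q[i][j] * xbar[j] for i in range(p) for j in range(p))
p_value = math.exp(-float(w) / 2.0)      # chisquare tail for p - 1 = 2 degrees of freedom
print(Q, float(w), round(p_value, 2))
```

Working with `Fraction` avoids rounding in the intermediate matrix, whose entries are thirds and sixths.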


6.3. Testing $H_o : \mu = \mu_o$ (given) When $\Sigma$ is Unknown, Real Gaussian Case


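In this case, both $\mu$ and $\Sigma$ are unknown in the entire parameter space $\Omega$; however, $\mu = \mu_o$ is known while $\Sigma$ is still unknown in the subspace $\omega$. The maximum of the likelihood function under $\Omega$ is the same as that obtained in Sect. 6.1.1, that is,

$$\sup_{\Omega} L = \frac{e^{-\frac{np}{2}}\,n^{\frac{np}{2}}}{(2\pi)^{\frac{np}{2}}\,|S|^{\frac n2}}. \tag{6.3.1}$$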
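When $\mu = \mu_o$, $\Sigma$ is estimated by $\hat\Sigma = \frac1n\sum_{j=1}^n (X_j - \mu_o)(X_j - \mu_o)'$. As shown in Sect. 3.5, $\hat\Sigma$ can be reexpressed as follows:

$$\hat\Sigma = \frac1n\sum_{j=1}^n (X_j - \mu_o)(X_j - \mu_o)' = \frac1n S + (\bar X - \mu_o)(\bar X - \mu_o)'$$
$$\phantom{\hat\Sigma} = \frac1n\,[S + n(\bar X - \mu_o)(\bar X - \mu_o)'].$$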


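Then, under the null hypothesis, we have

$$\sup_{\omega} L = \frac{e^{-\frac{np}{2}}\,n^{\frac{np}{2}}}{(2\pi)^{\frac{np}{2}}\,|S + n(\bar X - \mu_o)(\bar X - \mu_o)'|^{\frac n2}}. \tag{6.3.2}$$

Thus,

$$\lambda = \frac{\sup_{\omega} L}{\sup_{\Omega} L} = \frac{|S|^{\frac n2}}{|S + n(\bar X - \mu_o)(\bar X - \mu_o)'|^{\frac n2}}.$$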


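On applying results on the determinants of partitioned matrices which were obtained in Sect. 1.3, we have the following equivalent representations of the denominator:

$$\begin{vmatrix} S & -\sqrt n(\bar X - \mu_o)\\ \sqrt n(\bar X - \mu_o)' & 1\end{vmatrix} = |S + n(\bar X - \mu_o)(\bar X - \mu_o)'| = |S|\,|1 + n(\bar X - \mu_o)'S^{-1}(\bar X - \mu_o)|,$$

that is,

$$|S + n(\bar X - \mu_o)(\bar X - \mu_o)'| = |S|\,[1 + n(\bar X - \mu_o)'S^{-1}(\bar X - \mu_o)],$$

which yields the following simplified representation of the likelihood ratio statistic:

$$\lambda = \frac{1}{[1 + n(\bar X - \mu_o)'S^{-1}(\bar X - \mu_o)]^{\frac n2}}. \tag{6.3.3}$$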


Small values of $\lambda$ correspond to large values of $u \equiv n(\bar X - \mu_o)'S^{-1}(\bar X - \mu_o)$, which is connected to Hotelling's $T_n^2$ statistic. Hence the criterion is the following: “Reject $H_o$ for large values of $u$”. The distribution of $u$ can be derived by making use of the independence of the sample mean and sample sum of products matrix and the densities of these quantities. An outline of the derivation is provided in the next subsection.
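The determinant identity behind (6.3.3), $|S + n(\bar X - \mu_o)(\bar X - \mu_o)'| = |S|\,[1 + n(\bar X - \mu_o)'S^{-1}(\bar X - \mu_o)]$, is easy to confirm numerically. The following Python sketch does so for a $2 \times 2$ case with exact fractions; the matrix `S`, the vector `d` (standing for $\bar X - \mu_o$) and the sample size `n` are hypothetical values chosen only for illustration.

```python
from fractions import Fraction as Fr

def det2(A):
    """Determinant of a 2 x 2 matrix."""
    return A[0][0] * A[1][1] - A[0][1] * A[1][0]

def inv2(A):
    """Inverse of a 2 x 2 matrix: transposed cofactor matrix over the determinant."""
    d = det2(A)
    return [[A[1][1] / d, -A[0][1] / d], [-A[1][0] / d, A[0][0] / d]]

n = 4                                            # hypothetical sample size
S = [[Fr(4), Fr(2)], [Fr(2), Fr(5)]]             # hypothetical sum of products matrix
d = [Fr(1, 2), Fr(-1, 4)]                        # hypothetical value of Xbar - mu_o

S_inv = inv2(S)
u = n * sum(d[i] * S_inv[i][k] * d[k] for i in range(2) for k in range(2))
lhs = det2([[S[i][k] + n * d[i] * d[k] for k in range(2)] for i in range(2)])
rhs = det2(S) * (1 + u)
lam = float(1 / (1 + u)) ** (n / 2)              # likelihood ratio statistic as in (6.3.3)
print(lhs, rhs, u, lam)
```

Since $\lambda$ is a decreasing function of $u$, small values of $\lambda$ and large values of $u$ define the same rejection region.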
6.3.1. The distribution of the test statistic


Let us examine the distribution of $u = n(\bar X - \mu)'S^{-1}(\bar X - \mu)$. We have already established in Theorem 3.5.3 that $S$ and $\bar X$ are independently distributed in the case of a real $p$-variate nonsingular Gaussian $N_p(\mu, \Sigma)$ population. It was also determined in Corollary 3.5.1 that the distribution of the sample average $\bar X$ is a $p$-variate real Gaussian vector with the parameters $\mu$ and $\frac1n\Sigma$, $\Sigma > O$, and in the continuing discussion, it is shown that the distribution of $S$ is a matrix-variate Wishart with $m = n - 1$ degrees of freedom, where $n$ is the sample size, and parameter matrix $\Sigma > O$. Hence the joint density of $S$ and $\bar X$, denoted by $f(S, \bar X)$, is the product of the marginal densities. Letting $\Sigma = I$, this joint density is given by

$$f(S, \bar X) = \frac{n^{\frac p2}}{(2\pi)^{\frac p2}\,2^{\frac{mp}{2}}\,\Gamma_p(\frac m2)}\,|S|^{\frac m2 - \frac{p+1}{2}}\,e^{-\frac12 \mathrm{tr}(S) - \frac n2 \mathrm{tr}((\bar X - \mu)(\bar X - \mu)')},\quad m = n - 1. \tag{i}$$


Note that it is sufficient to consider the case $\Sigma = I$. Due to the presence of $S^{-1}$ in $u = n(\bar X - \mu)'S^{-1}(\bar X - \mu)$, the effect of any scaling matrix on $X_j$ will disappear. If $X_j$ goes to $A^{\frac12}X_j$ for any constant positive definite matrix $A$, then $S^{-1}$ will go to $A^{-\frac12}S^{-1}A^{-\frac12}$ and thus $u$ will be free of $A$.

Letting $Y = S^{-\frac12}(\bar X - \mu)$ for fixed $S$, $Y \sim N_p(O, S^{-1}/n)$, so that the conditional density of $Y$, given $S$, is

$$g(Y|S) = \frac{n^{\frac p2}\,|S|^{\frac12}}{(2\pi)^{\frac p2}}\,e^{-\frac n2 \mathrm{tr}(SYY')}.$$


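Thus, the joint density of $S$ and $Y$, denoted by $f_1(S, Y)$, is

$$f_1(S, Y) = \frac{n^{\frac p2}}{(2\pi)^{\frac p2}\,2^{\frac{mp}{2}}\,\Gamma_p(\frac m2)}\,|S|^{\frac{m+1}{2} - \frac{p+1}{2}}\,e^{-\frac12 \mathrm{tr}(S[I + nYY'])},\quad m = n - 1. \tag{ii}$$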


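On integrating out $S$ from (ii) by making use of a matrix-variate gamma integral, we obtain the following marginal density of $Y$, denoted by $f_2(Y)$:

$$f_2(Y)\,dY = \frac{n^{\frac p2}\,\Gamma_p(\frac{m+1}{2})}{\pi^{\frac p2}\,\Gamma_p(\frac m2)}\,|I + nYY'|^{-\frac{m+1}{2}}\,dY.$$

However, $|I + nYY'| = 1 + nY'Y$, which can be established by considering two representations of the determinant

$$\begin{vmatrix} I & -\sqrt n\,Y\\ \sqrt n\,Y' & 1\end{vmatrix},$$

similarly to what was done in Sect. 6.3 to obtain the likelihood ratio statistic given in (6.3.3). As well, it can easily be shown that

$$\frac{\Gamma_p(\frac{m+1}{2})}{\Gamma_p(\frac m2)} = \frac{\Gamma(\frac{m+1}{2})}{\Gamma(\frac{m+1}{2} - \frac p2)}$$

by expanding the matrix-variate gamma functions. Now, letting $s = Y'Y$, it follows from Theorem 4.2.3 that $dY = \frac{\pi^{p/2}}{\Gamma(p/2)}\,s^{\frac p2 - 1}\,ds$. Thus, the density of $s$, denoted by $f_3(s)$, is

$$f_3(s)\,ds = \frac{n^{\frac p2}\,\Gamma(\frac{m+1}{2})}{\Gamma(\frac{m+1}{2} - \frac p2)\,\Gamma(\frac p2)}\,s^{\frac p2 - 1}(1 + ns)^{-\frac{m+1}{2}}\,ds \tag{6.3.4}$$
$$\phantom{f_3(s)\,ds} = \frac{n^{\frac p2}\,\Gamma(\frac n2)}{\Gamma(\frac p2)\,\Gamma(\frac n2 - \frac p2)}\,s^{\frac p2 - 1}(1 + ns)^{-\frac n2}\,ds,\quad m = n - 1, \tag{6.3.5}$$

for $n = p+1, p+2, \ldots$, $0 \le s < \infty$, and zero elsewhere. It can then readily be seen from (6.3.5) that $ns = nY'Y = n(\bar X - \mu)'S^{-1}(\bar X - \mu) = u$ is distributed as a real scalar type-2 beta random variable whose parameters are $(\frac p2, \frac n2 - \frac p2)$, $n = p+1, \ldots$. Thus, the following result: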


Theorem 6.3.1. Consider a real $p$-variate normal population $N_p(\mu, \Sigma)$, $\Sigma > O$, and a simple random sample of size $n$ from this normal population, $X_j \sim N_p(\mu, \Sigma)$, $j = 1,\ldots,n$, the $X_j$'s being independently distributed. Let the $p \times n$ matrix $\mathbf{X} = (X_1,\ldots,X_n)$ be the sample matrix and the $p$-vector $\bar X = \frac1n(X_1 + \cdots + X_n)$ denote the sample average. Let $\bar{\mathbf{X}} = (\bar X,\ldots,\bar X)$ be a $p \times n$ matrix whose columns are all equal to $\bar X$, and $S = (\mathbf{X} - \bar{\mathbf{X}})(\mathbf{X} - \bar{\mathbf{X}})'$ be the sample sum of products matrix. Then, $u = n(\bar X - \mu)'S^{-1}(\bar X - \mu)$ has a real scalar type-2 beta distribution with the parameters $(\frac p2, \frac n2 - \frac p2)$, so that $u \sim \frac{p}{n-p}F_{p,\,n-p}$ where $F_{p,\,n-p}$ denotes a real F random variable whose degrees of freedom are $p$ and $n - p$.


Hence, in order to test the hypothesis $H_o : \mu = \mu_o$, the likelihood ratio statistic gives the test criterion: Reject $H_o$ for large values of $u = n(\bar X - \mu_o)'S^{-1}(\bar X - \mu_o)$, which is equivalent to rejecting $H_o$ for large values of an F-random variable having $p$ and $n - p$ degrees of freedom where $F_{p,\,n-p} = \frac{n-p}{p}\,u = \frac{n-p}{p}\,n(\bar X - \mu_o)'S^{-1}(\bar X - \mu_o)$, that is,

$$\text{reject } H_o \text{ if } \frac{n-p}{p}\,u = F_{p,\,n-p} \ge F_{p,\,n-p,\,\alpha},\ \text{ with } \alpha = Pr\{F_{p,\,n-p} \ge F_{p,\,n-p,\,\alpha}\} \tag{6.3.6}$$

at a given significance level $\alpha$ where $u = n(\bar X - \mu_o)'S^{-1}(\bar X - \mu_o) \sim \frac{p}{n-p}F_{p,\,n-p}$, $n$ being the sample size.


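Example 6.3.1. Consider a trivariate real Gaussian vector $X \sim N_3(\mu, \Sigma)$, $\Sigma > O$, where $\Sigma$ is unknown. We would like to test the following hypothesis on $\mu$: $H_o : \mu = \mu_o$, with $\mu_o' = (1, 1, 1)$. Consider the following simple random sample of size $n = 5$ from this $N_3(\mu, \Sigma)$ population:

$$X_1 = \begin{bmatrix}1\\1\\1\end{bmatrix},\ X_2 = \begin{bmatrix}1\\0\\-1\end{bmatrix},\ X_3 = \begin{bmatrix}-1\\1\\2\end{bmatrix},\ X_4 = \begin{bmatrix}-2\\1\\2\end{bmatrix},\ X_5 = \begin{bmatrix}2\\-1\\0\end{bmatrix},$$

so that

$$\bar X = \frac15\begin{bmatrix}1\\2\\4\end{bmatrix},\ X_1 - \bar X = \begin{bmatrix}1\\1\\1\end{bmatrix} - \frac15\begin{bmatrix}1\\2\\4\end{bmatrix} = \frac15\begin{bmatrix}4\\3\\1\end{bmatrix},\ X_2 - \bar X = \frac15\begin{bmatrix}4\\-2\\-9\end{bmatrix},$$
$$X_3 - \bar X = \frac15\begin{bmatrix}-6\\3\\6\end{bmatrix},\ X_4 - \bar X = \frac15\begin{bmatrix}-11\\3\\6\end{bmatrix},\ X_5 - \bar X = \frac15\begin{bmatrix}9\\-7\\-4\end{bmatrix}.$$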

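Let $\mathbf{X} = [X_1,\ldots,X_5]$ be the $3 \times 5$ sample matrix and $\bar{\mathbf{X}} = [\bar X, \bar X,\ldots,\bar X]$ be the $3 \times 5$ matrix of sample means. Then,

$$\mathbf{X} - \bar{\mathbf{X}} = [X_1 - \bar X,\ldots,X_5 - \bar X] = \frac15\begin{bmatrix}4&4&-6&-11&9\\3&-2&3&3&-7\\1&-9&6&6&-4\end{bmatrix},$$

$$S = (\mathbf{X} - \bar{\mathbf{X}})(\mathbf{X} - \bar{\mathbf{X}})' = \frac{1}{5^2}\begin{bmatrix}270&-110&-170\\-110&80&85\\-170&85&170\end{bmatrix}.$$

Let $S = \frac{1}{5^2}A$. In order to evaluate the test statistic, we need $S^{-1} = 25A^{-1}$. To obtain the correct inverse without any approximation, we will use the transpose of the cofactor matrix divided by the determinant. The determinant of $A$, $|A|$, as obtained in terms of the elements of the first row and the corresponding cofactors, is equal to 531250. The matrix of cofactors, denoted by $\mathrm{Cof}(A)$, which is symmetric in this case, is the following:

$$\mathrm{Cof}(A) = \begin{bmatrix}6375&4250&4250\\4250&17000&-4250\\4250&-4250&9500\end{bmatrix} \Rightarrow A^{-1} = \frac{1}{531250}\,\mathrm{Cof}(A).$$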
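The null hypothesis is $H_o : \mu = \mu_o = (1, 1, 1)'$, so that

$$\bar X - \mu_o = \frac15\begin{bmatrix}1\\2\\4\end{bmatrix} - \begin{bmatrix}1\\1\\1\end{bmatrix} = -\frac15\begin{bmatrix}4\\3\\1\end{bmatrix},$$

the observed value of the test statistic being

$$w = \frac{n-p}{p}\,n(\bar X - \mu_o)'S^{-1}(\bar X - \mu_o) = \frac{(5-3)}{3}\,5\,\frac{1}{5^2}\,[4\ \ 3\ \ 1]\,(25A^{-1})\begin{bmatrix}4\\3\\1\end{bmatrix}$$
$$\phantom{w} = \frac23\,5\,\frac{25}{5^2}\,\frac{1}{531250}\,[(4)^2(6375) + (3)^2(17000) + (1)^2(9500) + 2(4)(3)(4250) + 2(4)(1)(4250) - 2(3)(1)(4250)]$$
$$\phantom{w} = \frac23\,5\,\frac{25}{5^2}\,\frac{1}{531250}\,[375000] = 2.35.$$

The test statistic $w$ under the null hypothesis is F-distributed, that is, $w \sim F_{p,\,n-p}$. Let us test $H_o$ at the significance level $\alpha = 0.05$. Since the critical value as obtained from an F-table is $F_{p,\,n-p,\,\alpha} = F_{3,\,2,\,0.05} = 19.2$ and $2.35 < 19.2$, we do not reject $H_o$.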
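The computation in Example 6.3.1 can be reproduced directly from the sample, which avoids forming the cofactor matrix by hand. The following Python sketch is an illustration under stated assumptions: the helper `solve` is ours, and the five sample vectors are entered as read from the example, with the fifth taken to be $(2, -1, 0)'$, the value consistent with the stated sample average $\frac15(1, 2, 4)'$ and the stated matrix $S$.

```python
from fractions import Fraction as Fr

def solve(A, b):
    """Solve A x = b by Gauss-Jordan elimination with exact fractions."""
    n = len(A)
    aug = [[Fr(v) for v in row] + [Fr(bi)] for row, bi in zip(A, b)]
    for col in range(n):
        pivot = next(r for r in range(col, n) if aug[r][col] != 0)
        aug[col], aug[pivot] = aug[pivot], aug[col]
        pivot_value = aug[col][col]
        aug[col] = [v / pivot_value for v in aug[col]]
        for r in range(n):
            if r != col:
                factor = aug[r][col]
                aug[r] = [vr - factor * vc for vr, vc in zip(aug[r], aug[col])]
    return [row[-1] for row in aug]

sample = [(1, 1, 1), (1, 0, -1), (-1, 1, 2), (-2, 1, 2), (2, -1, 0)]
mu0 = (1, 1, 1)
n, p = len(sample), len(mu0)

xbar = [Fr(sum(x[i] for x in sample), n) for i in range(p)]
# Sample sum of products matrix S = sum_j (X_j - Xbar)(X_j - Xbar)'.
S = [[sum((x[i] - xbar[i]) * (x[k] - xbar[k]) for x in sample) for k in range(p)]
     for i in range(p)]
d = [xbar[i] - mu0[i] for i in range(p)]
z = solve(S, d)                                  # z = S^{-1} (Xbar - mu_o)
u = n * sum(di * zi for di, zi in zip(d, z))     # u = n (Xbar - mu_o)' S^{-1} (Xbar - mu_o)
F_obs = Fr(n - p, p) * u                         # compared with F_{p, n-p, alpha}
print(float(u), float(F_obs))
```

Solving the linear system $Sz = \bar X - \mu_o$ instead of inverting $S$ gives the same quadratic form and scales better when $p$ is larger.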


Note 6.3.1. If $S$ is replaced by $\frac{1}{n-1}S$, an unbiased estimator for $\Sigma$, then the test statistic $\frac{1}{n-1}\,n(\bar X - \mu_o)'\big[\frac{1}{n-1}S\big]^{-1}(\bar X - \mu_o) = \frac{T_n^2}{n-1}$ where $T_n^2$ denotes Hotelling's $T^2$ statistic, which for $p = 1$ corresponds to the square of a Student-$t$ statistic having $n - 1$ degrees of freedom.

Since $u$ as defined in Theorem 6.3.1 is distributed as a type-2 beta random variable with the parameters $(\frac p2, \frac{n-p}{2})$, we have the following results: $\frac1u$ is type-2 beta distributed with the parameters $(\frac{n-p}{2}, \frac p2)$, $\frac{u}{1+u}$ is type-1 beta distributed with the parameters $(\frac p2, \frac{n-p}{2})$, and $\frac{1}{1+u}$ is type-1 beta distributed with the parameters $(\frac{n-p}{2}, \frac p2)$, $n$ being the sample size.
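The quantities related to $u$ in this note are simple transformations of one another, which the short Python sketch below makes explicit. The function name and dictionary keys are hypothetical; the input $u = 60/17 \approx 3.53$ is the observed value implied by Example 6.3.1, where $\frac{n-p}{p}u = 2.35$ with $n = 5$ and $p = 3$.

```python
from fractions import Fraction as Fr

def related_statistics(u, n, p):
    """Transformations of u = n (Xbar - mu_o)' S^{-1} (Xbar - mu_o)."""
    return {
        "hotelling_T2": (n - 1) * u,          # statistic obtained with S/(n-1) in place of S
        "F": Fr(n - p, p) * u,                # F with p and n - p degrees of freedom
        "inverse": 1 / u,                     # type-2 beta with parameters swapped
        "type1_beta": u / (1 + u),            # type-1 beta (p/2, (n-p)/2)
        "type1_beta_complement": 1 / (1 + u)  # type-1 beta ((n-p)/2, p/2)
    }

stats = related_statistics(Fr(60, 17), n=5, p=3)
print({name: float(value) for name, value in stats.items()})
```

All five quantities lead to the same decision, since each is a monotone function of $u$.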


6.3.2. Paired values or linear functions when $\Sigma$ is unknown


Let $Y_1,\ldots,Y_k$ be $p \times 1$ vectors having their own distributions which are unknown. However, suppose that it is known that a certain linear function $X = a_1Y_1 + \cdots + a_kY_k$ has a $p$-variate real Gaussian $N_p(\mu, \Sigma)$ distribution with $\Sigma > O$. We would like to test hypotheses of the type $E[X] = a_1\mu_{(1)}^{(o)} + \cdots + a_k\mu_{(k)}^{(o)}$ where the $\mu_{(j)}^{(o)}$'s, $j = 1,\ldots,k$, are specified. Since we do not know the distributions of $Y_1,\ldots,Y_k$, let us convert the iid variables on $Y_j$, $j = 1,\ldots,k$, to iid variables on $X_j$, say $X_1,\ldots,X_n$, $X_j \sim N_p(\mu, \Sigma)$, $\Sigma > O$, where $\Sigma$ is unknown. First, the observations on $Y_1,\ldots,Y_k$ are transformed into observations on the $X_j$'s. The problem then involves a single normal population whose covariance matrix is unknown. An example of this type is $Y_1$ representing a $p \times 1$ vector before a certain process, such as administering a drug to a patient; in this instance, $Y_1$ could consist of measurements on $p$ characteristics observed in a patient. Observations on $Y_2$ will then be the measurements on the same $p$ characteristics after the process, such as after administering the drug to the patient. Then $Y_{2q} - Y_{1q} = X_q$ will represent the variable corresponding to the difference in the measurements on the $q$-th characteristic. Let the hypothesis be $H_o : \mu = \mu_o$ (given), $\Sigma$ being unknown. Note that once the observations on $X_j$ are taken, then the individual $\mu_{(j)}$'s are irrelevant as they no longer are of any use. Once the $X_j$'s are determined, one can compute the sum of products matrix $S$ in $X_j$. In this case, the test statistic is $u = n(\bar X - \mu_o)'S^{-1}(\bar X - \mu_o)$, which is distributed as a type-2 beta with parameters $(\frac p2, \frac{n-p}{2})$. Then, $u \sim \frac{p}{n-p}F$ where $F$ is an F random variable having $p$ and $n - p$ degrees of freedom, that is, an $F_{p,\,n-p}$ random variable, $n$ being the sample size. Thus, the test criterion is applied as follows: Determine the observed value of $u$ and the corresponding observed value of $F_{p,\,n-p}$, that is, $\frac{n-p}{p}u$, and then

$$\text{reject } H_o \text{ if } \frac{n-p}{p}\,u \ge F_{p,\,n-p,\,\alpha},\ \text{ with } Pr\{F_{p,\,n-p} \ge F_{p,\,n-p,\,\alpha}\} = \alpha. \tag{6.3.7}$$


Example 6.3.2. Five motivated individuals were randomly selected and subjected to an exercise regimen for a month. The exercise program promoters claim that the subjects can expect a weight loss of 5 kg as well as a 2-in. reduction in lower stomach girth by the end of the month period. Let $Y_1$ and $Y_2$ denote the two component vectors representing weight and girth before starting the routine and at the end of the exercise program, respectively. The following are the observations on the five individuals:


Qu) (85, 85) (80, 70) (75,73) (70, 71) (70, 68) 
b72 = | (40,41) |^ | (40,45) |^ 106,309 |^ [08,39 ^ | 05,32) | 


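Obviously, $Y_1$ and $Y_2$ are dependent variables having a joint distribution. We will assume that the difference $X = Y_1 - Y_2$ has a real Gaussian distribution, that is, $X \sim N_2(\mu, \Sigma)$, $\Sigma > O$. Under this assumption, the observations on $X$ are

$$X_1 = \begin{bmatrix}0\\-1\end{bmatrix},\ X_2 = \begin{bmatrix}10\\-5\end{bmatrix},\ X_3 = \begin{bmatrix}2\\0\end{bmatrix},\ X_4 = \begin{bmatrix}-1\\0\end{bmatrix},\ X_5 = \begin{bmatrix}2\\1\end{bmatrix}.$$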


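Let $\mathbf{X} = [X_1, X_2,\ldots,X_5]$ and $\mathbf{X} - \bar{\mathbf{X}} = [X_1 - \bar X,\ldots,X_5 - \bar X]$, both being $2 \times 5$ matrices. The observed sample average $\bar X$, the claim of the exercise routine promoters $\mu = \mu_o$ as well as other relevant quantities are as follows:

$$\bar X = \frac15(X_1 + \cdots + X_5) = \frac15\begin{bmatrix}13\\-5\end{bmatrix},\quad \mu_o = \begin{bmatrix}5\\2\end{bmatrix},$$

$$X_1 - \bar X = \begin{bmatrix}0\\-1\end{bmatrix} - \frac15\begin{bmatrix}13\\-5\end{bmatrix} = \frac15\begin{bmatrix}-13\\0\end{bmatrix},\ \ldots,$$

$$\mathbf{X} - \bar{\mathbf{X}} = \frac15\begin{bmatrix}-13&37&-3&-18&-3\\0&-20&5&5&10\end{bmatrix};$$

$$S = (\mathbf{X} - \bar{\mathbf{X}})(\mathbf{X} - \bar{\mathbf{X}})' = \frac{1}{5^2}\begin{bmatrix}1880&-875\\-875&550\end{bmatrix} = \frac{1}{5^2}A \Rightarrow S^{-1} = 25A^{-1};$$

$$|A| = 1880(550) - (875)^2 = 268375;\quad \mathrm{Cof}(A) = \begin{bmatrix}550&875\\875&1880\end{bmatrix};\quad A^{-1} = \frac{\mathrm{Cof}(A)}{268375};$$

$$\bar X - \mu_o = \frac15\begin{bmatrix}13\\-5\end{bmatrix} - \begin{bmatrix}5\\2\end{bmatrix} = -\frac15\begin{bmatrix}12\\15\end{bmatrix}.$$

The test statistic being $w = \frac{n-p}{p}\,n(\bar X - \mu_o)'S^{-1}(\bar X - \mu_o)$, its observed value is

$$w = \frac{(5-2)}{2}\,5\,\frac{1}{5^2}\,\frac{5^2}{268375}\,[12\ \ 15]\begin{bmatrix}550&875\\875&1880\end{bmatrix}\begin{bmatrix}12\\15\end{bmatrix}$$
$$\phantom{w} = \frac32\,5\,\frac{1}{268375}\,[(12)^2(550) + (15)^2(1880) + 2(12)(15)(875)] = 22.84.$$

Letting the significance level of the test be $\alpha = 0.05$, the critical value is $F_{p,\,n-p,\,\alpha} = F_{2,\,3,\,0.05} = 9.55$. Since $22.84 > 9.55$, $H_o$ is thus rejected.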
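The paired-difference test of Example 6.3.2 can be sketched in a few lines of Python. This is an illustration under stated assumptions: the difference vectors $X_j = Y_1 - Y_2$ are entered as the values consistent with the deviation matrix and the sample average $\frac15(13, -5)'$ given in the example, and the variable names are ours.

```python
from fractions import Fraction as Fr

# Differences (weight, girth) before minus after, one pair per individual.
diffs = [(0, -1), (10, -5), (2, 0), (-1, 0), (2, 1)]
mu0 = (Fr(5), Fr(2))                 # promoters' claim: 5 kg and 2 in.
n, p = len(diffs), 2

xbar = [Fr(sum(x[i] for x in diffs), n) for i in range(p)]
S = [[sum((x[i] - xbar[i]) * (x[k] - xbar[k]) for x in diffs) for k in range(p)]
     for i in range(p)]

# 2 x 2 inverse: transposed cofactor matrix divided by the determinant.
det = S[0][0] * S[1][1] - S[0][1] * S[1][0]
S_inv = [[S[1][1] / det, -S[0][1] / det], [-S[1][0] / det, S[0][0] / det]]

d = [xbar[i] - mu0[i] for i in range(p)]
u = n * sum(d[i] * S_inv[i][k] * d[k] for i in range(p) for k in range(p))
F_obs = Fr(n - p, p) * u
critical = 9.55                      # F_{2,3,0.05} as quoted from an F-table
print(float(F_obs), float(F_obs) > critical)
```

The critical value is hard-coded from the table because the standard library has no F quantile function.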


6.3.3. Independent Gaussian populations 


Consider $k$ independent $p$-variate real Gaussian populations whose individual distribution is $N_p(\mu_{(j)}, \Sigma_j)$, $\Sigma_j > O$, $j = 1,\ldots,k$. Given simple random samples of sizes $n_1,\ldots,n_k$ from these $k$ populations, we may wish to test a hypothesis on a given linear function of the mean values, that is, $H_o : a_1\mu_{(1)} + \cdots + a_k\mu_{(k)} = \mu_o$ where $a_1,\ldots,a_k$ are known constants and $\mu_o$ is a given quantity under the null hypothesis. We have already discussed this problem for the case of known covariance matrices. When the $\Sigma_j$'s are all unknown or some of them are known while others are not, the MLE's of the unknown covariance matrices turn out to be the respective sample sums of products matrices divided by the corresponding sample sizes. This will result in a linear function of independent Wishart matrices whose distribution proves challenging to determine, even for the null case.


Special case of two independent Gaussian populations 


Consider the special case of two independent real Gaussian populations having identical covariance matrices, that is, let the populations be $Y_{1q} \sim N_p(\mu_{(1)}, \Sigma)$, $\Sigma > O$, the $Y_{1q}$'s, $q = 1,\ldots,n_1$, being iid, and $Y_{2q} \sim N_p(\mu_{(2)}, \Sigma)$, $\Sigma > O$, the $Y_{2q}$'s, $q = 1,\ldots,n_2$, being iid. Let the sample $p \times n_1$ and $p \times n_2$ matrices be denoted as $\mathbf{Y}_1 = (Y_{11},\ldots,Y_{1n_1})$ and $\mathbf{Y}_2 = (Y_{21},\ldots,Y_{2n_2})$ and let the sample averages be $\bar Y_j = \frac{1}{n_j}(Y_{j1} + \cdots + Y_{jn_j})$, $j = 1, 2$. Let $\bar{\mathbf{Y}}_j = (\bar Y_j,\ldots,\bar Y_j)$, a $p \times n_j$ matrix whose columns are equal to $\bar Y_j$, $j = 1, 2$, and let

$$S_j = (\mathbf{Y}_j - \bar{\mathbf{Y}}_j)(\mathbf{Y}_j - \bar{\mathbf{Y}}_j)',\quad j = 1, 2,$$

be the corresponding sample sum of products matrices. Then, $S_1$ and $S_2$ are independently distributed as Wishart matrices having $n_1 - 1$ and $n_2 - 1$ degrees of freedom, respectively. As the sum of two independent $p \times p$ real or complex matrices having matrix-variate gamma distributions with the same scale parameter matrix is again gamma distributed with the shape parameters summed up and the same scale parameter matrix, we observe that since the two populations are independently distributed, $S_1 + S_2 \equiv S$ has a Wishart distribution having $n_1 + n_2 - 2$ degrees of freedom. We now consider a hypothesis of the type $\mu_{(1)} = \mu_{(2)}$. In order to do away with the unknown common mean value, we may consider the real $p$-vector $U = \bar Y_1 - \bar Y_2$, so that $E(U) = O$ and $\mathrm{Cov}(U) = \frac{1}{n_1}\Sigma + \frac{1}{n_2}\Sigma = \big(\frac{1}{n_1} + \frac{1}{n_2}\big)\Sigma = \frac{n_1+n_2}{n_1n_2}\Sigma$. The MLE of this pooled covariance matrix is $\frac{1}{n_1+n_2}S$ where $S$ is Wishart distributed with $n_1 + n_2 - 2$ degrees of freedom. Then, following through the steps included in Sect. 6.3.1 with the parameter $m$ now being $n_1 + n_2 - 2$, the power of $S$ will become $\frac{(n_1+n_2-2)+1}{2} - \frac{p+1}{2}$ when integrating out $S$. Letting the null hypothesis be $H_o : \mu_{(1)} - \mu_{(2)} = \delta$ (specified), such as $\delta = 0$, the function resulting from integrating out $S$ is


—d(ni4-n3—1) 
Е +n) ag P (ni n] z Quem 
nin» nin» 


(6.3.8) 
where $c$ is the normalizing constant, so that $w = \frac{n_1+n_2}{n_1n_2}(\bar Y_1 - \bar Y_2 - \delta)'S^{-1}(\bar Y_1 - \bar Y_2 - \delta)$ is distributed as a type-2 beta with the parameters $(\frac p2, \frac{n_1+n_2-1-p}{2})$. Writing $w = \frac{p}{n_1+n_2-1-p}F_{p,\,n_1+n_2-1-p}$, this $F$ is seen to be an F statistic having $p$ and $n_1 + n_2 - 1 - p$ degrees of freedom. We will state these results as theorems.
Theorem 6.3.2. Let the $p \times p$ real positive definite matrices $X_1$ and $X_2$ be independently distributed as real matrix-variate gamma random variables with densities

$$f_j(X_j) = \frac{|B|^{\alpha_j}}{\Gamma_p(\alpha_j)}\,|X_j|^{\alpha_j - \frac{p+1}{2}}\,e^{-\mathrm{tr}(BX_j)},\quad B > O,\ X_j > O,\ \Re(\alpha_j) > \frac{p-1}{2}, \tag{6.3.9}$$

$j = 1, 2$, and zero elsewhere. Then, as can be seen from (5.2.6), the Laplace transform associated with $X_j$ or that of $f_j$, denoted as $L_{X_j}({}_*T)$, is

$$L_{X_j}({}_*T) = |I + B^{-1}\,{}_*T|^{-\alpha_j},\quad I + B^{-1}\,{}_*T > O,\ j = 1, 2. \tag{i}$$

Accordingly, $U_1 = X_1 + X_2$ has a real matrix-variate gamma density with the parameters $(\alpha_1 + \alpha_2, B)$ whose associated Laplace transform is

$$L_{U_1}({}_*T) = |I + B^{-1}\,{}_*T|^{-(\alpha_1+\alpha_2)}, \tag{ii}$$

and $U_2 = a_1X_1 + a_2X_2$ has the Laplace transform

$$L_{U_2}({}_*T) = |I + a_1B^{-1}\,{}_*T|^{-\alpha_1}\,|I + a_2B^{-1}\,{}_*T|^{-\alpha_2}, \tag{iii}$$

whenever $I + a_jB^{-1}\,{}_*T > O$, $j = 1, 2$, where $a_1$ and $a_2$ are real scalar constants.


It follows from (ii) that $X_1 + X_2$ is also real matrix-variate gamma distributed. When $a_1 \ne a_2$, it is very difficult to invert (iii) in order to obtain the corresponding density. This can be achieved by expanding one of the determinants in (iii) in terms of zonal polynomials, say the second one, after having first taken $|I + a_1B^{-1}\,{}_*T|^{-(\alpha_1+\alpha_2)}$ out as a factor in this instance.


Theorem 6.3.3. Let $Y_j \sim N_p(\mu_{(j)}, \Sigma)$, $\Sigma > O$, $j = 1, 2$, be independent $p$-variate real Gaussian distributions sharing the same covariance matrix. Given a simple random sample of size $n_1$ from $Y_1$ and a simple random sample of size $n_2$ from $Y_2$, let the sample averages be denoted by $\bar Y_1$ and $\bar Y_2$ and the sample sums of products matrices, by $S_1$ and $S_2$, respectively. Consider the hypothesis $H_o : \mu_{(1)} - \mu_{(2)} = \delta$ (given). Letting $S = S_1 + S_2$ and

$$w = \frac{n_1+n_2}{n_1n_2}\,(\bar Y_1 - \bar Y_2 - \delta)'S^{-1}(\bar Y_1 - \bar Y_2 - \delta),\quad w \sim \frac{p}{n_1+n_2-1-p}\,F_{p,\,n_1+n_2-1-p} \tag{iv}$$

where $F_{p,\,n_1+n_2-1-p}$ denotes an F distribution with $p$ and $n_1 + n_2 - 1 - p$ degrees of freedom, or equivalently, $w$ is distributed as a type-2 beta variable with the parameters $(\frac p2, \frac{n_1+n_2-1-p}{2})$. We reject the null hypothesis $H_o$ if $\frac{n_1+n_2-1-p}{p}\,w \ge F_{p,\,n_1+n_2-1-p,\,\alpha}$ with

$$Pr\{F_{p,\,n_1+n_2-1-p} \ge F_{p,\,n_1+n_2-1-p,\,\alpha}\} = \alpha \tag{vi}$$

at a given significance level $\alpha$.


Theorem 6.3.4. Let $w$ be as defined in Theorem 6.3.3. Then $w_1 = \frac1w$ is a real scalar type-2 beta variable with the parameters $(\frac{n_1+n_2-1-p}{2}, \frac p2)$; $w_2 = \frac{w}{1+w}$ is a real scalar type-1 beta variable with the parameters $(\frac p2, \frac{n_1+n_2-1-p}{2})$; $w_3 = \frac{1}{1+w}$ is a real scalar type-1 beta variable with the parameters $(\frac{n_1+n_2-1-p}{2}, \frac p2)$.


Those last results follow from the connections between real scalar type-1 and type-2 beta random variables. Results parallel to those appearing in (i) to (vi) and stated in Theorems 6.3.1–6.3.4 can similarly be obtained for the complex case.


Example 6.3.3. Consider two independent populations whose respective distributions are $N_2(\mu_{(1)}, \Sigma_1)$ and $N_2(\mu_{(2)}, \Sigma_2)$, $\Sigma_j > O$, $j = 1, 2$, and samples of sizes $n_1 = 4$ and $n_2 = 5$ from these two populations, respectively. Let the population covariance matrices be identical with $\Sigma_1 = \Sigma_2 = \Sigma$, the common covariance matrix being unknown, and let the observed sample vectors from the first population, $X_j \sim N_2(\mu_{(1)}, \Sigma)$, be


n-e- E] e-e- 


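Denoting the sample mean from the first population by $\bar X$ and the sample sum of products matrix by $S_1$, we have

$$\bar X = \frac14\sum_{j=1}^4 X_j\ \text{ and }\ S_1 = (\mathbf{X} - \bar{\mathbf{X}})(\mathbf{X} - \bar{\mathbf{X}})',\quad \mathbf{X} = [X_1, X_2, X_3, X_4],\ \bar{\mathbf{X}} = [\bar X,\ldots,\bar X],$$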


the observations on these quantities being the following: 
si [en] 
БЕЕН Е 
1 


2а о A =A a> =A Mim. est Os 
x-X-i[5 uM PEE XX – X) 


Let the sample vectors from the second population, denoted as $Y_1,\ldots,Y_5$, be


n-p eE ei] 


Then, the sample average and the deviation vectors are the following: 


ЖЕШКЕ 


i 2 ETO 
5 j| 571-315]. 
_ if 9 a ke HO б E 
ананан! 
- If -5 0 005 ү Y) 
ЗЕН Z. F. =7 3 1 сл а. 
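$$S_1 = \frac{1}{16}\begin{bmatrix}64&32\\32&80\end{bmatrix},\quad S_2 = \frac{1}{25}\begin{bmatrix}50&0\\0&80\end{bmatrix};\quad S = S_1 + S_2 = \begin{bmatrix}6.00&2.00\\2.00&8.20\end{bmatrix} \Rightarrow S^{-1} = \frac{\mathrm{Cof}(S)}{|S|};$$

$$|S| = 45.20;\quad S^{-1} = \frac{1}{45.2}\begin{bmatrix}8.20&-2.00\\-2.00&6.00\end{bmatrix}.$$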


Letting the null hypothesis be 


1 
Ho : Ma) — но) = ê = E З 
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а oe 1/0 1 [5 1 2.0 
ЕТЕК 


Thus, the test statistic is $w \sim F_{p,\,n_1+n_2-1-p}$ where


"e (unm 1-р) (тт) е $ еле» р 5) 


p nin» 
_ @+5—1—2)@-+5) 1 82 —20][—2.0 
ni 2 (4)(5) 45.2 к Е | ЕЧ 


x 0.91. 


Let us test $H_o$ at the 5% significance level. Since the required critical value is $F_{p,\,n_1+n_2-1-p,\,\alpha} = F_{2,\,6,\,0.05} = 5.14$ and $0.91 < 5.14$, the null hypothesis is not rejected.


6.3.4. Testing $\mu_1 = \cdots = \mu_p$ when $\Sigma$ is unknown in the real Gaussian case


Let the $p \times 1$ vector $X_j \sim N_p(\mu, \Sigma)$, $\Sigma > O$, $j = 1,\ldots,n$, the $X_j$'s being independently distributed. Let the $p \times 1$ vector of unities be denoted by $J$ or $J' = (1,\ldots,1)$, and let $A$ be a vector that is orthogonal to $J$ so that $A'J = 0$. For example, we can take


1 1 
=f | —1 
А= : when p is even, A = | or А = : when р is odd. 
—2 -1 
—1 | 0 


If the last component of $A$ is zero, we are then ignoring the last component of $X_j$. Let $y_j = A'X_j$, $j = 1,\ldots,n$, and the $y_j$'s be independently distributed. Then $y_j \sim N_1(A'\mu, A'\Sigma A)$, $A'\Sigma A > 0$, is a univariate normal variable with mean value $A'\mu$ and variance $A'\Sigma A$. Consider the $p \times n$ sample matrix comprising the $X_j$'s, that is, $\mathbf{X} = (X_1,\ldots,X_n)$. Let the sample average of the $X_j$'s be $\bar X = \frac1n(X_1 + \cdots + X_n)$ and $\bar{\mathbf{X}} = (\bar X,\ldots,\bar X)$. Then, the sample sum of products matrix is $S = (\mathbf{X} - \bar{\mathbf{X}})(\mathbf{X} - \bar{\mathbf{X}})'$. Consider the $1 \times n$ vector $Y = (y_1,\ldots,y_n) = (A'X_1,\ldots,A'X_n) = A'\mathbf{X}$, $\bar y = \frac1n(y_1 + \cdots + y_n) = A'\bar X$, $\sum_{j=1}^n (y_j - \bar y)^2 = A'(\mathbf{X} - \bar{\mathbf{X}})(\mathbf{X} - \bar{\mathbf{X}})'A = A'SA$. Let the null hypothesis be $H_o : \mu_1 = \cdots = \mu_p = \nu$, where $\nu$ is unknown, $\mu' = (\mu_1,\ldots,\mu_p)$. Thus, $H_o$ is $A'\mu = \nu A'J = 0$. The joint density of $y_1,\ldots,y_n$, denoted by $L$, is then
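$$L = \prod_{j=1}^n \frac{e^{-\frac{1}{2A'\Sigma A}(y_j - A'\mu)^2}}{(2\pi)^{\frac12}[A'\Sigma A]^{\frac12}} = \frac{e^{-\frac{1}{2A'\Sigma A}\sum_{j=1}^n (y_j - A'\mu)^2}}{(2\pi)^{\frac n2}[A'\Sigma A]^{\frac n2}} = \frac{e^{-\frac{1}{2A'\Sigma A}\{A'SA + nA'(\bar X - \mu)(\bar X - \mu)'A\}}}{(2\pi)^{\frac n2}[A'\Sigma A]^{\frac n2}} \tag{i}$$

where

$$\sum_{j=1}^n (y_j - A'\mu)^2 = \sum_{j=1}^n (y_j - \bar y + \bar y - A'\mu)^2 = \sum_{j=1}^n (y_j - \bar y)^2 + n(\bar y - A'\mu)^2$$
$$\phantom{\sum_{j=1}^n (y_j - A'\mu)^2} = \sum_{j=1}^n A'(X_j - \bar X)(X_j - \bar X)'A + nA'(\bar X - \mu)(\bar X - \mu)'A$$
$$\phantom{\sum_{j=1}^n (y_j - A'\mu)^2} = A'SA + nA'(\bar X - \mu)(\bar X - \mu)'A.$$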


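Let us determine the MLE's of $\mu$ and $\Sigma$. We have

$$\ln L = -\frac n2\ln(2\pi) - \frac n2\ln A'\Sigma A - \frac{1}{2A'\Sigma A}\,[A'SA + n(A'(\bar X - \mu)(\bar X - \mu)'A)].$$

On differentiating $\ln L$ with respect to $\mu_1$ and equating the result to zero, we have

$$\frac{\partial}{\partial\mu_1}\ln L = 0 \Rightarrow nA'\Big[\frac{\partial}{\partial\mu_1}\{\bar X\bar X' - \bar X\mu' - \mu\bar X' + \mu\mu'\}\Big]A = O$$
$$\Rightarrow nA'\Big[-\bar X[1, 0,\ldots,0] - \begin{bmatrix}1\\0\\\vdots\\0\end{bmatrix}\bar X' + \begin{bmatrix}1\\0\\\vdots\\0\end{bmatrix}\mu' + \mu[1, 0,\ldots,0]\Big]A = O$$
$$\Rightarrow -2a_1A'(\bar X - \mu) = 0 \Rightarrow \hat\mu = \bar X \tag{ii}$$

since the equation holds for each $\mu_j$, $j = 1,\ldots,p$, and $A' = (a_1,\ldots,a_p)$, $a_j \ne 0$, $j = 1,\ldots,p$, $A$ being fixed. As well, $(\bar X - \mu)(\bar X - \mu)' = \bar X\bar X' - \bar X\mu' - \mu\bar X' + \mu\mu'$. Now, consider differentiating $\ln L$ with respect to an element of $\Sigma$, say, $\sigma_{11}$, at $\hat\mu = \bar X$. Since $\frac{\partial}{\partial\sigma_{11}}(A'\Sigma A) = a_1^2$,

$$\frac{\partial}{\partial\sigma_{11}}\ln L = 0 \Rightarrow -\frac n2\,\frac{a_1^2}{A'\Sigma A} + \frac{A'SA}{2(A'\Sigma A)^2}\,a_1^2 = 0 \Rightarrow A'\hat\Sigma A = \frac1n A'SA$$

for each element in $\Sigma$ and hence $\hat\Sigma = \frac1n S$. Thus,

$$\max_{\Omega} L = \frac{e^{-\frac n2}\,n^{\frac n2}}{(2\pi)^{\frac n2}(A'SA)^{\frac n2}}. \tag{iii}$$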
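Under $H_o$, $A'\mu = 0$ and consequently the maximum under $H_o$ is the following:

$$\max_{H_o} L = \frac{e^{-\frac n2}\,n^{\frac n2}}{(2\pi)^{\frac n2}[A'(S + n\bar X\bar X')A]^{\frac n2}}. \tag{iv}$$

Accordingly, the $\lambda$-criterion is

$$\lambda = \frac{(A'SA)^{\frac n2}}{[A'(S + n\bar X\bar X')A]^{\frac n2}} = \frac{1}{\Big[1 + n\frac{A'\bar X\bar X'A}{A'SA}\Big]^{\frac n2}}. \tag{6.3.10}$$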

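We would reject $H_o$ for small values of $\lambda$ or for large values of $u \equiv nA'\bar X\bar X'A/A'SA$ where $\bar X$ and $S$ are independently distributed. Observing that $S \sim W_p(n-1, \Sigma)$, $\Sigma > O$, and $\bar X \sim N_p(\mu, \frac1n\Sigma)$, $\Sigma > O$, we have, under $H_o$,

$$\frac{n}{A'\Sigma A}\,A'\bar X\bar X'A \sim \chi^2_1\ \text{ and }\ \frac{A'SA}{A'\Sigma A} \sim \chi^2_{n-1}.$$

Hence, $(n-1)u$ is an F statistic with 1 and $n - 1$ degrees of freedom, and the null hypothesis is to be rejected whenever

$$v \equiv n(n-1)\,\frac{A'\bar X\bar X'A}{A'SA} \ge F_{1,\,n-1,\,\alpha},\ \text{ with } Pr\{F_{1,\,n-1} \ge F_{1,\,n-1,\,\alpha}\} = \alpha. \tag{6.3.11}$$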
Example 6.3.4. Consider a real bivariate Gaussian $N_2(\mu, \Sigma)$ population where $\Sigma > O$ is unknown. We would like to test the hypothesis $H_o : \mu_1 = \mu_2$, $\mu' = (\mu_1, \mu_2)$, so that $\mu_1 - \mu_2 = 0$ under this null hypothesis. Let the sample be $X_1, X_2, X_3, X_4$, as specified in Example 6.3.3. Let $A' = (1, -1)$ so that $A'\mu = 0$ under $H_o$. With the same observation vectors as those comprising the first sample in Example 6.3.3, $A'X_1 = (1)$, $A'X_2 = (-2)$, $A'X_3 = (-1)$, $A'X_4 = (0)$. Letting $y_j = A'X_j$, the observations on $y_j$ are $(1, -2, -1, 0)$ or $A'\mathbf{X} = A'[X_1, X_2, X_3, X_4] = [1, -2, -1, 0]$. The sample sum of products matrix as evaluated in the first part of Example 6.3.3 is

$$S_1 = \frac{1}{16}\begin{bmatrix}64&32\\32&80\end{bmatrix} \Rightarrow A'S_1A = \frac{1}{16}\,[1\ \ {-1}]\begin{bmatrix}64&32\\32&80\end{bmatrix}\begin{bmatrix}1\\-1\end{bmatrix} = \frac{80}{16}.$$

Our test statistic is

$$v = n(n-1)\,\frac{A'\bar X\bar X'A}{A'S_1A} \sim F_{1,\,n-1},\quad n = 4.$$

Let the significance level be $\alpha = 0.05$. The observed values of $A'\bar X\bar X'A$, $A'S_1A$, $v$, and the tabulated critical value of $F_{1,\,n-1,\,\alpha}$ are the following:

$$A'\bar X\bar X'A = \frac14;\quad A'S_1A = \frac{80}{16};\quad v = 4(3)\Big(\frac{1}{5 \times 4}\Big) = 0.6;\quad F_{1,\,n-1,\,\alpha} = F_{1,\,3,\,0.05} = 10.13.$$

As $0.6 < 10.13$, $H_o$ is not rejected.
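Because the statistic in (6.3.11) depends on the data only through the scalars $y_j = A'X_j$, it can be computed from the four values $(1, -2, -1, 0)$ given in Example 6.3.4. The Python sketch below does this with exact fractions; the variable names are ours. It also illustrates that $v$ is the square of the usual one-sample Student-$t$ statistic computed on the $y_j$'s.

```python
from fractions import Fraction as Fr

y = [1, -2, -1, 0]                       # y_j = A'X_j with A' = (1, -1)
n = len(y)

ybar = Fr(sum(y), n)                     # equals A' Xbar
ss = sum((yj - ybar) ** 2 for yj in y)   # equals A' S_1 A
v = n * (n - 1) * ybar ** 2 / ss         # statistic of (6.3.11)

# Square of the one-sample t statistic: t = ybar / sqrt(s^2 / n), s^2 = ss / (n - 1).
t_squared = ybar ** 2 / (ss / (n - 1) / n)
critical = 10.13                         # F_{1,3,0.05} as quoted from an F-table
print(float(v), float(t_squared), float(v) >= critical)
```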
6.3.5. Likelihood ratio test for testing $H_o : \mu_1 = \cdots = \mu_p$ when $\Sigma$ is unknown


In the entire parameter space $\Omega$ of a $N_p(\mu, \Sigma)$ population, $\mu$ is estimated by the sample average $\bar X$ and, as previously determined, the maximum of the likelihood function is

$$\max_{\Omega} L = \frac{e^{-\frac{np}{2}}\,n^{\frac{np}{2}}}{(2\pi)^{\frac{np}{2}}\,|S|^{\frac n2}} \tag{i}$$
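where $S$ is the sample sum of products matrix and $n$ is the sample size. Under the hypothesis $H_o : \mu_1 = \cdots = \mu_p = \nu$, where $\nu$ is unknown, this $\nu$ is estimated by $\hat\nu = \frac{1}{np}\sum_{i,j} x_{ij} = \frac1p J'\bar X$, $J' = (1,\ldots,1)$, the $p \times 1$ sample vectors $X_j' = (x_{1j},\ldots,x_{pj})$, $j = 1,\ldots,n$, being independently distributed. Thus, under the null hypothesis $H_o$, the population covariance matrix is estimated by $\frac1n(S + n(\bar X - \hat\mu)(\bar X - \hat\mu)')$, and, proceeding as was done to obtain Eq. (6.3.3), the $\lambda$-criterion reduces to

$$\lambda = \frac{|S|^{\frac n2}}{|S + n(\bar X - \hat\mu)(\bar X - \hat\mu)'|^{\frac n2}} \tag{6.3.12}$$
$$\phantom{\lambda} = \frac{1}{(1 + u)^{\frac n2}},\quad u = n(\bar X - \hat\mu)'S^{-1}(\bar X - \hat\mu). \tag{6.3.13}$$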


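Given the structure of $u$ in (6.3.13), we can take the Gaussian population covariance matrix $\Sigma$ to be the identity matrix $I$, as was explained in Sect. 6.3.1. Observe that

$$(\bar X - \hat\mu)' = \Big(\bar X - \frac1p JJ'\bar X\Big)' = \bar X'\Big[I - \frac1p JJ'\Big] \tag{ii}$$

where $I - \frac1p JJ'$ is idempotent of rank $p - 1$; hence there exists an orthonormal matrix $P$, $PP' = I$, $P'P = I$, such that

$$I - \frac1p JJ' = P\begin{bmatrix}I_{p-1}&O\\O'&0\end{bmatrix}P' \Rightarrow$$

$$\sqrt n(\bar X - \hat\mu)'P = \sqrt n\,\bar X'\Big(I - \frac1p JJ'\Big)P = \sqrt n\,\bar X'P\begin{bmatrix}I_{p-1}&O\\O'&0\end{bmatrix} = [V_1, 0],\quad V = \sqrt n\,\bar X'P,$$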
where $V_1$ is the subvector of the first $p - 1$ components of $V$. Then the quadratic form $u$, which is our test statistic, reduces to the following:

$$u = n(\bar X - \hat\mu)'S^{-1}(\bar X - \hat\mu) = [V_1, 0]\,S^{-1}\begin{bmatrix}V_1'\\0\end{bmatrix} = [V_1, 0]\begin{bmatrix}S^{11}&S^{12}\\S^{21}&S^{22}\end{bmatrix}\begin{bmatrix}V_1'\\0\end{bmatrix} = V_1S^{11}V_1'.$$

We note that the test statistic $u$ has the same structure as that of $u$ in Theorem 6.3.1 with $p$ replaced by $p - 1$. Accordingly, $u = n(\bar X - \hat\mu)'S^{-1}(\bar X - \hat\mu)$ is distributed as a real scalar type-2 beta variable with the parameters $\frac{p-1}{2}$ and $\frac{n-(p-1)}{2}$, so that $\frac{n-p+1}{p-1}\,u \sim F_{p-1,\,n-p+1}$. Thus, the test criterion consists of

$$\text{rejecting } H_o \text{ if the observed value of } \frac{n-p+1}{p-1}\,u \ge F_{p-1,\,n-p+1,\,\alpha},\ \text{ with } Pr\{F_{p-1,\,n-p+1} \ge F_{p-1,\,n-p+1,\,\alpha}\} = \alpha. \tag{6.3.14}$$

Example 6.3.5. Let the population be $N_2(\mu, \Sigma)$, $\Sigma > O$, $\mu' = (\mu_1, \mu_2)$ and the null hypothesis be $H_o : \mu_1 = \mu_2 = \nu$ where $\nu$ and $\Sigma$ are unknown. The sample values, as specified in Example 6.3.3, are


«- [s EB E] si 


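The maximum likelihood estimate of $\mu$ under $H_o$ is

$$\hat\mu = \frac1p JJ'\bar X,\quad J = \begin{bmatrix}1\\1\end{bmatrix},$$

and

$$(\bar X - \hat\mu)' = \bar X'\Big[I - \frac1p JJ'\Big] = \bar X'\,\frac12\begin{bmatrix}1&-1\\-1&1\end{bmatrix} = \frac14\,[-1\ \ 1].$$

As previously calculated, the sample sum of products matrix is

$$S_1 = \frac{1}{4^2}\begin{bmatrix}64&32\\32&80\end{bmatrix} \Rightarrow S_1^{-1} = \frac{16}{4096}\begin{bmatrix}80&-32\\-32&64\end{bmatrix} = \frac{1}{256}\begin{bmatrix}80&-32\\-32&64\end{bmatrix};\quad n = 4,\ p = 2.$$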
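The test statistic $v$ and its observed value are

$$v = \frac{(n-p+1)}{(p-1)}\,n(\bar X - \hat\mu)'S_1^{-1}(\bar X - \hat\mu) \sim F_{p-1,\,n-p+1} = F_{1,\,3}$$
$$\phantom{v} = \frac{(4-2+1)}{(2-1)}\,4\,\frac{1}{4^2}\,\frac{1}{256}\,[-1\ \ 1]\begin{bmatrix}80&-32\\-32&64\end{bmatrix}\begin{bmatrix}-1\\1\end{bmatrix}$$
$$\phantom{v} = \frac{(3)(4)}{(4^2)(256)}\,[(-1)^2(80) + (1)^2(64) - 2(32)(-1)(1)]$$
$$\phantom{v} = 0.61.$$

At significance level $\alpha = 0.05$, the tabulated critical value $F_{1,\,3,\,0.05}$ is 10.13, and since the observed value 0.61 is less than 10.13, $H_o$ is not rejected.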
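The arithmetic of Example 6.3.5 can be checked from the two quantities the example provides, namely $S_1 = \frac{1}{16}\begin{bmatrix}64&32\\32&80\end{bmatrix}$ and $\bar X - \hat\mu = \frac14(-1, 1)'$. The Python sketch below is illustrative only and the variable names are ours.

```python
from fractions import Fraction as Fr

n, p = 4, 2
S1 = [[Fr(64, 16), Fr(32, 16)], [Fr(32, 16), Fr(80, 16)]]
d = [Fr(-1, 4), Fr(1, 4)]                    # Xbar - mu_hat = (I - JJ'/p) Xbar

# 2 x 2 inverse: transposed cofactor matrix divided by the determinant.
det = S1[0][0] * S1[1][1] - S1[0][1] * S1[1][0]
S1_inv = [[S1[1][1] / det, -S1[0][1] / det], [-S1[1][0] / det, S1[0][0] / det]]

u = n * sum(d[i] * S1_inv[i][k] * d[k] for i in range(p) for k in range(p))
v = Fr(n - p + 1, p - 1) * u                 # compared with F_{p-1, n-p+1, alpha}
critical = 10.13                             # F_{1,3,0.05} as quoted from an F-table
print(float(u), float(v), float(v) >= critical)
```

The exact value is $39/64 = 0.609375$, which rounds to the 0.61 reported in the example.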


6.4. Testing Hypotheses on the Population Covariance Matrix 


Let the $p \times 1$ independent vectors $X_j$, $j = 1,\ldots,n$, have a $p$-variate real nonsingular $N_p(\mu, \Sigma)$ distribution and the $p \times n$ matrix $\mathbf{X} = (X_1,\ldots,X_n)$ be the sample matrix. Denoting the sample average by $\bar X = \frac1n(X_1 + \cdots + X_n)$ and letting $\bar{\mathbf{X}} = (\bar X,\ldots,\bar X)$, each column of $\bar{\mathbf{X}}$ being equal to $\bar X$, the sample sum of products matrix is $S = (\mathbf{X} - \bar{\mathbf{X}})(\mathbf{X} - \bar{\mathbf{X}})'$. We have already established that $S$ is Wishart distributed with $m = n - 1$ degrees of freedom, that is, $S \sim W_p(m, \Sigma)$, $\Sigma > O$. Letting $S_\mu = (\mathbf{X} - M)(\mathbf{X} - M)'$ where $M = (\mu,\ldots,\mu)$, each of its columns being the $p \times 1$ vector $\mu$, $S_\mu \sim W_p(n, \Sigma)$, $\Sigma > O$, where the number of degrees of freedom is $n$ itself whereas the number of degrees of freedom associated with $S$ is $m = n - 1$. Let us consider the hypothesis $H_o : \Sigma = \Sigma_o$ where $\Sigma_o$ is a given known matrix and $\mu$ is unspecified. Then, the MLE's of $\mu$ and $\Sigma$ in the entire parameter space are $\hat\mu = \bar X$ and $\hat\Sigma = \frac1n S$, and the joint density of the sample values $X_1,\ldots,X_n$, denoted by $L$, is given by

$$L = \frac{1}{(2\pi)^{\frac{np}{2}}\,|\Sigma|^{\frac n2}}\,e^{-\frac12\mathrm{tr}(\Sigma^{-1}S) - \frac n2(\bar X - \mu)'\Sigma^{-1}(\bar X - \mu)}. \tag{6.4.1}$$


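Thus, as previously determined, the maximum of $L$ in the parameter space $\Omega = \{(\mu, \Sigma)\,|\,\Sigma > O\}$ is

$$\max_{\Omega} L = \frac{n^{\frac{np}{2}}\,e^{-\frac{np}{2}}}{(2\pi)^{\frac{np}{2}}\,|S|^{\frac n2}}, \tag{i}$$

the maximum of $L$ under the null hypothesis $H_o : \Sigma = \Sigma_o$ being given by

$$\max_{H_o} L = \frac{e^{-\frac12\mathrm{tr}(\Sigma_o^{-1}S)}}{(2\pi)^{\frac{np}{2}}\,|\Sigma_o|^{\frac n2}}. \tag{ii}$$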
Then, the $\lambda$-criterion is the following:

$$\lambda = \frac{e^{\frac{np}{2}}}{n^{\frac{np}{2}}}\,|\Sigma_o^{-1}S|^{\frac n2}\,e^{-\frac12\mathrm{tr}(\Sigma_o^{-1}S)}. \tag{6.4.2}$$

Letting $u = \lambda^{\frac2n}$,

$$u = \frac{e^p}{n^p}\,|\Sigma_o^{-1}S|\,e^{-\frac1n\mathrm{tr}(\Sigma_o^{-1}S)}, \tag{6.4.3}$$

and we would reject $H_o$ for small values of $u$ since it is a monotonically increasing function of $\lambda$, which means that the null hypothesis ought to be rejected for large values of $\mathrm{tr}(\Sigma_o^{-1}S)$ as the exponential function dominates the polynomial function for large values of the argument. Let us determine the distribution of $w = \mathrm{tr}(\Sigma_o^{-1}S)$ whose Laplace transform with parameter $s$ is

$$L_w(s) = E[e^{-sw}] = E[e^{-s\,\mathrm{tr}(\Sigma_o^{-1}S)}]. \tag{iii}$$


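This can be evaluated by integrating out over the density of $S$ which has a Wishart distribution with $m = n - 1$ degrees of freedom when $\mu$ is estimated:

$$L_w(s) = \frac{1}{2^{\frac{mp}{2}}\,\Gamma_p(\frac m2)\,|\Sigma|^{\frac m2}}\int_{S>O} |S|^{\frac m2 - \frac{p+1}{2}}\,e^{-\frac12\mathrm{tr}(\Sigma^{-1}S) - s\,\mathrm{tr}(\Sigma_o^{-1}S)}\,dS. \tag{iv}$$

The exponential part is $-\frac12\mathrm{tr}(\Sigma^{-1}S) - s\,\mathrm{tr}(\Sigma_o^{-1}S) = -\frac12\mathrm{tr}[(\Sigma^{-\frac12}S\Sigma^{-\frac12})(I + 2s\,\Sigma^{\frac12}\Sigma_o^{-1}\Sigma^{\frac12})]$ and hence,

$$L_w(s) = |I + 2s\,\Sigma^{\frac12}\Sigma_o^{-1}\Sigma^{\frac12}|^{-\frac m2}. \tag{6.4.4}$$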


The null case, X = 27, 
In this case, 51515? = I, so that 
Lo(s) = 11 +251[7? = 04257? > w 2. (6.4.5) 
Thus, the test criterion is the following: 
Reject Hy if w > x7, a» With Pr(x7, = Xap a} = ©. (6.4.6) 


When $\mu$ is known, it is used instead of its MLE to determine $S_\mu$, and the resulting criterion consists of rejecting $H_o$ whenever the observed $w_\mu = \mathrm{tr}(\Sigma_o^{-1}S_\mu) \ge \chi^2_{np,\,\alpha}$ where $n$ is the sample size. These results are summarized in the following theorem.


Theorem 6.4.1. Let the null hypothesis be $H_o : \Sigma = \Sigma_o$ (given) and $w = \mathrm{tr}(\Sigma_o^{-1}S)$ where $S$ is the sample sum of products matrix. Then, under the null hypothesis, $w = \mathrm{tr}(\Sigma_o^{-1}S)$ has a real scalar chisquare distribution with $(n-1)p$ degrees of freedom when the estimate of $\mu$, namely $\hat{\mu} = \bar{X}$, is utilized to compute $S$; when $\mu$ is specified and $S_\mu$ is used in place of $S$, $w$ has a chisquare distribution having $np$ degrees of freedom where $n$ is the sample size.


The non-null density of w 


The non-null density of $w$ is available from (6.4.4). Let $\lambda_1, \ldots, \lambda_p$ be the eigenvalues of $\Sigma^{\frac{1}{2}}\Sigma_o^{-1}\Sigma^{\frac{1}{2}}$. Then $L_w(s)$ in (6.4.4) can be re-expressed as follows:

$$L_w(s) = \prod_{j=1}^{p} (1 + 2\lambda_j s)^{-\frac{m}{2}}. \tag{6.4.7}$$

This is the Laplace transform of a variable of the form $w = \lambda_1 w_1 + \cdots + \lambda_p w_p$ where $w_1, \ldots, w_p$ are independently distributed real scalar chisquare random variables, each having $m = n - 1$ degrees of freedom, where $\lambda_j > 0$, $j = 1, \ldots, p$. The distribution of linear combinations of chisquare random variables corresponds to the distribution of quadratic forms; the reader may refer to Mathai and Provost (1992) for explicit representations of their density functions.
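
As a small illustration of the representation $w = \lambda_1 w_1 + \cdots + \lambda_p w_p$, the sketch below computes the mean and variance of $w$ in the non-null case. It assumes only the standard chisquare moments, so that $E[w] = m\sum_j \lambda_j$ and $\mathrm{Var}(w) = 2m\sum_j \lambda_j^2$, and it uses the fact that the $\lambda_j$ are also the eigenvalues of $\Sigma\Sigma_o^{-1}$, so that traces suffice. The function names and the $2 \times 2$ matrices are hypothetical and are not taken from the text.

```python
# Sketch: first two moments of w = tr(Sigma_o^{-1} S) in the non-null case.
# Assumption: w = sum_j lambda_j * chi2_m with independent chi-squares, as in (6.4.7),
# so E[w] = m * sum(lambda_j) and Var[w] = 2 * m * sum(lambda_j ** 2).

def matmul(a, b):
    """Product of two square matrices given as lists of rows."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def trace(a):
    """Sum of the diagonal elements."""
    return sum(a[i][i] for i in range(len(a)))

def w_mean_var(sigma, sigma_o_inv, m):
    """Mean and variance of w; tr(A) and tr(A^2) give the power sums of the lambda_j."""
    a = matmul(sigma, sigma_o_inv)
    return m * trace(a), 2 * m * trace(matmul(a, a))

# Hypothetical 2 x 2 illustration (values are not from the text)
sigma = [[2.0, 0.5], [0.5, 1.0]]
identity = [[1.0, 0.0], [0.0, 1.0]]        # Sigma_o = I
mean_w, var_w = w_mean_var(sigma, identity, m=9)
null_mean, null_var = w_mean_var(identity, identity, m=9)
```

Under the null hypothesis all $\lambda_j$ equal one, and the two moments reduce to $mp$ and $2mp$, those of a $\chi^2_{mp}$ variable.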


Note 6.4.1. If the population mean value $\mu$ is known, then one can proceed by making use of $\mu$ instead of the sample mean to determine $S_\mu$, in which case $n$, the sample size, ought to be used instead of $m = n - 1$ in the above discussion.


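6.4.1. Arbitrary moments of $\lambda$

From (6.4.2), the $h$-th moment of the $\lambda$-criterion for testing $H_o : \Sigma = \Sigma_o$ (given) in a real nonsingular $N_p(\mu, \Sigma)$ population, is obtained as follows:

$$\lambda^h = \frac{e^{\frac{nph}{2}}}{n^{\frac{nph}{2}}}\, |\Sigma_o^{-1}S|^{\frac{nh}{2}}\, e^{-\frac{h}{2}\mathrm{tr}(\Sigma_o^{-1}S)} \Rightarrow$$

$$\begin{aligned}
E[\lambda^h] &= \frac{e^{\frac{nph}{2}}}{n^{\frac{nph}{2}}\, 2^{\frac{(n-1)p}{2}}\, |\Sigma_o|^{\frac{nh}{2}}\, |\Sigma|^{\frac{n-1}{2}}\, \Gamma_p(\frac{n-1}{2})} \int_{S>O} |S|^{\frac{nh}{2}+\frac{n-1}{2}-\frac{p+1}{2}}\, e^{-\frac{1}{2}\mathrm{tr}(\Sigma^{-1}S) - \frac{h}{2}\mathrm{tr}(\Sigma_o^{-1}S)}\, dS \\
&= \frac{e^{\frac{nph}{2}}\, 2^{p(\frac{nh}{2}+\frac{n-1}{2})}\, \Gamma_p(\frac{nh}{2}+\frac{n-1}{2})}{n^{\frac{nph}{2}}\, 2^{\frac{(n-1)p}{2}}\, |\Sigma_o|^{\frac{nh}{2}}\, |\Sigma|^{\frac{n-1}{2}}\, \Gamma_p(\frac{n-1}{2})}\, |\Sigma^{-1} + h\Sigma_o^{-1}|^{-(\frac{nh}{2}+\frac{n-1}{2})} \\
&= e^{\frac{nph}{2}} \Big(\frac{2}{n}\Big)^{\frac{nph}{2}} \frac{|\Sigma|^{\frac{nh}{2}}}{|\Sigma_o|^{\frac{nh}{2}}}\, \frac{\Gamma_p(\frac{nh}{2}+\frac{n-1}{2})}{\Gamma_p(\frac{n-1}{2})}\, |I + h\Sigma\Sigma_o^{-1}|^{-(\frac{nh}{2}+\frac{n-1}{2})}
\end{aligned} \tag{6.4.8}$$

for $\Re(\frac{nh}{2}+\frac{n-1}{2}) > \frac{p-1}{2}$, $I + h\Sigma\Sigma_o^{-1} > O$. Under $H_o : \Sigma = \Sigma_o$, we have $|I + h\Sigma\Sigma_o^{-1}|^{-(\frac{nh}{2}+\frac{n-1}{2})} = (1+h)^{-p(\frac{nh}{2}+\frac{n-1}{2})}$. Thus, the $h$-th null moment is given by

$$E[\lambda^h | H_o] = e^{\frac{nph}{2}} \Big(\frac{2}{n}\Big)^{\frac{nph}{2}} \frac{\Gamma_p(\frac{nh}{2}+\frac{n-1}{2})}{\Gamma_p(\frac{n-1}{2})}\, (1+h)^{-p(\frac{nh}{2}+\frac{n-1}{2})} \tag{6.4.9}$$

for $\Re(\frac{nh}{2}+\frac{n-1}{2}) > \frac{p-1}{2}$.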


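6.4.2. The asymptotic distribution of $-2\ln\lambda$ when testing $H_o : \Sigma = \Sigma_o$


Let us determine the asymptotic distribution of $-2\ln\lambda$ where $\lambda$ is the likelihood ratio statistic for testing $H_o : \Sigma = \Sigma_o$ (specified) in a real nonsingular $N_p(\mu, \Sigma)$ population, as $n \to \infty$, $n$ being the sample size. This distribution can be determined by expanding both real matrix-variate gamma functions appearing in (6.4.9) and applying Stirling's approximation formula as given in (6.5.14) by letting $\frac{n}{2}(1+h) \to \infty$ in the numerator gamma functions and $\frac{n}{2} \to \infty$ in the denominator gamma functions. Then, we have

$$\begin{aligned}
\frac{\Gamma_p(\frac{n}{2}(1+h) - \frac{1}{2})}{\Gamma_p(\frac{n}{2} - \frac{1}{2})} &= \prod_{j=1}^{p} \frac{\Gamma(\frac{n}{2}(1+h) - \frac{1}{2} - \frac{j-1}{2})}{\Gamma(\frac{n}{2} - \frac{1}{2} - \frac{j-1}{2})} \\
&\to \prod_{j=1}^{p} \frac{(2\pi)^{\frac{1}{2}}\, [\frac{n}{2}(1+h)]^{\frac{n}{2}(1+h) - \frac{1}{2} - \frac{1}{2} - \frac{j-1}{2}}\, e^{-\frac{n}{2}(1+h)}}{(2\pi)^{\frac{1}{2}}\, [\frac{n}{2}]^{\frac{n}{2} - \frac{1}{2} - \frac{1}{2} - \frac{j-1}{2}}\, e^{-\frac{n}{2}}} \\
&= \Big(\frac{n}{2}\Big)^{\frac{nph}{2}} e^{-\frac{nph}{2}}\, (1+h)^{\frac{n}{2}(1+h)p - \frac{p}{2} - \frac{p(p+1)}{4}}.
\end{aligned}$$

Hence, from (6.4.9),

$$E[\lambda^h | H_o] \to (1+h)^{-\frac{1}{2}\left(\frac{p(p+1)}{2}\right)} \text{ as } n \to \infty, \tag{6.4.10}$$

where $(1+h)^{-\frac{1}{2}\left(\frac{p(p+1)}{2}\right)}$ is the $h$-th moment of the distribution of $e^{-y/2}$ when $y \sim \chi^2_{\frac{p(p+1)}{2}}$. Thus, under $H_o$, $-2\ln\lambda \to \chi^2_{\frac{p(p+1)}{2}}$ as $n \to \infty$. For general procedures leading to asymptotic normality, see Mathai (1982).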
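
The limit in (6.4.10) can be checked numerically. The following is a minimal sketch, assuming only the expression (6.4.9) and the product form of $\Gamma_p(\cdot)$ (whose $\pi$ factors cancel in the ratio); it works on the log scale with the standard library function `math.lgamma`, and the names `log_null_moment` and `log_limit` are hypothetical.

```python
from math import lgamma, log

def log_null_moment(n, p, h):
    """log E[lambda^h | H_o] from (6.4.9); the pi factors of Gamma_p cancel in the ratio."""
    a = n * h / 2 + (n - 1) / 2
    log_ratio = sum(lgamma(a - j / 2) - lgamma((n - 1) / 2 - j / 2) for j in range(p))
    # e^{nph/2} (2/n)^{nph/2} contributes (nph/2) * (1 + log(2/n)) on the log scale
    return (n * p * h / 2) * (1 + log(2 / n)) + log_ratio - p * a * log(1 + h)

def log_limit(p, h):
    """log of the limiting moment (1 + h)^(-p(p+1)/4) in (6.4.10)."""
    return -(p * (p + 1) / 4) * log(1 + h)

p, h = 3, 1.0
# absolute gap between the exact log-moment and its limit for growing sample sizes
gaps = [abs(log_null_moment(n, p, h) - log_limit(p, h)) for n in (10**2, 10**4, 10**6)]
```

The gaps shrink roughly like $1/n$, which is consistent with the chisquare limit being a first-order approximation.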


Theorem 6.4.2. Letting $\lambda$ be the likelihood ratio statistic for testing $H_o : \Sigma = \Sigma_o$ (given) on the covariance matrix of a real nonsingular $N_p(\mu, \Sigma)$ distribution, the null distribution of $-2\ln\lambda$ is asymptotically (as the sample size $n$ tends to $\infty$) that of a real scalar chisquare random variable having $\frac{p(p+1)}{2}$ degrees of freedom. This number of degrees of freedom is also equal to the number of parameters restricted by the null hypothesis.

Note 6.4.2. Sugiura and Nagao (1968) have shown that the test based on the statistic $\lambda$ as specified in (6.4.2) is biased whereas it becomes unbiased upon replacing $n$, the sample size, by the degrees of freedom $n - 1$ in (6.4.2). Accordingly, percentage points are then computed for $-2\ln\lambda_1$, where $\lambda_1$ is the statistic $\lambda$ given in (6.4.2) wherein $n - 1$ is substituted for $n$. Korin (1968), Davis (1971), and Nagarsenker and Pillai (1973) computed 5% and 1% percentage points for this test statistic. Davis and Field (1971) evaluated the percentage points for $p = 2(1)10$ and $n = 6(1)30(5)50, 60, 120$ and Korin (1968), for $p = 2(1)10$.


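Example 6.4.1. Let us take the same 3-variate real Gaussian population $N_3(\mu, \Sigma)$, $\Sigma > O$, and the same data as in Example 6.3.1, so that intermediate calculations can be reused. The sample size is 5 and the sample values are the following:

$$X_1 = \begin{bmatrix} 1 \\ 1 \\ 1 \end{bmatrix},\ X_2 = \begin{bmatrix} 1 \\ 0 \\ -1 \end{bmatrix},\ X_3 = \begin{bmatrix} -1 \\ 1 \\ 2 \end{bmatrix},\ X_4 = \begin{bmatrix} -2 \\ 1 \\ 2 \end{bmatrix},\ X_5 = \begin{bmatrix} 2 \\ -1 \\ 0 \end{bmatrix},$$

the sample average and the sample sum of products matrix being

$$\bar{X} = \frac{1}{5}\begin{bmatrix} 1 \\ 2 \\ 4 \end{bmatrix},\quad S = \frac{1}{5^2}\begin{bmatrix} 270 & -110 & -170 \\ -110 & 80 & 85 \\ -170 & 85 & 170 \end{bmatrix}.$$

Let us consider the hypothesis $\Sigma = \Sigma_o$ where

$$\Sigma_o = \begin{bmatrix} 2 & 0 & 0 \\ 0 & 3 & -1 \\ 0 & -1 & 2 \end{bmatrix} \Rightarrow |\Sigma_o| = 10;\quad \mathrm{Cof}(\Sigma_o) = \begin{bmatrix} 5 & 0 & 0 \\ 0 & 4 & 2 \\ 0 & 2 & 6 \end{bmatrix};\quad \Sigma_o^{-1} = \frac{\mathrm{Cof}(\Sigma_o)}{|\Sigma_o|} = \frac{1}{10}\begin{bmatrix} 5 & 0 & 0 \\ 0 & 4 & 2 \\ 0 & 2 & 6 \end{bmatrix};$$

$$\begin{aligned}
\mathrm{tr}(\Sigma_o^{-1}S) &= \frac{1}{(10)(5^2)}\, \mathrm{tr}\left\{ \begin{bmatrix} 5 & 0 & 0 \\ 0 & 4 & 2 \\ 0 & 2 & 6 \end{bmatrix} \begin{bmatrix} 270 & -110 & -170 \\ -110 & 80 & 85 \\ -170 & 85 & 170 \end{bmatrix} \right\} \\
&= \frac{1}{(10)(5^2)}\,[5(270) + (4(80) + 2(85)) + (2(85) + 6(170))] \\
&= 12.12;\quad n = 5,\ p = 3.
\end{aligned}$$

Let us test the null hypothesis at the significance level $\alpha = 0.05$. The distribution of the test statistic $w$ and the tabulated critical value are as follows:

$$w = \mathrm{tr}(\Sigma_o^{-1}S) \sim \chi^2_{(n-1)p} = \chi^2_{12};\quad \chi^2_{12,\,0.05} = 21.03.$$

As the observed value $12.12 < 21.03$, $H_o$ is not rejected. The asymptotic distribution of $-2\ln\lambda$, as $n \to \infty$, is $\chi^2_{\frac{p(p+1)}{2}} = \chi^2_6$ where $\lambda$ is the likelihood ratio criterion statistic. Since $\chi^2_{6,\,0.05} = 12.59$ and $12.59 > 12.12$, we still do not reject $H_o$ as $n \to \infty$.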
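
The arithmetic of Example 6.4.1 can be reproduced with a few lines of code. The sketch below is only an illustration: it starts from the matrices $S$ and $\Sigma_o$ quoted in the example, inverts $\Sigma_o$ by cofactors, and evaluates the chisquare tail probability through the finite-sum identity that holds for an even number of degrees of freedom. The helper names `inverse_3x3` and `chi2_sf_even` are hypothetical.

```python
from math import exp

def inverse_3x3(a):
    """Inverse of a 3 x 3 matrix via cofactors (transposed cofactor matrix / determinant)."""
    (a11, a12, a13), (a21, a22, a23), (a31, a32, a33) = a
    cof = [[a22 * a33 - a23 * a32, -(a21 * a33 - a23 * a31), a21 * a32 - a22 * a31],
           [-(a12 * a33 - a13 * a32), a11 * a33 - a13 * a31, -(a11 * a32 - a12 * a31)],
           [a12 * a23 - a13 * a22, -(a11 * a23 - a13 * a21), a11 * a22 - a12 * a21]]
    det = a11 * cof[0][0] + a12 * cof[0][1] + a13 * cof[0][2]
    return [[cof[j][i] / det for j in range(3)] for i in range(3)]

def chi2_sf_even(x, df):
    """Pr{chi2_df >= x} for even df: exp(-x/2) * sum_{k < df/2} (x/2)^k / k!."""
    term, total = 1.0, 1.0
    for k in range(1, df // 2):
        term *= (x / 2) / k
        total += term
    return exp(-x / 2) * total

# Matrices quoted in Example 6.4.1
S = [[v / 25 for v in row] for row in [[270, -110, -170], [-110, 80, 85], [-170, 85, 170]]]
Sigma_o = [[2, 0, 0], [0, 3, -1], [0, -1, 2]]

inv = inverse_3x3(Sigma_o)
w = sum(inv[i][k] * S[k][i] for i in range(3) for k in range(3))   # tr(Sigma_o^{-1} S)
n, p = 5, 3
p_value = chi2_sf_even(w, (n - 1) * p)   # exact null tail probability, df = 12
```

The tail probability exceeds 0.05, in agreement with the decision not to reject $H_o$ on the basis of the exact $\chi^2_{12}$ distribution of $w$.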


6.4.3. Tests on Wilks’ concept of generalized variance 


The concept of generalized variance was explained in Chap. 5. The sample generalized variance is simply the determinant of $S$, the sample sum of products matrix. When the population is $p$-variate Gaussian, it has already been shown in Chap. 5 that $S$ is Wishart distributed with $m = n - 1$ degrees of freedom, $n$ being the sample size, and parameter matrix $\Sigma > O$, which is the population covariance matrix. When the population is multivariate normal, several types of tests of hypotheses involve the sample generalized variance. The first author has given the exact distributions of such tests, see Mathai (1972a,b) and Mathai and Rathie (1971).


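6.5. The Sphericity Test or Testing if $H_o : \Sigma = \sigma^2 I$, Given a $N_p(\mu, \Sigma)$ Sample


When the covariance matrix $\Sigma = \sigma^2 I$, where $\sigma^2 > 0$ is a real scalar quantity, the ellipsoid $(X - \mu)'\Sigma^{-1}(X - \mu) = c > 0$, which represents a specific contour of constant density for a nonsingular $N_p(\mu, \Sigma)$ distribution, becomes the sphere defined by the equation $\frac{1}{\sigma^2}(X - \mu)'(X - \mu) = c$ or $\frac{1}{\sigma^2}((x_1 - \mu_1)^2 + \cdots + (x_p - \mu_p)^2) = c > 0$, whose center is located at the point $\mu$; hence the test's name, the sphericity test. Given a $N_p(\mu, \Sigma)$ sample of size $n$, the maximum of the likelihood function in the entire parameter space is

$$\sup_{\Omega} L = \frac{n^{\frac{np}{2}}\, e^{-\frac{np}{2}}}{(2\pi)^{\frac{np}{2}}|S|^{\frac{n}{2}}},$$

as was previously established. However, under the null hypothesis $H_o : \Sigma = \sigma^2 I$, $\mathrm{tr}(\Sigma^{-1}S) = (\sigma^2)^{-1}(\mathrm{tr}(S))$ and $|\Sigma| = (\sigma^2)^p$. Thus, if we let $\theta = \sigma^2$ and substitute $\hat{\mu} = \bar{X}$ in $L$, under $H_o$ the loglikelihood function will be $\ln L_\omega = -\frac{np}{2}\ln(2\pi) - \frac{np}{2}\ln\theta - \frac{1}{2\theta}\mathrm{tr}(S)$. Differentiating this function with respect to $\theta$ and equating the result to zero produces the following estimator for $\theta$:

$$\hat{\theta} = \hat{\sigma}^2 = \frac{\mathrm{tr}(S)}{np}. \tag{6.5.1}$$

Accordingly, the maximum of the likelihood function under $H_o$ is the following:

$$\max_{\omega} L = \frac{(np)^{\frac{np}{2}}\, e^{-\frac{np}{2}}}{(2\pi)^{\frac{np}{2}}\,[\mathrm{tr}(S)]^{\frac{np}{2}}}.$$

Thus, the $\lambda$-criterion for testing

$$H_o : \Sigma = \sigma^2 I,\ \sigma^2 > 0 \text{ (unknown)}$$

is

$$\lambda = \frac{\sup_{\omega} L}{\sup_{\Omega} L} = \frac{p^{\frac{np}{2}}\, |S|^{\frac{n}{2}}}{[\mathrm{tr}(S)]^{\frac{np}{2}}}. \tag{6.5.2}$$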
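
To make (6.5.1) and (6.5.2) concrete, the following sketch evaluates $\hat{\sigma}^2$ and $\lambda$ for a $3 \times 3$ sum of products matrix. Reusing the matrix $S$ of Example 6.4.1 with $n = 5$ is a choice made here purely for illustration (the text does not apply the sphericity test to these data), and the function names are hypothetical.

```python
def det_3x3(a):
    """Determinant of a 3 x 3 matrix by expansion along the first row."""
    (a11, a12, a13), (a21, a22, a23), (a31, a32, a33) = a
    return (a11 * (a22 * a33 - a23 * a32)
            - a12 * (a21 * a33 - a23 * a31)
            + a13 * (a21 * a32 - a22 * a31))

def sphericity_lambda(S, n):
    """Return (sigma^2 estimate from (6.5.1), lambda from (6.5.2)) for a 3 x 3 matrix S."""
    p = len(S)
    tr = sum(S[i][i] for i in range(p))
    sigma2_hat = tr / (n * p)
    u = p ** p * det_3x3(S) / tr ** p      # this is lambda^(2/n)
    return sigma2_hat, u ** (n / 2)

# Illustrative input: the matrix S of Example 6.4.1, with n = 5
S = [[v / 25 for v in row] for row in [[270, -110, -170], [-110, 80, 85], [-170, 85, 170]]]
n = 5
sigma2_hat, lam = sphericity_lambda(S, n)
u = lam ** (2 / n)
```

Since $\lambda^{2/n} = p^p|S|/[\mathrm{tr}(S)]^p$ compares the product of the eigenvalues of $S$ with a power of their average, it always lies between 0 and 1, small values pointing away from sphericity.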


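In the complex Gaussian case when $\tilde{X}_j \overset{iid}{\sim} \tilde{N}_p(\tilde{\mu}, \Sigma)$, $\Sigma = \Sigma^* > O$ where an asterisk indicates the conjugate transpose, $\tilde{X}_j = X_{j1} + iX_{j2}$ where $X_{j1}$ and $X_{j2}$ are real $p \times 1$ vectors and $i = \sqrt{(-1)}$. The covariance matrix associated with $\tilde{X}_j$ is then defined as

$$\begin{aligned}
\mathrm{Cov}(\tilde{X}_j) &= E[(\tilde{X}_j - \tilde{\mu})(\tilde{X}_j - \tilde{\mu})^*] \\
&= E[[(X_{j1} - \mu_{(1)}) + i(X_{j2} - \mu_{(2)})][(X_{j1} - \mu_{(1)})' - i(X_{j2} - \mu_{(2)})']] \\
&= \Sigma_{11} + \Sigma_{22} + i(\Sigma_{21} - \Sigma_{12}) \equiv \Sigma, \text{ with } \tilde{\mu} = \mu_{(1)} + i\mu_{(2)},
\end{aligned}$$

where $\Sigma$ is assumed to be Hermitian positive definite, with $\Sigma_{11} = \mathrm{Cov}(X_{j1})$, $\Sigma_{22} = \mathrm{Cov}(X_{j2})$, $\Sigma_{12} = \mathrm{Cov}(X_{j1}, X_{j2})$ and $\Sigma_{21} = \mathrm{Cov}(X_{j2}, X_{j1})$. Thus, the hypothesis of sphericity in the complex Gaussian case is $\Sigma = \sigma^2 I$ where $\sigma$ is real and positive. Then, under the null hypothesis $\tilde{H}_o : \Sigma = \sigma^2 I$, the Hermitian form $\tilde{Y}^*\Sigma\tilde{Y} = c > 0$ where $c$ is real and positive, becomes $\sigma^2\tilde{Y}^*\tilde{Y} = c \Rightarrow |\tilde{y}_1|^2 + \cdots + |\tilde{y}_p|^2 = \frac{c}{\sigma^2} > 0$, which defines a sphere in the complex space, where $|\tilde{y}_j|$ denotes the absolute value or modulus of $\tilde{y}_j$. If $\tilde{y}_j = y_{j1} + iy_{j2}$ with $i = \sqrt{(-1)}$, $y_{j1}, y_{j2}$ being real, then $|\tilde{y}_j|^2 = y_{j1}^2 + y_{j2}^2$.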
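The joint density of the sample values in the real Gaussian case is the following:

$$L = \prod_{j=1}^{n} \frac{e^{-\frac{1}{2}(X_j-\mu)'\Sigma^{-1}(X_j-\mu)}}{(2\pi)^{\frac{p}{2}}|\Sigma|^{\frac{1}{2}}} = \frac{e^{-\frac{1}{2}\mathrm{tr}(\Sigma^{-1}S) - \frac{n}{2}(\bar{X}-\mu)'\Sigma^{-1}(\bar{X}-\mu)}}{(2\pi)^{\frac{np}{2}}|\Sigma|^{\frac{n}{2}}}$$

where $\bar{X} = \frac{1}{n}(X_1 + \cdots + X_n)$, $X_j$, $j = 1, \ldots, n$ are iid $N_p(\mu, \Sigma)$, $\Sigma > O$. We have already derived the maximum of $L$ in the entire parameter space $\Omega$, which, in the real case, is

$$\sup_{\Omega} L = \frac{e^{-\frac{np}{2}}\, n^{\frac{np}{2}}}{(2\pi)^{\frac{np}{2}}|S|^{\frac{n}{2}}} \tag{6.5.3}$$

where $S$ is the sample sum of products matrix. Under $H_o$, $|\Sigma|^{\frac{n}{2}} = (\sigma^2)^{\frac{np}{2}}$ and $\mathrm{tr}(\Sigma^{-1}S) = \frac{1}{\sigma^2}(s_{11} + \cdots + s_{pp}) = \frac{1}{\sigma^2}\mathrm{tr}(S)$. Thus, the maximum likelihood estimator of $\sigma^2$ is $\frac{1}{np}\mathrm{tr}(S)$. Accordingly, the $\lambda$-criterion is

$$\lambda = \frac{|S|^{\frac{n}{2}}}{\big(\frac{\mathrm{tr}(S)}{p}\big)^{\frac{np}{2}}} \Rightarrow u_1 = \lambda^{\frac{2}{n}} = \frac{p^p\,|S|}{[\mathrm{tr}(S)]^p} \tag{6.5.4}$$

in the real case. Interestingly, $(u_1)^{1/p}$ is the ratio of the geometric mean of the eigenvalues of $S$ to their arithmetic mean. The structure remains the same in the complex domain, in which case $\det(\tilde{S})$ is replaced by the absolute value $|\det(\tilde{S})|$ so that

$$\tilde{u}_1 = \frac{p^p\,|\det(\tilde{S})|}{[\mathrm{tr}(\tilde{S})]^p}. \tag{6.5a.1}$$

For arbitrary $h$, the $h$-th moment of $u_1$ in the real case can be obtained by integrating out over the density of $S$, which, as explained in Sect. 5.5, 5.5a, is a Wishart density with $n - 1 = m$ degrees of freedom and parameter matrix $\Sigma > O$. However, when the null hypothesis $H_o$ holds, $\Sigma = \sigma^2 I_p$, so that the $h$-th moment in the real case is

$$E[u_1^h | H_o] = \frac{p^{ph}}{2^{\frac{mp}{2}}\,\Gamma_p(\frac{m}{2})\,(\sigma^2)^{\frac{mp}{2}}} \int_{S>O} |S|^{\frac{m}{2}+h-\frac{p+1}{2}}\, e^{-\frac{1}{2\sigma^2}\mathrm{tr}(S)}\, (\mathrm{tr}(S))^{-ph}\, dS. \tag{i}$$

In order to evaluate this integral, we replace $[\mathrm{tr}(S)]^{-ph}$ by an equivalent integral: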


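$$[\mathrm{tr}(S)]^{-ph} = \frac{1}{\Gamma(ph)} \int_{x=0}^{\infty} x^{ph-1}\, e^{-x\,\mathrm{tr}(S)}\, dx,\ \Re(h) > 0. \tag{ii}$$

Then, substituting (ii) in (i), the exponent becomes $-\frac{1}{2\sigma^2}(1 + 2\sigma^2 x)(\mathrm{tr}(S))$. Now, letting $S_1 = \frac{1}{2\sigma^2}(1 + 2\sigma^2 x)S \Rightarrow dS = (2\sigma^2)^{\frac{p(p+1)}{2}}(1 + 2\sigma^2 x)^{-\frac{p(p+1)}{2}}\, dS_1$, we have

$$\begin{aligned}
E[u_1^h | H_o] &= \frac{(2\sigma^2)^{ph}\, p^{ph}}{\Gamma_p(\frac{m}{2})\,\Gamma(ph)} \int_0^{\infty} x^{ph-1}(1 + 2\sigma^2 x)^{-(\frac{m}{2}+h)p}\, dx \times \int_{S_1>O} |S_1|^{\frac{m}{2}+h-\frac{p+1}{2}}\, e^{-\mathrm{tr}(S_1)}\, dS_1 \\
&= \frac{\Gamma_p(\frac{m}{2}+h)\, p^{ph}}{\Gamma_p(\frac{m}{2})\,\Gamma(ph)} \int_0^{\infty} y^{ph-1}(1 + y)^{-(\frac{m}{2}+h)p}\, dy,\quad y = 2\sigma^2 x \\
&= \frac{\Gamma_p(\frac{m}{2}+h)}{\Gamma_p(\frac{m}{2})}\, p^{ph}\, \frac{\Gamma(\frac{mp}{2})}{\Gamma(\frac{mp}{2}+ph)},\quad \Re(h) > 0,\ m = n - 1.
\end{aligned} \tag{6.5.5}$$

The corresponding $h$-th moment in the complex case is the following:

$$E[\tilde{u}_1^h | H_o] = \frac{\tilde{\Gamma}_p(m+h)}{\tilde{\Gamma}_p(m)}\, p^{ph}\, \frac{\Gamma(mp)}{\Gamma(mp+ph)},\quad \Re(h) > 0. \tag{6.5a.2}$$

By making use of the multiplication formula for gamma functions, one can expand the real gamma function $\Gamma(mz)$ as follows:

$$\Gamma(mz) = (2\pi)^{\frac{1-m}{2}}\, m^{mz-\frac{1}{2}}\, \Gamma(z)\,\Gamma\Big(z + \frac{1}{m}\Big) \cdots \Gamma\Big(z + \frac{m-1}{m}\Big),\quad m = 1, 2, \ldots, \tag{6.5.6}$$

and for $m = 2$, we have the duplication formula

$$\Gamma(2z) = \pi^{-\frac{1}{2}}\, 2^{2z-1}\, \Gamma(z)\,\Gamma\Big(z + \frac{1}{2}\Big). \tag{6.5.7}$$

Then on applying (6.5.6),

$$\frac{p^{ph}\,\Gamma(\frac{mp}{2})}{\Gamma(\frac{mp}{2}+ph)} = \frac{\Gamma(\frac{m}{2})}{\Gamma(\frac{m}{2}+h)} \prod_{j=1}^{p-1} \frac{\Gamma(\frac{m}{2}+\frac{j}{p})}{\Gamma(\frac{m}{2}+\frac{j}{p}+h)}. \tag{iii}$$

Moreover, it follows from the definition of the real matrix-variate gamma functions that

$$\frac{\Gamma_p(\frac{m}{2}+h)}{\Gamma_p(\frac{m}{2})} = \frac{\Gamma(\frac{m}{2}+h)}{\Gamma(\frac{m}{2})} \prod_{j=1}^{p-1} \frac{\Gamma(\frac{m}{2}-\frac{j}{2}+h)}{\Gamma(\frac{m}{2}-\frac{j}{2})}. \tag{iv}$$

On canceling $\Gamma(\frac{m}{2}+h)/\Gamma(\frac{m}{2})$ when multiplying (iii) by (iv), we are left with

$$E[u_1^h | H_o] = \Big\{\prod_{j=1}^{p-1} \frac{\Gamma(\frac{m}{2}+\frac{j}{p})}{\Gamma(\frac{m}{2}-\frac{j}{2})}\Big\} \Big\{\prod_{j=1}^{p-1} \frac{\Gamma(\frac{m}{2}-\frac{j}{2}+h)}{\Gamma(\frac{m}{2}+\frac{j}{p}+h)}\Big\},\quad m = n - 1. \tag{6.5.8}$$

The corresponding $h$-th moment in the complex case is the following:

$$E[\tilde{u}_1^h | H_o] = \Big\{\prod_{j=1}^{p-1} \frac{\Gamma(m+\frac{j}{p})}{\Gamma(m-j)}\Big\} \Big\{\prod_{j=1}^{p-1} \frac{\Gamma(m-j+h)}{\Gamma(m+\frac{j}{p}+h)}\Big\},\quad m = n - 1. \tag{6.5a.3}$$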
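
The passage from (6.5.5) to (6.5.8) relies on the multiplication formula (6.5.6), and it is easy to confirm numerically that both expressions agree. The sketch below is a check under that reading of the two formulas; it writes the ratio of matrix-variate gamma functions as a product of ordinary gamma functions (the powers of $\pi$ cancel) and uses `math.lgamma`. The function names are hypothetical.

```python
from math import lgamma, log, exp

def moment_6_5_5(m, p, h):
    """E[u_1^h | H_o] from (6.5.5), with the Gamma_p ratio expanded into scalar gammas."""
    log_gp = sum(lgamma(m / 2 + h - j / 2) - lgamma(m / 2 - j / 2) for j in range(p))
    return exp(log_gp + p * h * log(p) + lgamma(m * p / 2) - lgamma(m * p / 2 + p * h))

def moment_6_5_8(m, p, h):
    """E[u_1^h | H_o] from (6.5.8), the form matching a product of type-1 beta moments."""
    total = 0.0
    for j in range(1, p):
        total += lgamma(m / 2 + j / p) - lgamma(m / 2 - j / 2)
        total += lgamma(m / 2 - j / 2 + h) - lgamma(m / 2 + j / p + h)
    return exp(total)

# Compare the two expressions for a few (m, p, h) triples
pairs = [(moment_6_5_5(m, p, h), moment_6_5_8(m, p, h))
         for (m, p, h) in [(10, 3, 0.7), (12, 4, 2.0), (7, 2, 1.5)]]
```

For $p = 2$, $m = 5$ and $h = 1$, both functions return $2/3$, the mean of a type-1 beta variable with parameters $(2, 1)$.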


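For $h = s - 1$, one can treat $E[u_1^{s-1} | H_o]$ as the Mellin transform of the density of $u_1$ in the real case. Letting this density be denoted by $f_{u_1}(u_1)$, it can be expressed in terms of a G-function as follows:

$$f_{u_1}(u_1 | H_o) = c_{1,p-1}\, G^{p-1,0}_{p-1,p-1}\Big[u_1 \Big|\, {}^{\frac{m}{2}+\frac{j}{p}-1,\ j=1,\ldots,p-1}_{\frac{m}{2}-\frac{j}{2}-1,\ j=1,\ldots,p-1}\Big],\quad 0 \le u_1 \le 1, \tag{6.5.9}$$

and $f_{u_1}(u_1 | H_o) = 0$ elsewhere, where

$$c_{1,p-1} = \prod_{j=1}^{p-1} \frac{\Gamma(\frac{m}{2}+\frac{j}{p})}{\Gamma(\frac{m}{2}-\frac{j}{2})},$$

the corresponding density in the complex case being the following:

$$\tilde{f}_{\tilde{u}_1}(\tilde{u}_1 | H_o) = \tilde{c}_{1,p-1}\, \tilde{G}^{p-1,0}_{p-1,p-1}\Big[\tilde{u}_1 \Big|\, {}^{m+\frac{j}{p}-1,\ j=1,\ldots,p-1}_{m-j-1,\ j=1,\ldots,p-1}\Big],\quad 0 \le |\tilde{u}_1| \le 1, \tag{6.5a.4}$$

and $\tilde{f}_{\tilde{u}_1}(\tilde{u}_1) = 0$ elsewhere, where $\tilde{G}$ is a real G-function whose parameters are different from those appearing in (6.5.9), and

$$\tilde{c}_{1,p-1} = \prod_{j=1}^{p-1} \frac{\Gamma(m+\frac{j}{p})}{\Gamma(m-j)}.$$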

For computable series representation of a G-function with general parameters, the reader may refer to Mathai (1970a, 1993). Observe that $u_1$ in the real case is structurally a product of $p - 1$ mutually independently distributed real scalar type-1 beta random variables with the parameters $(\alpha_j = \frac{m}{2} - \frac{j}{2},\ \beta_j = \frac{j}{2} + \frac{j}{p})$, $j = 1, \ldots, p - 1$. In the complex case, $\tilde{u}_1$ is structurally a product of $p - 1$ mutually independently distributed real scalar type-1 beta random variables with the parameters $(\alpha_j = m - j,\ \beta_j = j + \frac{j}{p})$, $j = 1, \ldots, p - 1$. This observation is stated as a result.


Theorem 6.5.1. Consider the sphericity test statistic for testing the hypothesis $H_o : \Sigma = \sigma^2 I$ where $\sigma^2 > 0$ is an unknown real scalar. Let $u_1$ and the corresponding complex quantity $\tilde{u}_1$ be as defined in (6.5.4) and (6.5a.1) respectively. Then, in the real case, $u_1$ is structurally a product of $p - 1$ independently distributed real scalar type-1 beta random variables with the parameters $(\alpha_j = \frac{m}{2} - \frac{j}{2},\ \beta_j = \frac{j}{2} + \frac{j}{p})$, $j = 1, \ldots, p - 1$, and, in the complex case, $\tilde{u}_1$ is structurally a product of $p - 1$ independently distributed real scalar type-1 beta random variables with the parameters $(\alpha_j = m - j,\ \beta_j = j + \frac{j}{p})$, $j = 1, \ldots, p - 1$, where $m = n - 1$, $n$ = the sample size.
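
A quick simulation can illustrate Theorem 6.5.1 in the simplest real case. For $p = 2$ the theorem gives a single type-1 beta variable with parameters $(\frac{m}{2} - \frac{1}{2}, 1)$, whose mean is $\alpha/(\alpha + 1)$. The sketch below, which assumes a bivariate standard normal population (so that $H_o$ holds with $\sigma^2 = 1$) and a fixed seed, compares the Monte Carlo average of $u_1 = 4|S|/[\mathrm{tr}(S)]^2$ with that mean; the function name and the simulation settings are hypothetical.

```python
import random

def simulate_u1_p2(n, reps, seed=0):
    """Monte Carlo average of u_1 = p^p |S| / tr(S)^p for p = 2 under sphericity."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(reps):
        xs = [(rng.gauss(0, 1), rng.gauss(0, 1)) for _ in range(n)]
        m1 = sum(x for x, _ in xs) / n
        m2 = sum(y for _, y in xs) / n
        # elements of the sample sum of products matrix S
        s11 = sum((x - m1) ** 2 for x, _ in xs)
        s22 = sum((y - m2) ** 2 for _, y in xs)
        s12 = sum((x - m1) * (y - m2) for x, y in xs)
        total += 4 * (s11 * s22 - s12 ** 2) / (s11 + s22) ** 2
    return total / reps

n = 6
alpha = (n - 1) / 2 - 0.5           # alpha_1 = m/2 - 1/2 with m = n - 1
beta_mean = alpha / (alpha + 1)     # mean of a type-1 beta(alpha, 1) variable
mc_mean = simulate_u1_p2(n, reps=20000)
```

With $n = 6$ the theoretical mean is $2/3$, and the simulated average should agree with it up to Monte Carlo error.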


For certain special cases, one can represent (6.5.9) and (6.5a.4) in terms of known 
elementary functions. Some such cases are now being considered. 


Real case: p = 2 


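In the real case, for $p = 2$,

$$E[u_1^h | H_o] = \frac{\Gamma(\frac{m}{2}+\frac{1}{2})}{\Gamma(\frac{m}{2}-\frac{1}{2})}\, \frac{\Gamma(\frac{m}{2}-\frac{1}{2}+h)}{\Gamma(\frac{m}{2}+\frac{1}{2}+h)} = \frac{\frac{m}{2}-\frac{1}{2}}{\frac{m}{2}-\frac{1}{2}+h}.$$

This means $u_1$ is a real type-1 beta variable with the parameters $(\alpha = \frac{m}{2} - \frac{1}{2},\ \beta = 1)$. The corresponding result in the complex case, obtained by setting $p = 2$ in Theorem 6.5.1, is that $\tilde{u}_1$ is a real type-1 beta variable with the parameters $(\alpha = m - 1,\ \beta = \frac{3}{2})$.

Real case: $p = 3$


In the real case,

$$E[u_1^h | H_o] = \frac{\Gamma(\frac{m}{2}+\frac{1}{3})\,\Gamma(\frac{m}{2}+\frac{2}{3})}{\Gamma(\frac{m}{2}-\frac{1}{2})\,\Gamma(\frac{m}{2}-1)}\, \frac{\Gamma(\frac{m}{2}-\frac{1}{2}+h)\,\Gamma(\frac{m}{2}-1+h)}{\Gamma(\frac{m}{2}+\frac{1}{3}+h)\,\Gamma(\frac{m}{2}+\frac{2}{3}+h)}$$

so that $u_1$ is equivalent to the product of two independently distributed real type-1 beta random variables with the parameters $(\alpha_j, \beta_j) = (\frac{m}{2} - \frac{j}{2},\ \frac{j}{2} + \frac{j}{3})$, $j = 1, 2$. This density can be obtained by treating $E[u_1^h | H_o]$ for $h = s - 1$ as the Mellin transform of the density of $u_1$. The density is then available by taking the inverse Mellin transform. Thus, again denoting it by $f_{u_1}(u_1)$, we have

$$f_{u_1}(u_1 | H_o) = c_3\, \frac{1}{2\pi i} \int_{c-i\infty}^{c+i\infty} \phi_3(s)\, ds = c_3 \Big[\sum_{\nu=0}^{\infty} R_\nu + \sum_{\nu=0}^{\infty} R'_\nu\Big],\quad c > 2 - \frac{m}{2},$$

$$c_3 = \frac{\Gamma(\frac{m}{2}+\frac{1}{3})\,\Gamma(\frac{m}{2}+\frac{2}{3})}{\Gamma(\frac{m}{2}-\frac{1}{2})\,\Gamma(\frac{m}{2}-1)},\quad \phi_3(s) = \frac{\Gamma(\frac{m}{2}-\frac{1}{2}-1+s)\,\Gamma(\frac{m}{2}-1-1+s)}{\Gamma(\frac{m}{2}+\frac{1}{3}-1+s)\,\Gamma(\frac{m}{2}+\frac{2}{3}-1+s)}\, u_1^{-s},$$

where $R_\nu$ is the residue of the integrand $\phi_3(s)$ at the poles of $\Gamma(\frac{m}{2} - \frac{3}{2} + s)$ and $R'_\nu$ is the residue of the integrand $\phi_3(s)$ at the poles of $\Gamma(\frac{m}{2} - 2 + s)$. Letting $s_1 = \frac{m}{2} - \frac{3}{2} + s$,

$$\begin{aligned}
R_\nu &= \lim_{s \to -\nu + \frac{3}{2} - \frac{m}{2}} \Big(s + \nu - \frac{3}{2} + \frac{m}{2}\Big)\phi_3(s) = \lim_{s_1 \to -\nu} \Big[(s_1 + \nu)\, u_1^{\frac{m}{2}-\frac{3}{2}}\, \frac{\Gamma(s_1)\,\Gamma(-\frac{1}{2}+s_1)\, u_1^{-s_1}}{\Gamma(\frac{1}{3}+\frac{1}{2}+s_1)\,\Gamma(\frac{2}{3}+\frac{1}{2}+s_1)}\Big] \\
&= u_1^{\frac{m}{2}-\frac{3}{2}+\nu}\, \frac{(-1)^\nu}{\nu!}\, \frac{\Gamma(-\frac{1}{2}-\nu)}{\Gamma(\frac{1}{2}+\frac{1}{3}-\nu)\,\Gamma(\frac{1}{2}+\frac{2}{3}-\nu)}.
\end{aligned}$$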
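We can replace negative $\nu$ in the arguments of the gamma functions with positive $\nu$ by making use of the following formula:

$$\Gamma(a - \nu) = \frac{(-1)^\nu\, \Gamma(a)}{(-a+1)_\nu},\quad a \ne 1, 2, \ldots,\ \nu = 0, 1, \ldots, \tag{6.5.10}$$

where for example, $(b)_\nu$ is the Pochhammer symbol

$$(b)_\nu = b(b+1)\cdots(b+\nu-1),\ b \ne 0,\ (b)_0 = 1, \tag{6.5.11}$$

so that

$$\Gamma\Big(-\frac{1}{2}-\nu\Big) = \frac{(-1)^\nu\,\Gamma(-\frac{1}{2})}{(\frac{3}{2})_\nu},\quad \Gamma\Big(\frac{1}{2}+\frac{1}{3}-\nu\Big) = \frac{(-1)^\nu\,\Gamma(\frac{1}{2}+\frac{1}{3})}{(\frac{1}{2}-\frac{1}{3})_\nu},\quad \Gamma\Big(\frac{1}{2}+\frac{2}{3}-\nu\Big) = \frac{(-1)^\nu\,\Gamma(\frac{1}{2}+\frac{2}{3})}{(\frac{1}{2}-\frac{2}{3})_\nu}.$$

The sum of the residues then becomes

$$\sum_{\nu=0}^{\infty} R_\nu = \frac{\Gamma(-\frac{1}{2})}{\Gamma(\frac{1}{2}+\frac{1}{3})\,\Gamma(\frac{1}{2}+\frac{2}{3})}\, u_1^{\frac{m}{2}-\frac{3}{2}}\; {}_2F_1\Big(\frac{1}{2}-\frac{1}{3},\, \frac{1}{2}-\frac{2}{3};\, \frac{3}{2};\, u_1\Big),\quad 0 \le u_1 \le 1.$$

It can be similarly shown that

$$\sum_{\nu=0}^{\infty} R'_\nu = \frac{\Gamma(\frac{1}{2})}{\Gamma(\frac{1}{3}+1)\,\Gamma(\frac{2}{3}+1)}\, u_1^{\frac{m}{2}-2}\; {}_2F_1\Big(-\frac{1}{3},\, -\frac{2}{3};\, \frac{1}{2};\, u_1\Big),\quad 0 \le u_1 \le 1.$$

Accordingly, the density of $u_1$ for $p = 3$ is the following:

$$\begin{aligned}
f_{u_1}(u_1 | H_o) = c_3 \Big\{ &\frac{\Gamma(-\frac{1}{2})}{\Gamma(\frac{1}{2}+\frac{1}{3})\,\Gamma(\frac{1}{2}+\frac{2}{3})}\, u_1^{\frac{m}{2}-\frac{3}{2}}\; {}_2F_1\Big(\frac{1}{2}-\frac{1}{3},\, \frac{1}{2}-\frac{2}{3};\, \frac{3}{2};\, u_1\Big) \\
&+ \frac{\Gamma(\frac{1}{2})}{\Gamma(\frac{1}{3}+1)\,\Gamma(\frac{2}{3}+1)}\, u_1^{\frac{m}{2}-2}\; {}_2F_1\Big(-\frac{1}{3},\, -\frac{2}{3};\, \frac{1}{2};\, u_1\Big) \Big\},\quad 0 \le u_1 \le 1,
\end{aligned} \tag{6.5.12}$$

and $f_{u_1}(u_1 | H_o) = 0$ elsewhere.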


Real case: $p = 4$


In this case, for $h = s - 1$,

$$E[u_1^h | H_o] = c_4\, \frac{\Gamma(\frac{m}{2}-\frac{3}{2}+s)\,\Gamma(\frac{m}{2}-2+s)\,\Gamma(\frac{m}{2}-\frac{5}{2}+s)}{\Gamma(\frac{m}{2}-\frac{3}{4}+s)\,\Gamma(\frac{m}{2}-\frac{2}{4}+s)\,\Gamma(\frac{m}{2}-\frac{1}{4}+s)},$$

where $c_4$ is the normalizing constant. However, noting that

$$\frac{\Gamma(\frac{m}{2}-\frac{3}{2}+s)}{\Gamma(\frac{m}{2}-\frac{1}{2}+s)} = \frac{1}{\frac{m}{2}-\frac{3}{2}+s},$$

there is one pole at $s = -\frac{m}{2} + \frac{3}{2}$. The poles of $\Gamma(\frac{m}{2} - \frac{5}{2} + s)$ occur at $s = -\frac{m}{2} + \frac{5}{2} - \nu$, $\nu = 0, 1, \ldots$, and hence at $\nu = 1$, the pole coincides with the earlier pole and there is a pole of order 2 at $s = -\frac{m}{2} + \frac{3}{2}$. Each one of the other poles of the integrand is simple, that is, of order 1. The second order pole will bring in a logarithmic function. As all the cases for which $p \ge 4$ will bring in poles of higher orders, they will not be herein discussed. The general expansion of a G-function of the type $G^{m,0}_{m,m}(\cdot)$ is provided in Mathai (1970a, 1993). In the complex case, starting from $p \ge 3$, poles of higher orders are coming in, so that the densities can only be written in terms of logarithms, psi and zeta functions; hence, these will not be considered. Observe that $\tilde{u}_1$ corresponds to a product of independently distributed real type-1 beta random variables, even though the densities are available only in terms of logarithms, psi and zeta functions for $p \ge 3$. The null and non-null densities of the $\lambda$-criterion in the general case were derived by the first author, and some results obtained under the null distribution can also be found in Mathai and Saxena (1973). Several researchers have contributed to various aspects of the sphericity and multi-sample sphericity tests; for some of the first author's contributions, the reader may refer to Mathai and Rathie (1970) and Mathai (1977, 1984, 1986).


Gamma products such as those appearing in (6.5.8) and (6.5a.3) are frequently en- 
countered when considering various types of tests on the parameters of a real or complex 
Gaussian or certain other types of distributions. Structural representations in the form of 
product of independently distributed real scalar type-1 beta random variables occur in nu- 
merous situations. Thus, a general asymptotic result on the h-th moment of such products 
of type-1 beta random variables will be derived. This is now stated as a result. 


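Theorem 6.5.2. Let $u$ be a real scalar random variable whose $h$-th moment is of the form

$$E[u^h] = \frac{\Gamma_p(\alpha + \alpha h + \gamma)}{\Gamma_p(\alpha + \gamma)}\, \frac{\Gamma_p(\alpha + \gamma + \delta)}{\Gamma_p(\alpha + \alpha h + \gamma + \delta)} \tag{6.5.13}$$

where $\Gamma_p(\cdot)$ is a real matrix-variate gamma function on $p \times p$ real positive definite matrices, $\alpha$ is real, $\gamma$ is bounded, $\delta$ is real, $0 < \delta < \infty$ and $h$ is arbitrary. Then, as $\alpha \to \infty$, $-2\ln u \to \chi^2_{2p\delta}$, a real chisquare random variable having $2p\delta$ degrees of freedom, that is, a real gamma random variable with the parameters $(\alpha = p\delta,\ \beta = 2)$.


Proof: On expanding the real matrix-variate gamma functions, we have the following:

$$\frac{\Gamma_p(\alpha + \gamma + \delta)}{\Gamma_p(\alpha + \gamma)} = \prod_{j=1}^{p} \frac{\Gamma(\alpha + \gamma + \delta - \frac{j-1}{2})}{\Gamma(\alpha + \gamma - \frac{j-1}{2})} \tag{i}$$

$$\frac{\Gamma_p(\alpha(1+h) + \gamma)}{\Gamma_p(\alpha(1+h) + \gamma + \delta)} = \prod_{j=1}^{p} \frac{\Gamma(\alpha(1+h) + \gamma - \frac{j-1}{2})}{\Gamma(\alpha(1+h) + \gamma + \delta - \frac{j-1}{2})}. \tag{ii}$$

Consider the following form of Stirling's asymptotic approximation formula for gamma functions, namely,

$$\Gamma(z + \eta) \approx \sqrt{2\pi}\, z^{z+\eta-\frac{1}{2}}\, e^{-z} \text{ for } |z| \to \infty \text{ and } \eta \text{ bounded}. \tag{6.5.14}$$

On applying this asymptotic formula to the gamma functions appearing in (i) and (ii) for $\alpha \to \infty$, we have

$$\prod_{j=1}^{p} \frac{\Gamma(\alpha + \gamma + \delta - \frac{j-1}{2})}{\Gamma(\alpha + \gamma - \frac{j-1}{2})} \to \alpha^{p\delta}$$

and

$$\prod_{j=1}^{p} \frac{\Gamma(\alpha(1+h) + \gamma - \frac{j-1}{2})}{\Gamma(\alpha(1+h) + \gamma + \delta - \frac{j-1}{2})} \to [\alpha(1+h)]^{-p\delta}, \tag{iii}$$

so that

$$E[u^h] \to (1+h)^{-p\delta}. \tag{iv}$$

On noting that $E[u^h] = E[e^{h\ln u}] \to (1+h)^{-p\delta}$, it is seen that $\ln u$ has the mgf $(1+h)^{-p\delta}$ for $1 + h > 0$ or $-2\ln u$ has mgf $(1-2h)^{-p\delta}$ for $1 - 2h > 0$, which happens to be the mgf of a real scalar chisquare variable with $2p\delta$ degrees of freedom if $2p\delta$ is a positive integer or a real gamma variable with the parameters $(\alpha = p\delta,\ \beta = 2)$. Hence the following result.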


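Corollary 6.5.1. Consider a slightly more general case than that considered in Theorem 6.5.2. Let the $h$-th moment of $u$ be of the form

$$E[u^h] = \Big\{\prod_{j=1}^{p} \frac{\Gamma(\alpha(1+h) + \gamma_j)}{\Gamma(\alpha + \gamma_j)}\Big\} \Big\{\prod_{j=1}^{p} \frac{\Gamma(\alpha + \gamma_j + \delta_j)}{\Gamma(\alpha(1+h) + \gamma_j + \delta_j)}\Big\}. \tag{6.5.15}$$

Then as $\alpha \to \infty$, $E[u^h] \to (1+h)^{-(\delta_1 + \cdots + \delta_p)}$, which implies that $-2\ln u \to \chi^2_{2(\delta_1 + \cdots + \delta_p)}$ whenever $2(\delta_1 + \cdots + \delta_p)$ is a positive integer or, equivalently, $-2\ln u$ tends to a real gamma variable with the parameters $(\alpha = \delta_1 + \cdots + \delta_p,\ \beta = 2)$.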
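
The convergence asserted in Corollary 6.5.1 can be observed numerically. The sketch below evaluates the logarithm of the gamma product (6.5.15) for increasing $\alpha$ and compares it with $-(\delta_1 + \cdots + \delta_p)\ln(1+h)$; the particular values of $\gamma_j$, $\delta_j$ and $h$ are hypothetical choices made only for this check.

```python
from math import lgamma, log

def log_moment(alpha, h, gammas, deltas):
    """log E[u^h] for the gamma-product form (6.5.15)."""
    total = 0.0
    for g, d in zip(gammas, deltas):
        total += lgamma(alpha * (1 + h) + g) - lgamma(alpha + g)
        total += lgamma(alpha + g + d) - lgamma(alpha * (1 + h) + g + d)
    return total

gammas = [0.0, -0.5, -1.0]      # hypothetical bounded gamma_j
deltas = [0.5, 1.0, 1.5]        # hypothetical positive delta_j
h = 0.8
limit = -sum(deltas) * log(1 + h)
# distance to the limiting log-moment as alpha grows
gaps = [abs(log_moment(a, h, gammas, deltas) - limit) for a in (1e1, 1e3, 1e5)]
```

Here $2(\delta_1 + \delta_2 + \delta_3) = 6$, so the limiting distribution of $-2\ln u$ in this illustration would be a chisquare with 6 degrees of freedom.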


Let us examine the asymptotic distribution of the test statistic for the sphericity test in the light of Theorem 6.5.2. It is seen from (6.5.4) that $\lambda^h = u_1^{\frac{nh}{2}}$. Thus, by replacing $h$ by $\frac{n}{2}h$ in (6.5.8) with $m = n - 1$, we have

$$E[\lambda^h | H_o] = \Big\{\prod_{j=1}^{p-1} \frac{\Gamma(\frac{n}{2}-\frac{1}{2}+\frac{j}{p})}{\Gamma(\frac{n}{2}-\frac{1}{2}-\frac{j}{2})}\Big\} \Big\{\prod_{j=1}^{p-1} \frac{\Gamma(\frac{n}{2}(1+h)-\frac{1}{2}-\frac{j}{2})}{\Gamma(\frac{n}{2}(1+h)-\frac{1}{2}+\frac{j}{p})}\Big\}.$$

Then, it follows from Corollary 6.5.1 that $-2\ln\lambda \to \chi^2_{2\sum_{j=1}^{p-1}(\frac{j}{2}+\frac{j}{p})}$, a chi-square random variable having $2\sum_{j=1}^{p-1}(\frac{j}{2}+\frac{j}{p}) = \frac{p(p-1)}{2} + (p-1) = \frac{(p-1)(p+2)}{2}$ degrees of freedom. Hence the following result:


Theorem 6.5.3. Consider the $\lambda$-criterion for testing the hypothesis of sphericity. Then, under the null hypothesis, $-2\ln\lambda \to \chi^2_{\frac{(p-1)(p+2)}{2}}$, as the sample size $n \to \infty$. In the complex case, as $n \to \infty$, $-2\ln\tilde{\lambda} \to \chi^2_{(p-1)(p+1)}$, a real scalar chisquare variable with $2[\frac{p(p-1)}{2} + \frac{(p-1)}{2}] = p(p-1) + (p-1) = (p-1)(p+1)$ degrees of freedom.


Note 6.5.1. We observe that the number of degrees of freedom of the real chisquare variable in the real scalar case is $\frac{(p-1)(p+2)}{2}$, which is also equal to the number of parameters restricted by the null hypothesis. Indeed, when $\Sigma = \sigma^2 I$, we have $\sigma_{ij} = 0$, $i \ne j$, which produces $\frac{p(p-1)}{2}$ restrictions and, since $\sigma^2$ is unknown, requiring that the diagonal elements are such that $\sigma_{11} = \cdots = \sigma_{pp}$ produces $p - 1$ additional restrictions for a total of $\frac{(p-1)(p+2)}{2}$ restrictions being imposed. Thus, the number of degrees of freedom of the asymptotic chisquare variable corresponds to the number of restrictions imposed by $H_o$, which, actually, is a general result.
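
The counting argument of Note 6.5.1 can be written out as a small check. The sketch below is only an arithmetic illustration: it counts the free parameters of an unrestricted covariance matrix and the restrictions imposed by the sphericity hypothesis, and verifies that the two ways of arriving at $\frac{(p-1)(p+2)}{2}$ agree. The function names are hypothetical.

```python
def free_cov_parameters(p):
    """Number of distinct elements in an unrestricted p x p covariance matrix."""
    return p * (p + 1) // 2

def restrictions_sphericity(p):
    """Zero off-diagonal elements plus equality of the p diagonal elements."""
    return p * (p - 1) // 2 + (p - 1)

df_table = {}
for p in range(2, 8):
    # under H_o a single free parameter (sigma^2) remains
    assert restrictions_sphericity(p) == free_cov_parameters(p) - 1
    # same count written as (p - 1)(p + 2) / 2
    assert 2 * restrictions_sphericity(p) == (p - 1) * (p + 2)
    df_table[p] = restrictions_sphericity(p)
```

For instance, `df_table[3]` equals 5, the number of degrees of freedom of the asymptotic chisquare in the real trivariate case.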
6.6. Testing the Hypothesis that the Covariance Matrix is Diagonal 


Consider the null hypothesis that $\Sigma$, the nonsingular covariance matrix of a $p$-variate real normal distribution, is diagonal, that is,

$$H_o : \Sigma = \mathrm{diag}(\sigma_{11}, \ldots, \sigma_{pp}).$$

Since the population is assumed to be normally distributed, this implies that the components of the $p$-variate Gaussian vector are mutually independently distributed as univariate normal random variables whose respective variances are $\sigma_{jj}$, $j = 1, \ldots, p$. Consider a simple random sample of size $n$ from a nonsingular $N_p(\mu, \Sigma)$ population or, equivalently, let $X_1, \ldots, X_n$ be independently distributed as $N_p(\mu, \Sigma)$ vectors, $\Sigma > O$. Under $H_o$, $\sigma_{jj}$ is estimated by its MLE which is $\hat{\sigma}_{jj} = \frac{1}{n}s_{jj}$ where $s_{jj}$ is the $j$-th diagonal element of $S = (s_{ij})$, the sample sum of products matrix. The maximum of the likelihood function under the null hypothesis is then

$$\max_{H_o} L = \prod_{j=1}^{p} \max L_j = \frac{1}{(2\pi)^{\frac{np}{2}} \prod_{j=1}^{p} [s_{jj}]^{\frac{n}{2}}}\, n^{\frac{np}{2}}\, e^{-\frac{np}{2}},$$

the likelihood function being the joint density evaluated at an observed value of the sample. Observe that the overall maximum or the maximum in the entire parameter space remains the same as that given in (6.1.1). Thus, the $\lambda$-criterion is given by

$$\lambda = \frac{\sup_{\omega} L}{\sup_{\Omega} L} = \frac{|S|^{\frac{n}{2}}}{\prod_{j=1}^{p} s_{jj}^{\frac{n}{2}}} \Rightarrow u_2 = \lambda^{\frac{2}{n}} = \frac{|S|}{\prod_{j=1}^{p} s_{jj}} \tag{6.6.1}$$

where $S \sim W_p(m, \Sigma)$, $\Sigma > O$, and $m = n - 1$, $n$ being the sample size. Under $H_o$, $\Sigma = \mathrm{diag}(\sigma_{11}, \ldots, \sigma_{pp})$. Then for an arbitrary $h$, the $h$-th moment of $u_2$ is available by taking the expected value of $u_2^h = \lambda^{\frac{2h}{n}}$ with respect to the density of $S$, that is,

$$E[u_2^h | H_o] = \int_{S>O} \frac{|S|^{\frac{m}{2}+h-\frac{p+1}{2}}\, e^{-\frac{1}{2}\mathrm{tr}(\Sigma^{-1}S)}\, \big(\prod_{j=1}^{p} s_{jj}^{-h}\big)}{2^{\frac{mp}{2}}\, |\Sigma|^{\frac{m}{2}}\, \Gamma_p(\frac{m}{2})}\, dS \tag{6.6.2}$$

where, under $H_o$, $|\Sigma| = \sigma_{11} \cdots \sigma_{pp}$. As was done in Sect. 6.1.1, we may replace $s_{jj}^{-h}$ by the equivalent integral,

$$s_{jj}^{-h} = \frac{1}{\Gamma(h)} \int_0^{\infty} x_j^{h-1}\, e^{-x_j(s_{jj})}\, dx_j,\ \Re(h) > 0.$$


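Thus,

$$\prod_{j=1}^{p} s_{jj}^{-h} = \frac{1}{[\Gamma(h)]^p} \int_0^{\infty} \cdots \int_0^{\infty} x_1^{h-1} \cdots x_p^{h-1}\, e^{-\mathrm{tr}(YS)}\, dx_1 \wedge \ldots \wedge dx_p \tag{i}$$

where $Y = \mathrm{diag}(x_1, \ldots, x_p)$, so that $\mathrm{tr}(YS) = x_1 s_{11} + \cdots + x_p s_{pp}$. Then, (6.6.2) can be reexpressed as follows:

$$\begin{aligned}
E[u_2^h | H_o] &= \frac{1}{[\Gamma(h)]^p} \int_0^{\infty} \cdots \int_0^{\infty} \frac{x_1^{h-1} \cdots x_p^{h-1}}{2^{\frac{mp}{2}}\, \Gamma_p(\frac{m}{2})\, (\prod_{j=1}^{p} \sigma_{jj})^{\frac{m}{2}}} \Big[\int_{S>O} |S|^{\frac{m}{2}+h-\frac{p+1}{2}}\, e^{-\frac{1}{2}\mathrm{tr}((\Sigma^{-1}+2Y)S)}\, dS\Big]\, dx_1 \wedge \ldots \wedge dx_p \\
&= \frac{\Gamma_p(\frac{m}{2}+h)}{\Gamma_p(\frac{m}{2})\, 2^{\frac{mp}{2}}\, (\prod_{j=1}^{p} \sigma_{jj})^{\frac{m}{2}}}\, \frac{1}{[\Gamma(h)]^p} \int_0^{\infty} \cdots \int_0^{\infty} x_1^{h-1} \cdots x_p^{h-1}\, \Big|\frac{\Sigma^{-1}+2Y}{2}\Big|^{-(\frac{m}{2}+h)}\, dx_1 \wedge \ldots \wedge dx_p,
\end{aligned}$$

and observing that, under $H_o$,

$$\Big|\frac{\Sigma^{-1}+2Y}{2}\Big|^{-(\frac{m}{2}+h)} = 2^{p(\frac{m}{2}+h)}\, |\Sigma|^{\frac{m}{2}+h}\, |I + 2\Sigma Y|^{-(\frac{m}{2}+h)} \text{ with } |I + 2\Sigma Y| = (1 + 2\sigma_{11}x_1) \cdots (1 + 2\sigma_{pp}x_p),$$

$$\begin{aligned}
E[u_2^h | H_o] &= \frac{\Gamma_p(\frac{m}{2}+h)}{\Gamma_p(\frac{m}{2})} \prod_{j=1}^{p} \frac{1}{\Gamma(h)} \int_0^{\infty} y_j^{h-1}(1 + y_j)^{-(\frac{m}{2}+h)}\, dy_j,\quad y_j = 2\sigma_{jj}x_j \\
&= \frac{\Gamma_p(\frac{m}{2}+h)}{\Gamma_p(\frac{m}{2})}\, \Big[\frac{\Gamma(\frac{m}{2})}{\Gamma(\frac{m}{2}+h)}\Big]^p,\quad \Re\Big(\frac{m}{2}+h\Big) > \frac{p-1}{2}.
\end{aligned} \tag{6.6.3}$$

Thus,

$$E[u_2^h | H_o] = \frac{[\Gamma(\frac{m}{2})]^p}{\Gamma_p(\frac{m}{2})}\, \frac{\Gamma_p(\frac{m}{2}+h)}{[\Gamma(\frac{m}{2}+h)]^p} = \frac{[\Gamma(\frac{m}{2})]^{p-1}}{\prod_{j=1}^{p-1} \Gamma(\frac{m}{2}-\frac{j}{2})}\, \frac{\prod_{j=1}^{p-1} \Gamma(\frac{m}{2}-\frac{j}{2}+h)}{[\Gamma(\frac{m}{2}+h)]^{p-1}}.$$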

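Denoting the density of $u_2$ as $f_{u_2}(u_2 | H_o)$, we can express it as an inverse Mellin transform by taking $h = s - 1$. Then,

$$f_{u_2}(u_2 | H_o) = c_{2,p-1}\, G^{p-1,0}_{p-1,p-1}\Big[u_2 \Big|\, {}^{\frac{m}{2}-1,\, \ldots,\, \frac{m}{2}-1}_{\frac{m}{2}-\frac{j}{2}-1,\ j=1,\ldots,p-1}\Big],\quad 0 \le u_2 \le 1,$$

and zero elsewhere, where

$$c_{2,p-1} = \frac{[\Gamma(\frac{m}{2})]^{p-1}}{\prod_{j=1}^{p-1} \Gamma(\frac{m}{2}-\frac{j}{2})}.$$

Some special cases of this density are expounded below.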


Real and complex cases: $p = 2$


When $p = 2$, it follows from (6.6.3) that $u_2$ has a real type-1 beta density with the parameters $(\alpha = \frac{m}{2} - \frac{1}{2},\ \beta = \frac{1}{2})$ in the real case. In the complex case, it has a real type-1 beta density with the parameters $(\alpha = m - 1,\ \beta = 1)$.
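
For $p = 2$, the statistic $u_2 = |S|/(s_{11}s_{22})$ is simply $1 - r^2$, where $r$ is the sample correlation coefficient, and the real type-1 beta law with parameters $(\frac{m}{2} - \frac{1}{2}, \frac{1}{2})$ has mean $\alpha/(\alpha + \beta) = (m-1)/m$. The following sketch checks this by simulation; it assumes independent normal components with unequal variances (so that $H_o$ holds without sphericity), and the seed, sample size and function name are hypothetical.

```python
import random

def u2_p2(xs):
    """u_2 = |S| / (s11 * s22) for bivariate data, which equals 1 - r^2."""
    n = len(xs)
    m1 = sum(x for x, _ in xs) / n
    m2 = sum(y for _, y in xs) / n
    s11 = sum((x - m1) ** 2 for x, _ in xs)
    s22 = sum((y - m2) ** 2 for _, y in xs)
    s12 = sum((x - m1) * (y - m2) for x, y in xs)
    return (s11 * s22 - s12 ** 2) / (s11 * s22)

rng = random.Random(1)
n, reps = 6, 20000
# independent components with different standard deviations: Sigma is diagonal
mc_mean = sum(u2_p2([(rng.gauss(0, 1), rng.gauss(0, 3)) for _ in range(n)])
              for _ in range(reps)) / reps

m = n - 1
beta_mean = (m / 2 - 0.5) / (m / 2)     # alpha / (alpha + beta) with beta = 1/2
```

With $n = 6$ the theoretical mean is $0.8$, which matches the classical fact that $E[r^2] = 1/(n-1)$ for uncorrelated normal components.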


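Real and complex cases: $p = 3$

In this case, $f_{u_2}(u_2 | H_o)$ is given by

$$f_{u_2}(u_2 | H_o) = c_{2,2}\, \frac{1}{2\pi i} \int_{c-i\infty}^{c+i\infty} \frac{\Gamma(\frac{m}{2}-\frac{3}{2}+s)\,\Gamma(\frac{m}{2}-2+s)}{[\Gamma(\frac{m}{2}-1+s)]^2}\, u_2^{-s}\, ds.$$

The poles of the integrand are simple. Those coming from $\Gamma(\frac{m}{2} - \frac{3}{2} + s)$ occur at $s = -\frac{m}{2} + \frac{3}{2} - \nu$, $\nu = 0, 1, \ldots$. The residue $R_\nu$ is the following:

$$R_\nu = u_2^{\frac{m}{2}-\frac{3}{2}+\nu}\, \frac{(-1)^\nu}{\nu!}\, \frac{\Gamma(-\frac{1}{2}-\nu)}{[\Gamma(\frac{1}{2}-\nu)]^2} = u_2^{\frac{m}{2}-\frac{3}{2}}\, \frac{\Gamma(-\frac{1}{2})}{[\Gamma(\frac{1}{2})]^2}\, \frac{(\frac{1}{2})_\nu (\frac{1}{2})_\nu}{(\frac{3}{2})_\nu}\, \frac{u_2^\nu}{\nu!}.$$

Summing the residues, we have

$$\sum_{\nu=0}^{\infty} R_\nu = -\frac{2}{\sqrt{\pi}}\, u_2^{\frac{m}{2}-\frac{3}{2}}\; {}_2F_1\Big(\frac{1}{2}, \frac{1}{2}; \frac{3}{2}; u_2\Big),\quad 0 \le u_2 \le 1.$$

Now, consider the sum of the residues at the poles of $\Gamma(\frac{m}{2} - 2 + s)$. Observing that $\Gamma(\frac{m}{2} - 2 + s)$ cancels out one of the gamma functions in the denominator, namely $\Gamma(\frac{m}{2} - 1 + s) = (\frac{m}{2} - 2 + s)\Gamma(\frac{m}{2} - 2 + s)$, the integrand becomes

$$\frac{\Gamma(\frac{m}{2}-\frac{3}{2}+s)\, u_2^{-s}}{(\frac{m}{2}-2+s)\,\Gamma(\frac{m}{2}-1+s)},$$

the residue at the pole $s = -\frac{m}{2} + 2$ being $\frac{\Gamma(\frac{1}{2})\, u_2^{\frac{m}{2}-2}}{\Gamma(1)}$. Then, noting that $\Gamma(-\frac{1}{2}) = -2\Gamma(\frac{1}{2}) = -2\sqrt{\pi}$, the density is the following:

$$f_{u_2}(u_2 | H_o) = c_{2,2} \Big\{\sqrt{\pi}\, u_2^{\frac{m}{2}-2} - \frac{2}{\sqrt{\pi}}\, u_2^{\frac{m}{2}-\frac{3}{2}}\; {}_2F_1\Big(\frac{1}{2}, \frac{1}{2}; \frac{3}{2}; u_2\Big)\Big\},\quad 0 \le u_2 \le 1, \tag{6.6.5}$$

and zero elsewhere.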
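
As a sanity check on (6.6.5), one can verify numerically that the expression integrates to one. The sketch below assumes the constant $c_{2,2} = [\Gamma(\frac{m}{2})]^2 / [\Gamma(\frac{m}{2}-\frac{1}{2})\Gamma(\frac{m}{2}-1)]$ and uses the classical closed form ${}_2F_1(\frac{1}{2}, \frac{1}{2}; \frac{3}{2}; u) = \arcsin(\sqrt{u})/\sqrt{u}$ instead of summing the series; the midpoint rule and the function names are choices made here for illustration.

```python
from math import gamma, sqrt, pi, asin

def density_u2_p3(u, m):
    """Density (6.6.5) for 0 < u < 1, with 2F1(1/2, 1/2; 3/2; u) = asin(sqrt(u)) / sqrt(u)."""
    c = gamma(m / 2) ** 2 / (gamma(m / 2 - 0.5) * gamma(m / 2 - 1))
    hyp = asin(sqrt(u)) / sqrt(u)
    return c * (sqrt(pi) * u ** (m / 2 - 2) - (2 / sqrt(pi)) * u ** (m / 2 - 1.5) * hyp)

def integrate(f, a, b, steps=100000):
    """Composite midpoint rule; endpoints are never evaluated."""
    width = (b - a) / steps
    return width * sum(f(a + (i + 0.5) * width) for i in range(steps))

total_m6 = integrate(lambda u: density_u2_p3(u, 6), 0.0, 1.0)
total_m4 = integrate(lambda u: density_u2_p3(u, 4), 0.0, 1.0)
```

With the closed form substituted, the density is proportional to $u_2^{\frac{m}{2}-2}\arccos(\sqrt{u_2})$, which makes its nonnegativity on $[0, 1]$ apparent.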
In the complex case, the integrand is

$$\frac{\Gamma(m-2+s)\,\Gamma(m-3+s)}{[\Gamma(m-1+s)]^2}\, u_2^{-s} = \frac{u_2^{-s}}{(m-2+s)^2(m-3+s)}$$

and hence there is a pole of order 1 at $s = -m + 3$ and a pole of order 2 at $s = -m + 2$. The residue at $s = -m + 3$ is $\frac{u_2^{m-3}}{(1)^2} = u_2^{m-3}$ and the residue at $s = -m + 2$ is given by

$$\lim_{s \to -m+2} \frac{\partial}{\partial s}\Big[(m-2+s)^2\, \frac{u_2^{-s}}{(m-2+s)^2(m-3+s)}\Big] = \lim_{s \to -m+2} \frac{\partial}{\partial s}\Big[\frac{u_2^{-s}}{(m-3+s)}\Big],$$

which gives the residue as $u_2^{m-2}\ln u_2 - u_2^{m-2}$. Thus, the sum of the residues is $u_2^{m-3} + u_2^{m-2}\ln u_2 - u_2^{m-2}$ and the constant part is

$$\frac{[\Gamma(m)]^2}{\Gamma(m-1)\,\Gamma(m-2)} = (m-1)^2(m-2),\quad m > 2,$$

so that the density is

$$\tilde{f}_{\tilde{u}_2}(u_2) = (m-1)^2(m-2)\,[u_2^{m-3} + u_2^{m-2}\ln u_2 - u_2^{m-2}],\quad 0 \le u_2 \le 1,\ m \ge 3,$$

and zero elsewhere. Note that as $u_2 \to 0$, the limit of $u_2^{m-2}\ln u_2$ is zero. By integrating out over $0 \le u_2 \le 1$ while $m \ge 3$, it can be verified that $\tilde{f}_{\tilde{u}_2}(\cdot)$ is indeed a density function.
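
The verification mentioned above can be carried out numerically. The following sketch integrates the complex-case density for $p = 3$, namely $(m-1)^2(m-2)[u^{m-3} + u^{m-2}\ln u - u^{m-2}]$, over $(0, 1)$ by a midpoint rule for a few values of $m$; the choice of $m$ values, the integration rule and the function names are assumptions made for this illustration.

```python
from math import log

def density_complex_p3(u, m):
    """Complex-case density of u_2 for p = 3, valid for 0 < u <= 1 and m >= 3."""
    return (m - 1) ** 2 * (m - 2) * (u ** (m - 3) + u ** (m - 2) * log(u) - u ** (m - 2))

def integrate(f, a, b, steps=100000):
    """Composite midpoint rule; avoids evaluating log(0) at the left endpoint."""
    width = (b - a) / steps
    return width * sum(f(a + (i + 0.5) * width) for i in range(steps))

totals = {m: integrate(lambda u, m=m: density_complex_p3(u, m), 0.0, 1.0)
          for m in (3, 5, 8)}
```

Analytically, the three terms integrate to $\frac{1}{m-2}$, $-\frac{1}{(m-1)^2}$ and $-\frac{1}{m-1}$, whose sum is $\frac{1}{(m-1)^2(m-2)}$, the reciprocal of the normalizing constant.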


Real and complex cases: $p \ge 4$


As poles of higher orders are present when $p \ge 4$, both in the real and complex cases, the exact density function of the test statistic will not be herein explicitly given for those cases. Actually, the resulting densities would involve G-functions for which general expansions are for instance provided in Mathai (1993). The exact null and non-null densities of $u = \lambda^{\frac{2}{n}}$ have been previously derived by the first author. Percentage points accurate to the 11th decimal place are available from Mathai and Katiyar (1979a, 1980) for the null case; as well, various aspects of the distribution of the test statistic are discussed in Mathai and Rathie (1971) and Mathai (1973, 1984, 1985).


Let us now consider the asymptotic distribution of the $\lambda$-criterion under the null hypothesis,

$$H_o : \Sigma = \mathrm{diag}(\sigma_{11}, \ldots, \sigma_{pp}).$$

Given the representation of the $h$-th moment of $u_2$ provided in (6.6.3) and referring to Corollary 6.5.1, it is seen that the sum of the $\delta_j$'s is $\sum_{j=1}^{p-1} \delta_j = \sum_{j=1}^{p-1} \frac{j}{2} = \frac{p(p-1)}{4}$, so that the number of degrees of freedom of the asymptotic chisquare distribution is $2[\frac{p(p-1)}{4}] = \frac{p(p-1)}{2}$ which, as it should be, is the number of restrictions imposed by $H_o$, noting that when $\Sigma$ is diagonal, $\sigma_{ij} = 0$, $i \ne j$, which produces $\frac{p(p-1)}{2}$ restrictions. Hence, the following result:


Theorem 6.6.1. Let $\lambda$ be the likelihood ratio criterion for testing the hypothesis that the covariance matrix $\Sigma$ of a nonsingular $N_p(\mu, \Sigma)$ distribution is diagonal. Then, as $n \to \infty$, $-2\ln\lambda \to \chi^2_{\frac{p(p-1)}{2}}$ in the real case. In the corresponding complex case, as $n \to \infty$, $-2\ln\tilde{\lambda} \to \chi^2_{p(p-1)}$, a real scalar chisquare variable having $p(p-1)$ degrees of freedom.


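6.7. Equality of Diagonal Elements, Given that $\Sigma$ is Diagonal, Real Case


In the case of a $p$-variate real nonsingular $N_p(\mu, \Sigma)$ population, whenever $\Sigma$ is diagonal, the individual components are independently distributed as univariate normal random variables. Consider a simple random sample of size $n$, that is, a set of $p \times 1$ vectors $X_1, \ldots, X_n$, that are iid as $X_j \sim N_p(\mu, \Sigma)$ where it is assumed that $\Sigma = \mathrm{diag}(\sigma_1^2, \ldots, \sigma_p^2)$. Letting $X_j' = (x_{1j}, \ldots, x_{pj})$, the joint density of the $x_{rj}$'s, $j = 1, \ldots, n$, in the above sample, which is denoted by $L_r$, is given by

$$L_r = \prod_{j=1}^{n} \frac{e^{-\frac{1}{2\sigma_r^2}(x_{rj}-\mu_r)^2}}{(2\pi)^{\frac{1}{2}}(\sigma_r^2)^{\frac{1}{2}}} = \frac{e^{-\frac{1}{2\sigma_r^2}\sum_{j=1}^{n}(x_{rj}-\mu_r)^2}}{(2\pi)^{\frac{n}{2}}(\sigma_r^2)^{\frac{n}{2}}}.$$

Then, on substituting the maximum likelihood estimators of $\mu_r$ and $\sigma_r^2$ in $L_r$, its maximum is

$$\max L_r = \frac{n^{\frac{n}{2}}\, e^{-\frac{n}{2}}}{(2\pi)^{\frac{n}{2}}(s_{rr})^{\frac{n}{2}}},\quad s_{rr} = \sum_{j=1}^{n}(x_{rj} - \bar{x}_r)^2.$$

Under the null hypothesis $H_o$, $\sigma_1^2 = \cdots = \sigma_p^2 \equiv \sigma^2$ and the MLE of $\sigma^2$ is a pooled estimate which is equal to $\frac{1}{np}(s_{11} + \cdots + s_{pp})$. Thus, the $\lambda$-criterion is the following in this case:

$$\lambda = \frac{\sup_{\omega}}{\sup_{\Omega}} = \frac{p^{\frac{np}{2}}\,[s_{11}s_{22} \cdots s_{pp}]^{\frac{n}{2}}}{(s_{11} + \cdots + s_{pp})^{\frac{np}{2}}}. \tag{6.7.1}$$

If we let

$$u_3 = \lambda^{\frac{2}{n}} = \frac{p^p\,(\prod_{j=1}^{p} s_{jj})}{(s_{11} + \cdots + s_{pp})^p}, \tag{6.7.2}$$

then, for arbitrary $h$, the $h$-th moment of $u_3$ is the following:

$$E[u_3^h | H_o] = E\Big[\frac{p^{ph}\,(\prod_{j=1}^{p} s_{jj}^h)}{(s_{11} + \cdots + s_{pp})^{ph}}\Big| H_o\Big] = E\Big[p^{ph}\Big(\prod_{j=1}^{p} s_{jj}^h\Big)(s_{11} + \cdots + s_{pp})^{-ph}\Big| H_o\Big]. \tag{6.7.3}$$

Observe that $\frac{s_{jj}}{\sigma^2} \overset{iid}{\sim} \chi^2_{n-1} = \chi^2_m$, $m = n - 1$, for $j = 1, \ldots, p$, the density of $s_{jj}$ being of the form

$$f_{s_{jj}}(s_{jj}) = \frac{s_{jj}^{\frac{m}{2}-1}\, e^{-\frac{s_{jj}}{2\sigma^2}}}{(2\sigma^2)^{\frac{m}{2}}\,\Gamma(\frac{m}{2})},\quad 0 \le s_{jj} < \infty,\ m = n - 1 = 1, 2, \ldots, \tag{i}$$

under $H_o$. Note that $(s_{11} + \cdots + s_{pp})^{-ph}$ can be replaced by an equivalent integral as

$$(s_{11} + \cdots + s_{pp})^{-ph} = \frac{1}{\Gamma(ph)} \int_0^{\infty} x^{ph-1}\, e^{-x(s_{11} + \cdots + s_{pp})}\, dx,\ \Re(h) > 0. \tag{ii}$$

Due to independence of the $s_{jj}$'s, the joint density of $s_{11}, \ldots, s_{pp}$, is the product of the densities appearing in (i), and on integrating out $s_{11}, \ldots, s_{pp}$, we end up with the following:

$$\Big\{\prod_{j=1}^{p} \frac{1}{(2\sigma^2)^{\frac{m}{2}}\,\Gamma(\frac{m}{2})}\Big\} \Big\{\prod_{j=1}^{p} \Gamma\Big(\frac{m}{2}+h\Big) \Big[\frac{1 + 2\sigma^2 x}{2\sigma^2}\Big]^{-(\frac{m}{2}+h)}\Big\} = \frac{[\Gamma(\frac{m}{2}+h)]^p}{[\Gamma(\frac{m}{2})]^p}\, (2\sigma^2)^{ph}\, (1 + 2\sigma^2 x)^{-p(\frac{m}{2}+h)}.$$

Now, the integral over $x$ can be evaluated as follows:

$$\frac{(2\sigma^2)^{ph}}{\Gamma(ph)} \int_0^{\infty} x^{ph-1}(1 + 2\sigma^2 x)^{-p(\frac{m}{2}+h)}\, dx = \frac{\Gamma(\frac{mp}{2})}{\Gamma(\frac{mp}{2}+ph)},\ \Re(h) > 0.$$

Thus,

$$E[u_3^h | H_o] = p^{ph}\, \frac{[\Gamma(\frac{m}{2}+h)]^p}{[\Gamma(\frac{m}{2})]^p}\, \frac{\Gamma(\frac{mp}{2})}{\Gamma(\frac{mp}{2}+ph)},\ \Re(h) > 0. \tag{6.7.4}$$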
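
The moment formula (6.7.4) lends itself to a direct simulation check, since under $H_o$ the $s_{jj}$'s are independent gamma variables with shape $\frac{m}{2}$ and scale $2\sigma^2$. The sketch below compares the first moment from (6.7.4) with a Monte Carlo average of $u_3$; the values of $m$, $p$, $\sigma^2$, the seed and the function names are hypothetical choices.

```python
import random
from math import lgamma, log, exp

def moment_u3(m, p, h):
    """E[u_3^h | H_o] from (6.7.4), evaluated on the log scale."""
    return exp(p * h * log(p) + p * (lgamma(m / 2 + h) - lgamma(m / 2))
               + lgamma(m * p / 2) - lgamma(m * p / 2 + p * h))

def simulate_u3(m, p, sigma2, reps, seed=0):
    """Monte Carlo average of u_3 = p^p * prod(s_jj) / (sum s_jj)^p under H_o."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(reps):
        # s_jj / sigma^2 ~ chisquare with m d.f., i.e. s_jj ~ gamma(shape m/2, scale 2 sigma^2)
        s = [rng.gammavariate(m / 2, 2 * sigma2) for _ in range(p)]
        prod = 1.0
        for v in s:
            prod *= v
        total += p ** p * prod / sum(s) ** p
    return total / reps

exact = moment_u3(m=4, p=3, h=1)
approx = simulate_u3(m=4, p=3, sigma2=2.5, reps=20000)
```

For $m = 4$ and $p = 3$ the formula gives $27 \cdot 8 / 336 = 9/14$, and the result does not depend on $\sigma^2$, as expected for a scale-invariant statistic.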
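The density of $u_3$ can be written in terms of an H-function. Since $p$ is a positive integer, we can expand one gamma ratio using Gauss' multiplication formula:

$$\frac{\Gamma(\frac{mp}{2})}{\Gamma(\frac{mp}{2}+ph)} = \frac{(2\pi)^{\frac{1-p}{2}}\, p^{\frac{mp}{2}-\frac{1}{2}}\, \Gamma(\frac{m}{2})\,\Gamma(\frac{m}{2}+\frac{1}{p}) \cdots \Gamma(\frac{m}{2}+\frac{p-1}{p})}{(2\pi)^{\frac{1-p}{2}}\, p^{\frac{mp}{2}-\frac{1}{2}+ph}\, \Gamma(\frac{m}{2}+h) \cdots \Gamma(\frac{m}{2}+\frac{p-1}{p}+h)}$$

for $p = 1, 2, \ldots$, $m \ge p$. Accordingly,

$$\begin{aligned}
E[u_3^h | H_o] &= \frac{[\Gamma(\frac{m}{2}+h)]^p}{[\Gamma(\frac{m}{2})]^p} \prod_{j=0}^{p-1} \frac{\Gamma(\frac{m}{2}+\frac{j}{p})}{\Gamma(\frac{m}{2}+\frac{j}{p}+h)} \\
&= \frac{[\Gamma(\frac{m}{2}+h)]^{p-1}}{[\Gamma(\frac{m}{2})]^{p-1}} \prod_{j=1}^{p-1} \frac{\Gamma(\frac{m}{2}+\frac{j}{p})}{\Gamma(\frac{m}{2}+\frac{j}{p}+h)} = c_{3,p-1}\, \frac{[\Gamma(\frac{m}{2}+h)]^{p-1}}{\prod_{j=1}^{p-1} \Gamma(\frac{m}{2}+\frac{j}{p}+h)},
\end{aligned} \tag{6.7.5}$$

$$c_{3,p-1} = \frac{\prod_{j=1}^{p-1} \Gamma(\frac{m}{2}+\frac{j}{p})}{[\Gamma(\frac{m}{2})]^{p-1}},\quad \Re\Big(\frac{m}{2}+h\Big) > 0. \tag{6.7.6}$$

Hence, for $h = s - 1$, (6.7.5) is the Mellin transform of the density of $u_3$. Thus, denoting the density by $f_{u_3}(u_3)$, we have

$$f_{u_3}(u_3 | H_o) = c_{3,p-1}\, G^{p-1,0}_{p-1,p-1}\Big[u_3 \Big|\, {}^{\frac{m}{2}-1+\frac{j}{p},\ j=1,\ldots,p-1}_{\frac{m}{2}-1,\, \ldots,\, \frac{m}{2}-1}\Big],\quad 0 \le u_3 \le 1, \tag{6.7.7}$$

and zero elsewhere.

In the complex case, the $h$-th moment is the following:

$$E[\tilde{u}_3^h | H_o] = \tilde{c}_{3,p-1}\, \frac{[\Gamma(m+h)]^{p-1}}{\prod_{j=1}^{p-1} \Gamma(m+\frac{j}{p}+h)}, \tag{6.7a.1}$$

$$\tilde{c}_{3,p-1} = \frac{\prod_{j=1}^{p-1} \Gamma(m+\frac{j}{p})}{[\Gamma(m)]^{p-1}}, \tag{6.7a.2}$$

and the corresponding density is given by

$$\tilde{f}_{\tilde{u}_3}(\tilde{u}_3 | H_o) = \tilde{c}_{3,p-1}\, \tilde{G}^{p-1,0}_{p-1,p-1}\Big[\tilde{u}_3 \Big|\, {}^{m-1+\frac{j}{p},\ j=1,\ldots,p-1}_{m-1,\, \ldots,\, m-1}\Big],\quad 0 \le |\tilde{u}_3| \le 1,$$

and zero elsewhere, $\tilde{G}$ denoting a real G-function.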


Real and complex cases: p = 2 


It is seen from (6.7.5) that for $p=2$, $u_3$ is a real type-1 beta with the parameters $(\alpha=\frac{m}{2},\ \beta=\frac{1}{2})$ in the real case. Whenever $p\ge 3$, poles of order 2 or more occur, and the resulting density functions, which are expressible in terms of generalized hypergeometric functions, will not be explicitly provided. For a general series expansion of the G-function, the reader may refer to Mathai (1970a, 1993).


In the complex case, when $p=2$, $\tilde{u}_3$ has a real type-1 beta density with the parameters $(\alpha=m,\ \beta=\frac{1}{2})$. In this instance as well, poles of higher orders will be present when $p\ge 3$, and hence explicit forms of the corresponding densities will not be provided here. The exact null and non-null distributions of the test statistic are derived for the general case in Mathai and Saxena (1973), and highly accurate percentage points are provided in Mathai (1979a,b).
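The reduction to a type-1 beta variable when $p=2$ can be verified through the moments. The sketch below is only illustrative: `u3_moment` is a hypothetical helper that evaluates the gamma ratio of (6.7.5) with shape $m/2$ (real case) or of (6.7a.1) with shape $m$ (complex case), and `beta_moment` is the standard $h$-th moment of a real scalar type-1 beta variable.

```python
from math import lgamma, exp

def beta_moment(h, a, b):
    """E[x^h] for a real scalar type-1 beta variable with parameters (a, b)."""
    return exp(lgamma(a + h) - lgamma(a) + lgamma(a + b) - lgamma(a + b + h))

def u3_moment(h, shape, p):
    """c * Gamma(shape+h)^(p-1) / prod_j Gamma(shape + j/p + h), equal to 1 at h = 0.
    shape = m/2 in the real case, shape = m in the complex case."""
    log_c = sum(lgamma(shape + j / p) for j in range(1, p)) - (p - 1) * lgamma(shape)
    log_r = ((p - 1) * lgamma(shape + h)
             - sum(lgamma(shape + j / p + h) for j in range(1, p)))
    return exp(log_c + log_r)

m, h = 7, 1.3  # arbitrary illustrative values
real_gap = abs(u3_moment(h, m / 2, 2) - beta_moment(h, m / 2, 0.5))
complex_gap = abs(u3_moment(h, m, 2) - beta_moment(h, m, 0.5))
print(real_gap, complex_gap)  # both gaps are at rounding-error level
```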


An asymptotic result can also be obtained as $n\to\infty$. Consider the $h$-th moment of $\lambda$, which is available from (6.7.5) in the real case and from (6.7a.1) in the complex case. Then, referring to Corollary 6.5.2, $\delta_j=\frac{j}{p}$ whether in the real or in the complex situations. Hence, $2[\sum_{j=1}^{p-1}\delta_j]=2\sum_{j=1}^{p-1}\frac{j}{p}=(p-1)$ in both the real and the complex cases. As well, observe that in the complex case, the diagonal elements are real since $\tilde{\Sigma}$ is Hermitian positive definite. Accordingly, the number of restrictions imposed by $H_o$ in either the real or complex cases is $p-1$. Thus, the following result:


Theorem 6.7.1. Consider the $\lambda$-criterion for testing the equality of the diagonal elements, given that the covariance matrix is already diagonal. Then, as $n\to\infty$, the null distribution of $-2\ln\lambda\to\chi^2_{p-1}$ in both the real and the complex cases.
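6.8. Hypothesis that the Covariance Matrix is Block Diagonal, Real Case


We will discuss a generalization of the problem examined in Sect. 6.6, considering again the case of real Gaussian vectors. Let $X_1,\ldots,X_n$ be iid as $X_j\sim N_p(\mu,\Sigma)$, $\Sigma>O$, and

$$X_j=\begin{bmatrix}x_{1j}\\ \vdots\\ x_{pj}\end{bmatrix}=\begin{bmatrix}X_{(1)j}\\ \vdots\\ X_{(k)j}\end{bmatrix},\quad X_{(1)j}=\begin{bmatrix}x_{1j}\\ \vdots\\ x_{p_1j}\end{bmatrix},\ \ldots,\quad \Sigma=\begin{bmatrix}\Sigma_{11}&O&\cdots&O\\ O&\Sigma_{22}&\cdots&O\\ \vdots&\vdots&\ddots&\vdots\\ O&O&\cdots&\Sigma_{kk}\end{bmatrix},\ \Sigma_{jj}\ \text{being}\ p_j\times p_j,\ j=1,\ldots,k.$$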

In this case, the $p\times 1$ real Gaussian vector is subdivided into subvectors of orders $p_1,\ldots,p_k$, so that $p_1+\cdots+p_k=p$, and, under the null hypothesis $H_o$, $\Sigma$ is assumed to be a block diagonal matrix, which means that the subvectors are mutually independently distributed $p_j$-variate real Gaussian vectors with corresponding mean value vector $\mu_{(j)}$ and covariance matrix $\Sigma_{jj}$, $j=1,\ldots,k$. Then, the joint density of the sample values under the null hypothesis can be written as $L=\prod_{r=1}^kL_r$ where $L_r$ is the joint density of the sample values corresponding to the subvector $X_{(r)j}$, $j=1,\ldots,n$, $r=1,\ldots,k$. Letting the $p\times n$ general sample matrix be $\mathbf{X}=(X_1,\ldots,X_n)$, we note that the sample representing the first $p_1$ rows of $\mathbf{X}$ corresponds to the sample from the first subvector $X_{(1)j}\overset{iid}{\sim}N_{p_1}(\mu_{(1)},\Sigma_{11})$, $\Sigma_{11}>O$, $j=1,\ldots,n$. The MLE's of $\mu_{(r)}$ and $\Sigma_{rr}$ are the corresponding sample mean and sample covariance matrix. Thus, the maximum of $L_r$ is available as

$$\max L_r=\frac{e^{-\frac{np_r}{2}}\,n^{\frac{np_r}{2}}}{(2\pi)^{\frac{np_r}{2}}|S_{rr}|^{\frac{n}{2}}}\ \Rightarrow\ \max_{H_o}L=\prod_{r=1}^k\max L_r=\frac{e^{-\frac{np}{2}}\,n^{\frac{np}{2}}}{(2\pi)^{\frac{np}{2}}\prod_{r=1}^k|S_{rr}|^{\frac{n}{2}}}.$$

Hence,

$$\lambda=\frac{\sup_{\omega}L}{\sup_{\Omega}L}=\frac{|S|^{\frac{n}{2}}}{\prod_{r=1}^k|S_{rr}|^{\frac{n}{2}}}, \qquad (6.8.1)$$

and

$$u_4\equiv\lambda^{\frac{2}{n}}=\frac{|S|}{\prod_{r=1}^k|S_{rr}|}. \qquad (6.8.2)$$


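Observe that the covariance matrix $\Sigma=(\sigma_{ij})$ can be written in terms of the matrix of population correlations. If we let $D={\rm diag}(\sigma_1,\ldots,\sigma_p)$ where $\sigma_t^2=\sigma_{tt}$ denotes the variance associated with the component $x_{tj}$ in $X_j'=(x_{1j},\ldots,x_{pj})$ where ${\rm Cov}(X_j)=\Sigma$, and $R=(\rho_{rs})$ be the population correlation matrix, where $\rho_{rs}$ is the population correlation between the components $x_{rj}$ and $x_{sj}$, then $\Sigma=DRD$. Consider a partitioning of $\Sigma$ into $k\times k$ blocks as well as the corresponding partitioning of $D$ and $R$:

$$\Sigma=\begin{bmatrix}\Sigma_{11}&\Sigma_{12}&\cdots&\Sigma_{1k}\\ \Sigma_{21}&\Sigma_{22}&\cdots&\Sigma_{2k}\\ \vdots&\vdots&\ddots&\vdots\\ \Sigma_{k1}&\Sigma_{k2}&\cdots&\Sigma_{kk}\end{bmatrix},\ D=\begin{bmatrix}D_1&O&\cdots&O\\ O&D_2&\cdots&O\\ \vdots&\vdots&\ddots&\vdots\\ O&O&\cdots&D_k\end{bmatrix},\ R=\begin{bmatrix}R_{11}&R_{12}&\cdots&R_{1k}\\ R_{21}&R_{22}&\cdots&R_{2k}\\ \vdots&\vdots&\ddots&\vdots\\ R_{k1}&R_{k2}&\cdots&R_{kk}\end{bmatrix},$$

where, for example, $\Sigma_{jj}$ is $p_j\times p_j$, $p_1+\cdots+p_k=p$, with the corresponding partitioning of $D$ and $R$. Consider a corresponding partitioning of the sample sum of products matrix $S=(S_{ij})$, $D^{(s)}$ and $R^{(s)}$ where $R^{(s)}$ is the sample correlation matrix and $D^{(s)}={\rm diag}(\sqrt{s_{11}},\ldots,\sqrt{s_{pp}})$, where $S_{jj}$, $D_j^{(s)}$, $R_{jj}^{(s)}$ are $p_j\times p_j$, $p_1+\cdots+p_k=p$. Then,

$$\frac{|\Sigma|}{\prod_{j=1}^k|\Sigma_{jj}|}=\frac{|R|}{\prod_{j=1}^k|R_{jj}|} \qquad (6.8.3)$$

and

$$u_4\equiv\lambda^{\frac{2}{n}}=\frac{|S|}{\prod_{j=1}^k|S_{jj}|}=\frac{|R^{(s)}|}{\prod_{j=1}^k|R_{jj}^{(s)}|}. \qquad (6.8.4)$$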
An additional interesting property is now pointed out. Consider a linear function of the original $p\times 1$ vector $X_j\sim N_p(\mu,\Sigma)$, $\Sigma>O$, in the form $CX_j$ where $C$ is the diagonal matrix ${\rm diag}(c_1,\ldots,c_p)$. In this case, the product $CX_j$ is such that the $r$-th component of $X_j$ is weighted or multiplied by $c_r$. Let $C$ be a block diagonal matrix that is partitioned similarly to $D$ so that its $j$-th diagonal block matrix is the $p_j\times p_j$ diagonal submatrix $C_j$. Then,

$$u_c=\frac{|CSC'|}{\prod_{j=1}^k|C_jS_{jj}C_j'|}=\frac{|S|}{\prod_{j=1}^k|S_{jj}|}=u_4. \qquad (6.8.5)$$

In other words, $u_4$ is invariant under linear transformations on $X_j\overset{iid}{\sim}N_p(\mu,\Sigma)$, $\Sigma>O$, $j=1,\ldots,n$. That is, if $Y_j=CX_j+d$ where $d$ is a constant column vector, then for the $p\times n$ sample matrix on $Y_j$, namely $\mathbf{Y}=(Y_1,\ldots,Y_n)=(CX_1+d,\ldots,CX_n+d)$,

$$\mathbf{Y}-\bar{\mathbf{Y}}=C(\mathbf{X}-\bar{\mathbf{X}})\ \Rightarrow\ S_y=(\mathbf{Y}-\bar{\mathbf{Y}})(\mathbf{Y}-\bar{\mathbf{Y}})'=C(\mathbf{X}-\bar{\mathbf{X}})(\mathbf{X}-\bar{\mathbf{X}})'C'=CSC'.$$

Letting $S_y$ be partitioned as $S$ into $k\times k$ blocks and $S_y=(S_{ijy})$, we have

$$u_y\equiv\frac{|S_y|}{\prod_{j=1}^k|S_{jjy}|}=\frac{|CSC'|}{\prod_{j=1}^k|C_jS_{jj}C_j'|}=\frac{|S|}{\prod_{j=1}^k|S_{jj}|}=u_4. \qquad (6.8.6)$$
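The invariance stated in (6.8.5) can be illustrated with exact rational arithmetic. This is a minimal sketch under stated assumptions: the matrix `S` and the diagonal weights `c` are made up for illustration, and `det` and `u4` are hypothetical helper names, not notation from the text.

```python
from fractions import Fraction

def det(a):
    """Determinant by Gaussian elimination with exact rational arithmetic."""
    a = [[Fraction(x) for x in row] for row in a]
    n, sign, prod = len(a), 1, Fraction(1)
    for c in range(n):
        pivot = next((r for r in range(c, n) if a[r][c] != 0), None)
        if pivot is None:
            return Fraction(0)
        if pivot != c:
            a[c], a[pivot] = a[pivot], a[c]
            sign = -sign
        prod *= a[c][c]
        for r in range(c + 1, n):
            f = a[r][c] / a[c][c]
            a[r] = [x - f * y for x, y in zip(a[r], a[c])]
    return sign * prod

def u4(s, orders):
    """|S| divided by the product of the determinants of the diagonal blocks."""
    value, start = det(s), 0
    for p_r in orders:
        block = [row[start:start + p_r] for row in s[start:start + p_r]]
        value /= det(block)
        start += p_r
    return value

# Hypothetical positive definite S (4 x 4) and diagonal C = diag(c).
S = [[6, 2, 1, 0], [2, 5, 1, 1], [1, 1, 4, 2], [0, 1, 2, 3]]
c = [2, Fraction(1, 3), 5, 7]
CSC = [[c[i] * S[i][j] * c[j] for j in range(4)] for i in range(4)]

print(u4(S, [2, 2]), u4(CSC, [2, 2]))  # the same value for S and CSC'
```

The scaling factors cancel because $|CSC'|=|C|^2|S|$ and $\prod_j|C_jS_{jj}C_j'|=|C|^2\prod_j|S_{jj}|$ when $C$ is block diagonal.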




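Arbitrary moments of $u_4$ can be derived by proceeding as in Sect. 6.6. The $h$-th null moment, that is, the $h$-th moment under the null hypothesis $H_o$, is then

$$E[u_4^h|H_o]=\int_{S>O}\frac{|S|^{\frac{m}{2}+h-\frac{p+1}{2}}}{2^{\frac{mp}{2}}\Gamma_p(\frac{m}{2})|\Sigma_o|^{\frac{m}{2}}}\,e^{-\frac{1}{2}{\rm tr}(\Sigma_o^{-1}S)}\Big[\prod_{r=1}^k|S_{rr}|^{-h}\Big]{\rm d}S \qquad \text{(i)}$$

where $m=n-1$, $n$ being the sample size, and

$$\Sigma_o=\begin{bmatrix}\Sigma_{11}&O&\cdots&O\\ O&\Sigma_{22}&\cdots&O\\ \vdots&\vdots&\ddots&\vdots\\ O&O&\cdots&\Sigma_{kk}\end{bmatrix},\quad {\rm tr}(\Sigma_o^{-1}S)={\rm tr}(\Sigma_{11}^{-1}S_{11})+\cdots+{\rm tr}(\Sigma_{kk}^{-1}S_{kk}), \qquad \text{(ii)}$$

where $S_{rr}$ is the $r$-th diagonal block of $S$, corresponding to $\Sigma_{rr}$ of $\Sigma_o$ whose order is $p_r\times p_r$, $r=1,\ldots,k$, $p_1+\cdots+p_k=p$. On noting that

$$|S_{rr}|^{-h}=\frac{1}{\Gamma_{p_r}(h)}\int_{Y_r>O}|Y_r|^{h-\frac{p_r+1}{2}}e^{-{\rm tr}(Y_rS_{rr})}\,{\rm d}Y_r,\quad \Re(h)>\frac{p_r-1}{2}, \qquad \text{(iii)}$$

where $Y_r>O$ is a $p_r\times p_r$ real positive definite matrix, and replacing each $|S_{rr}|^{-h}$ by its integral representation as given in (iii), the exponent of $e$ in (i) becomes

$$-\frac{1}{2}[{\rm tr}(\Sigma_o^{-1}S)+2\,{\rm tr}(YS)],\quad Y=\begin{bmatrix}Y_1&O&\cdots&O\\ O&Y_2&\cdots&O\\ \vdots&\vdots&\ddots&\vdots\\ O&O&\cdots&Y_k\end{bmatrix},\quad {\rm tr}(YS)={\rm tr}(Y_1S_{11})+\cdots+{\rm tr}(Y_kS_{kk}).$$

The right-hand side of equation (i) then becomes

$$E[u_4^h|H_o]=\frac{\Gamma_p(\frac{m}{2}+h)}{\Gamma_p(\frac{m}{2})|\Sigma_o|^{\frac{m}{2}}}\,2^{ph}\Big[\prod_{r=1}^k\frac{1}{\Gamma_{p_r}(h)}\int_{Y_r>O}|Y_r|^{h-\frac{p_r+1}{2}}\Big]\,|\Sigma_o^{-1}+2Y|^{-(\frac{m}{2}+h)}\,{\rm d}Y_1\wedge\ldots\wedge{\rm d}Y_k,\quad \Re(h)>-\frac{m}{2}+\frac{p-1}{2}. \qquad \text{(iv)}$$

It should be pointed out that the non-null moments of $u_4$ can be obtained by substituting a general $\Sigma$ for $\Sigma_o$ in (iv). Note that if we replace $2Y$ by $Y$, the factor containing 2, namely $2^{ph}$, will disappear. Further, under $H_o$,

$$|\Sigma_o^{-1}+2Y|^{-(\frac{m}{2}+h)}=\prod_{r=1}^k|\Sigma_{rr}|^{\frac{m}{2}+h}\prod_{r=1}^k|I+2\Sigma_{rr}^{\frac{1}{2}}Y_r\Sigma_{rr}^{\frac{1}{2}}|^{-(\frac{m}{2}+h)}. \qquad \text{(v)}$$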



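Then, each $Y_r$-integral can be evaluated as follows:

$$\frac{1}{\Gamma_{p_r}(h)}\int_{Y_r>O}|Y_r|^{h-\frac{p_r+1}{2}}|I+2\Sigma_{rr}^{\frac{1}{2}}Y_r\Sigma_{rr}^{\frac{1}{2}}|^{-(\frac{m}{2}+h)}\,{\rm d}Y_r$$

$$=2^{-p_rh}|\Sigma_{rr}|^{-h}\frac{1}{\Gamma_{p_r}(h)}\int_{Z_r>O}|Z_r|^{h-\frac{p_r+1}{2}}|I+Z_r|^{-(\frac{m}{2}+h)}\,{\rm d}Z_r,\quad Z_r=2\Sigma_{rr}^{\frac{1}{2}}Y_r\Sigma_{rr}^{\frac{1}{2}},$$

$$=2^{-p_rh}|\Sigma_{rr}|^{-h}\frac{1}{\Gamma_{p_r}(h)}\frac{\Gamma_{p_r}(h)\,\Gamma_{p_r}(\frac{m}{2})}{\Gamma_{p_r}(\frac{m}{2}+h)}. \qquad \text{(vi)}$$

On combining equations (i) to (vi), we have

$$E[u_4^h|H_o]=\frac{\Gamma_p(\frac{m}{2}+h)}{\Gamma_p(\frac{m}{2})}\prod_{r=1}^k\frac{\Gamma_{p_r}(\frac{m}{2})}{\Gamma_{p_r}(\frac{m}{2}+h)},\quad \Re\Big(\frac{m}{2}+h\Big)>\frac{p-1}{2}, \qquad (6.8.7)$$

$$=c_{4,p}\,\frac{\Gamma_p(\frac{m}{2}+h)}{\prod_{r=1}^k\Gamma_{p_r}(\frac{m}{2}+h)}=c_{4,p}\,c^*\,\frac{\prod_{j=1}^p\Gamma(\frac{m}{2}-\frac{j-1}{2}+h)}{\prod_{r=1}^k\prod_{i=1}^{p_r}\Gamma(\frac{m}{2}-\frac{i-1}{2}+h)}, \qquad (6.8.8)$$

$$c_{4,p}=\frac{\prod_{r=1}^k\Gamma_{p_r}(\frac{m}{2})}{\Gamma_p(\frac{m}{2})}=\frac{\prod_{r=1}^k\pi^{\frac{p_r(p_r-1)}{4}}}{\pi^{\frac{p(p-1)}{4}}}\,\frac{\prod_{r=1}^k\prod_{i=1}^{p_r}\Gamma(\frac{m}{2}-\frac{i-1}{2})}{\prod_{j=1}^p\Gamma(\frac{m}{2}-\frac{j-1}{2})}, \qquad (6.8.9)$$

$$c^*=\frac{\pi^{\frac{p(p-1)}{4}}}{\prod_{r=1}^k\pi^{\frac{p_r(p_r-1)}{4}}},$$

so that when $h=0$, $E[u_4^h|H_o]=1$. Observe that one set of gamma products can be canceled in (6.8.8) and (6.8.9). When that set is the product of the first $p_1$ gamma functions, the $h$-th moment of $u_4$ is given by

$$E[u_4^h|H_o]=c_{4,p-p_1}\,\frac{\prod_{j=p_1+1}^{p}\Gamma(\frac{m}{2}-\frac{j-1}{2}+h)}{\prod_{r=2}^k\prod_{i=1}^{p_r}\Gamma(\frac{m}{2}-\frac{i-1}{2}+h)}, \qquad (6.8.10)$$

where $c_{4,p-p_1}$ is such that $E[u_4^h|H_o]=1$ when $h=0$. Since the structure of the expression given in (6.8.10) is that of the $h$-th moment of a product of $p-p_1$ independently distributed real scalar type-1 beta random variables, it can be inferred that the distribution of $u_4|H_o$ is also that of a product of $p-p_1$ independently distributed real scalar type-1 beta random variables whose parameters can be determined from the arguments of the gamma functions appearing in (6.8.10).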
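As an illustration of (6.8.7), the null moments of $u_4$ can be evaluated on the log scale. The sketch below uses hypothetical helper names (`log_gamma_p`, `u4_moment`, `beta_moment`) and checks one known special case: for $k=2$, $p_1=1$, $p_2=p-1$, the moment should coincide with that of a real scalar type-1 beta variable with the parameters $(\frac{m}{2}-\frac{p-1}{2},\frac{p-1}{2})$.

```python
from math import lgamma, log, pi, exp

def log_gamma_p(p, a):
    """Log of the real matrix-variate gamma:
    pi^{p(p-1)/4} * prod_{j=1}^p Gamma(a - (j-1)/2)."""
    return (p * (p - 1) / 4 * log(pi)
            + sum(lgamma(a - (j - 1) / 2) for j in range(1, p + 1)))

def u4_moment(h, m, orders):
    """h-th null moment of u_4 from (6.8.7); orders = (p_1, ..., p_k)."""
    p = sum(orders)
    val = log_gamma_p(p, m / 2 + h) - log_gamma_p(p, m / 2)
    for p_r in orders:
        val += log_gamma_p(p_r, m / 2) - log_gamma_p(p_r, m / 2 + h)
    return exp(val)

def beta_moment(h, a, b):
    """E[x^h] for a real scalar type-1 beta variable with parameters (a, b)."""
    return exp(lgamma(a + h) - lgamma(a) + lgamma(a + b) - lgamma(a + b + h))

m, h = 12, 1.5  # arbitrary illustrative values
# k = 2 with p_1 = 1 and p_2 = 3, so that p = 4
print(u4_moment(h, m, (1, 3)), beta_moment(h, (m - 3) / 2, 3 / 2))
```

The powers of $\pi$ cancel exactly when $h=0$, so the moment equals 1 there for any partition.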


Some of the gamma functions appearing in (6.8.10) will cancel out for certain values of $p_1,\ldots,p_k$, thereby simplifying the representation of the moments and enabling one to express the density of $u_4$ in terms of elementary functions in such instances. The exact null density in the general case was derived by the first author. For interesting representations of the exact density, the reader is referred to Mathai and Rathie (1971) and Mathai and Saxena (1973), some exact percentage points of the null distribution being included in Mathai and Katiyar (1979a). As it turns out, explicit forms are available in terms of elementary functions for the following special cases, see also Anderson (2003): $p_1=p_2=p_3=1$; $p_1=p_2=p_3=2$; $p_1=p_2=1,\ p_3=p-2$; $p_1=1,\ p_2=p_3=2$; $p_1=1,\ p_2=2,\ p_3=3$; $p_1=2,\ p_2=2,\ p_3=4$; $p_1=p_2=2,\ p_3=3$; $p_1=2,\ p_2=3,\ p_3$ is even.


6.8.1. Special case: k = 2 


Let us consider a certain $2\times 2$ partitioning of $S$, which corresponds to the special case $k=2$. When $p_1=1$ and $p_2=p-1$ so that $p_1+p_2=p$, the test statistic is

$$u_4=\frac{|S|}{|S_{11}|\,|S_{22}|}=\frac{|S_{11}-S_{12}S_{22}^{-1}S_{21}|}{|S_{11}|}=\frac{s_{11}-S_{12}S_{22}^{-1}S_{21}}{s_{11}}=1-r^2_{1.(2\ldots p)} \qquad (6.8.11)$$

where $r_{1.(2\ldots p)}$ is the multiple correlation between $x_1$ and $(x_2,\ldots,x_p)$. As stated in Theorem 5.6.3, $1-r^2_{1.(2\ldots p)}$ is distributed as a real scalar type-1 beta variable with the parameters $(\frac{n-1}{2}-\frac{p-1}{2},\ \frac{p-1}{2})$. The simplifications in (6.8.11) are achieved by making use of the properties of determinants of partitioned matrices, which are discussed in Sect. 1.3. Since $s_{11}$ is $1\times 1$ in this case, the numerator determinant is a real scalar quantity. Thus, this yields a type-2 beta distribution for $w=\frac{u_4}{1-u_4}$ and thereby $\frac{p-1}{n-p}\,w$ has an $F$-distribution, so that the test can be based on an $F$ statistic having $(n-1)-(p-1)=n-p$ and $p-1$ degrees of freedom.


6.8.2. General case: k = 2 


If, in a $2\times 2$ partitioning of $S$, $S_{11}$ is of order $p_1\times p_1$ and $S_{22}$ is of order $p_2\times p_2$ with $p_2=p-p_1$, then $u_4$ can be expressed as

$$u_4=\frac{|S|}{|S_{11}|\,|S_{22}|}=\frac{|S_{11}-S_{12}S_{22}^{-1}S_{21}|}{|S_{11}|}=|I-S_{11}^{-\frac{1}{2}}S_{12}S_{22}^{-1}S_{21}S_{11}^{-\frac{1}{2}}|=|I-U|,\quad U=S_{11}^{-\frac{1}{2}}S_{12}S_{22}^{-1}S_{21}S_{11}^{-\frac{1}{2}}, \qquad (6.8.12)$$

where $U$ is called the multiple correlation matrix. It will be shown that $U$ has a real matrix-variate type-1 beta distribution when $S_{11}$ is of general order rather than being a scalar.
Theorem 6.8.1. Consider $u_4$ for $k=2$. Let $S_{11}$ be $p_1\times p_1$ and $S_{22}$ be $p_2\times p_2$ so that $p_1+p_2=p$. Without any loss of generality, let us assume that $p_1\le p_2$. Then, under $H_o:\Sigma_{12}=O$, the multiple correlation matrix $U$ has a real matrix-variate type-1 beta distribution with the parameters $(\frac{p_2}{2},\ \frac{m}{2}-\frac{p_2}{2})$, with $m=n-1$, $n$ being the sample size, and thereby $(I-U)\sim$ type-1 beta $(\frac{m}{2}-\frac{p_2}{2},\ \frac{p_2}{2})$, the determinant of $I-U$ being $u_4$ under the null hypothesis when $k=2$.


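Proof: Since $\Sigma$ under $H_o$ can readily be eliminated from a structure such as $u_4$, we will take a Wishart matrix $S$ having $m=n-1$ degrees of freedom, $n$ denoting the sample size, and parameter matrix $I$, the identity matrix. At first, assume that $\Sigma$ is a block diagonal matrix and make the transformation $S_1=\Sigma^{-\frac{1}{2}}S\Sigma^{-\frac{1}{2}}$. As a result, $u_4$ will be free of $\Sigma_{11}$ and $\Sigma_{22}$, and so, we may take $S\sim W_p(m,I)$. Now, consider the submatrices $S_{11},S_{22},S_{12}$ so that ${\rm d}S={\rm d}S_{11}\wedge{\rm d}S_{22}\wedge{\rm d}S_{12}$. Let $f(S)$ denote the $W_p(m,I)$ density. Then,

$$f(S)\,{\rm d}S=\frac{|S|^{\frac{m}{2}-\frac{p+1}{2}}}{2^{\frac{mp}{2}}\Gamma_p(\frac{m}{2})}\,e^{-\frac{1}{2}{\rm tr}(S)}\,{\rm d}S,\quad S=\begin{bmatrix}S_{11}&S_{12}\\ S_{21}&S_{22}\end{bmatrix},\ S_{21}=S_{12}'.$$

However, appealing to a result stated in Sect. 1.3, we have

$$|S|=|S_{22}|\,|S_{11}-S_{12}S_{22}^{-1}S_{21}|=|S_{22}|\,|S_{11}|\,|I-S_{11}^{-\frac{1}{2}}S_{12}S_{22}^{-1}S_{21}S_{11}^{-\frac{1}{2}}|.$$

The joint density of $S_{11},S_{22},S_{12}$ denoted by $f_1(S_{11},S_{22},S_{12})$ is then

$$f_1(S_{11},S_{22},S_{12})=|S_{11}|^{\frac{m}{2}-\frac{p+1}{2}}|S_{22}|^{\frac{m}{2}-\frac{p+1}{2}}|I-U|^{\frac{m}{2}-\frac{p+1}{2}}\,\frac{e^{-\frac{1}{2}{\rm tr}(S_{11})-\frac{1}{2}{\rm tr}(S_{22})}}{2^{\frac{mp}{2}}\Gamma_p(\frac{m}{2})},\quad U=S_{11}^{-\frac{1}{2}}S_{12}S_{22}^{-1}S_{21}S_{11}^{-\frac{1}{2}}.$$

Letting $Y=S_{11}^{-\frac{1}{2}}S_{12}S_{22}^{-\frac{1}{2}}$, it follows from a result on Jacobians of matrix transformations, previously established in Chap. 1, that ${\rm d}Y=|S_{11}|^{-\frac{p_2}{2}}|S_{22}|^{-\frac{p_1}{2}}\,{\rm d}S_{12}$. Thus, the joint density of $S_{11},S_{22},Y$, denoted by $f_2(S_{11},S_{22},Y)$, is given by

$$f_2(S_{11},S_{22},Y)=|S_{11}|^{\frac{m}{2}+\frac{p_2}{2}-\frac{p+1}{2}}|S_{22}|^{\frac{m}{2}+\frac{p_1}{2}-\frac{p+1}{2}}|I-YY'|^{\frac{m}{2}-\frac{p+1}{2}}\,\frac{e^{-\frac{1}{2}{\rm tr}(S_{11})-\frac{1}{2}{\rm tr}(S_{22})}}{2^{\frac{mp}{2}}\Gamma_p(\frac{m}{2})}.$$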
Note that $S_{11},S_{22},Y$ are independently distributed as $f_2(\cdot)$ can be factorized into functions of $S_{11}$, $S_{22}$, $Y$. Now, letting $U=YY'$, it follows from Theorem 4.2.3 that

$${\rm d}Y=\frac{\pi^{\frac{p_1p_2}{2}}}{\Gamma_{p_1}(\frac{p_2}{2})}\,|U|^{\frac{p_2}{2}-\frac{p_1+1}{2}}\,{\rm d}U,$$

and the density of $U$, denoted by $f_3(U)$, can then be expressed as follows:

$$f_3(U)=c\,|U|^{\frac{p_2}{2}-\frac{p_1+1}{2}}\,|I-U|^{\frac{m}{2}-\frac{p_2}{2}-\frac{p_1+1}{2}},\quad O<U<I, \qquad (6.8.13)$$

which is a real matrix-variate type-1 beta density with the parameters $(\frac{p_2}{2},\ \frac{m}{2}-\frac{p_2}{2})$, where $c$ is the normalizing constant. As a result, $I-U$ has a real matrix-variate type-1 beta distribution with the parameters $(\frac{m}{2}-\frac{p_2}{2},\ \frac{p_2}{2})$. Finally, observe that $u_4$ is the determinant of $I-U$.


Corollary 6.8.1. Consider $u_4$ as given in (6.8.12) and the determinant $|I-U|$ where $U$ and $I-U$ are defined in Theorem 6.8.1. Then for $k=2$ and an arbitrary $h$, $E[u_4^h|H_o]=E[|I-U|^h]$.


Proof: On letting $k=2$ in (6.8.8), we obtain the $h$-th moment of $u_4|H_o$ as

$$E[u_4^h|H_o]=c_{4,p}\,\frac{\prod_{j=1}^p\Gamma(\frac{m}{2}-\frac{j-1}{2}+h)}{[\prod_{j=1}^{p_1}\Gamma(\frac{m}{2}-\frac{j-1}{2}+h)][\prod_{j=1}^{p_2}\Gamma(\frac{m}{2}-\frac{j-1}{2}+h)]}. \qquad \text{(i)}$$

After canceling $p_2$ of the gamma functions, the remaining gamma product in the numerator of (i) is

$$\Gamma\Big(\frac{m}{2}-\frac{p_2}{2}+h\Big)\Gamma\Big(\frac{m}{2}-\frac{p_2+1}{2}+h\Big)\cdots\Gamma\Big(\frac{m}{2}-\frac{p-1}{2}+h\Big)=\Gamma_{p_1}\Big(\frac{m}{2}-\frac{p_2}{2}+h\Big),$$

excluding $\pi^{\frac{p_1(p_1-1)}{4}}$. The remainder of the gamma product present in the denominator is comprised of the gamma functions coming from $\Gamma_{p_1}(\frac{m}{2}+h)$, excluding $\pi^{\frac{p_1(p_1-1)}{4}}$. The normalizing constant will automatically take care of the factors containing $\pi$. Now, the resulting part containing $h$ is $\Gamma_{p_1}(\frac{m}{2}-\frac{p_2}{2}+h)/\Gamma_{p_1}(\frac{m}{2}+h)$, which is the gamma ratio in the $h$-th moment of a $p_1\times p_1$ real matrix-variate type-1 beta distribution with the parameters $(\frac{m}{2}-\frac{p_2}{2},\ \frac{p_2}{2})$.

Since this happens to be $E[|I-U|^h]$ for $I-U$ distributed as is specified in Theorem 6.8.1, the Corollary is established.


An asymptotic result can be established from Corollary 6.5.1 and the $\lambda$-criterion for testing block-diagonality or equivalently the independence of subvectors in a $p$-variate Gaussian population. The resulting chisquare variable will have $2\sum_j\delta_j$ degrees of freedom where $\delta_j$ is as defined in Corollary 6.5.1 for the second parameter of the real scalar type-1 beta distribution. Referring to (6.8.10), we have

$$\sum_j\delta_j=\sum_{j=p_1+1}^{p}\frac{j-1}{2}-\sum_{r=2}^{k}\sum_{i=1}^{p_r}\frac{i-1}{2}=\sum_{j=p_1+1}^{p}\frac{j-1}{2}-\sum_{r=2}^{k}\frac{p_r(p_r-1)}{4}$$

$$=\sum_{j=1}^{p}\frac{j-1}{2}-\sum_{r=1}^{k}\frac{p_r(p_r-1)}{4}=\frac{p(p-1)}{4}-\sum_{r=1}^{k}\frac{p_r(p_r-1)}{4}=\sum_{j=1}^{k}\frac{p_j(p-p_j)}{4}.$$

Accordingly, the degrees of freedom of the resulting chisquare is $2\sum_j\delta_j=\sum_{j=1}^k\frac{p_j(p-p_j)}{2}$ in the real case. It can also be observed that the number of restrictions imposed by the null hypothesis $H_o$ is obtained by first letting all the off-diagonal elements of $\Sigma=\Sigma'$ equal to zero and subtracting the off-diagonal elements of the $k$ diagonal blocks, which produces $\frac{p(p-1)}{2}-\sum_{j=1}^k\frac{p_j(p_j-1)}{2}=\sum_{j=1}^k\frac{p_j(p-p_j)}{2}$. In the complex case, the number of degrees of freedom will be twice that obtained for the real case, the chisquare variable remaining a real scalar chisquare random variable. This is now stated as a theorem.


Theorem 6.8.2. Consider the $\lambda$-criterion given in (6.8.1) in the real case and let the corresponding $\lambda$ in the complex case be $\tilde{\lambda}$. Then $-2\ln\lambda\to\chi^2_{\delta}$ as $n\to\infty$ where $n$ is the sample size and $\delta=\sum_{j=1}^k\frac{p_j(p-p_j)}{2}$, which is also the number of restrictions imposed by $H_o$. Analogously, in the complex case, $-2\ln\tilde{\lambda}\to\chi^2_{\tilde{\delta}}$ as $n\to\infty$, where the chisquare variable remains a real scalar chisquare random variable, $\tilde{\delta}=\sum_{j=1}^kp_j(p-p_j)$ and $n$ denotes the sample size.
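The two ways of counting the degrees of freedom in Theorem 6.8.2 can be compared directly. This is a small illustrative sketch; the function names are hypothetical and the partitions are arbitrary.

```python
def delta_real(orders):
    """delta = sum_j p_j (p - p_j) / 2 for a partition (p_1, ..., p_k) of p."""
    p = sum(orders)
    total = sum(q * (p - q) for q in orders)  # equals 2 * sum_{i<j} p_i p_j, hence even
    return total // 2

def restrictions_real(orders):
    """Off-diagonal elements of Sigma minus those inside the k diagonal blocks."""
    p = sum(orders)
    return p * (p - 1) // 2 - sum(q * (q - 1) // 2 for q in orders)

def delta_complex(orders):
    """Twice the real-case value, as stated for the complex case."""
    return 2 * delta_real(orders)

for orders in [(1, 1, 1), (1, 3), (2, 2, 2), (2, 3, 4)]:
    print(orders, delta_real(orders), restrictions_real(orders), delta_complex(orders))
```

For instance, with $k=2$, $p_1=1$ and $p_2=p-1$, both counts give $p-1$, in agreement with Theorem 6.10.1.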


6.9. Hypothesis that the Mean Value and Covariance Matrix are Given 


Consider a real $p$-variate Gaussian population $X_j\sim N_p(\mu,\Sigma)$, $\Sigma>O$, and a simple random sample, $X_1,\ldots,X_n$, from this population, the $X_i$'s being iid as $X_j$. Let the sample mean and the sample sum of products matrix be denoted by $\bar{X}$ and $S$, respectively. Consider the hypothesis $H_o:\mu=\mu_o,\ \Sigma=\Sigma_o$ where $\mu_o$ and $\Sigma_o$ are specified. Let us examine the likelihood ratio test for testing $H_o$ and obtain the resulting $\lambda$-criterion. Let the parameter space be $\Omega=\{(\mu,\Sigma)\,|\,\Sigma>O,\ -\infty<\mu_j<\infty,\ j=1,\ldots,p,\ \mu'=(\mu_1,\ldots,\mu_p)\}$. Let the joint density of $X_1,\ldots,X_n$ be denoted by $L$. Then, as previously obtained, the maximum value of $L$ is

$$\max_{\Omega}L=\frac{e^{-\frac{np}{2}}\,n^{\frac{np}{2}}}{(2\pi)^{\frac{np}{2}}|S|^{\frac{n}{2}}} \qquad (6.9.1)$$

and the maximum under $H_o$ is

$$\max_{H_o}L=\frac{e^{-\frac{1}{2}\sum_{j=1}^n(X_j-\mu_o)'\Sigma_o^{-1}(X_j-\mu_o)}}{(2\pi)^{\frac{np}{2}}|\Sigma_o|^{\frac{n}{2}}}. \qquad (6.9.2)$$

Thus,

$$\lambda=\frac{\max_{H_o}L}{\max_{\Omega}L}=\frac{e^{\frac{np}{2}}|S|^{\frac{n}{2}}}{n^{\frac{np}{2}}|\Sigma_o|^{\frac{n}{2}}}\,e^{-\frac{1}{2}\sum_{j=1}^n(X_j-\mu_o)'\Sigma_o^{-1}(X_j-\mu_o)}. \qquad (6.9.3)$$

We reject $H_o$ for small values of $\lambda$. Since the exponential part dominates the polynomial part for large values, we reject for large values of the exponent, excluding $(-1)$, which means for large values of $\sum_{j=1}^n(X_j-\mu_o)'\Sigma_o^{-1}(X_j-\mu_o)\sim\chi^2_{np}$ since $(X_j-\mu_o)'\Sigma_o^{-1}(X_j-\mu_o)\overset{iid}{\sim}\chi^2_p$ for each $j$. Hence the criterion consists of

$$\text{rejecting } H_o \text{ if the observed value of } \sum_{j=1}^n(X_j-\mu_o)'\Sigma_o^{-1}(X_j-\mu_o)\ge\chi^2_{np,\,\alpha}$$

with

$$Pr\{\chi^2_{np}\ge\chi^2_{np,\,\alpha}\}=\alpha. \qquad (6.9.4)$$

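Let us determine the $h$-th moment of $\lambda$ for an arbitrary $h$. Note that

$$\lambda^h=\frac{e^{\frac{nph}{2}}|S|^{\frac{nh}{2}}}{n^{\frac{nph}{2}}|\Sigma_o|^{\frac{nh}{2}}}\,e^{-\frac{h}{2}{\rm tr}(\Sigma_o^{-1}S)-\frac{hn}{2}(\bar{X}-\mu_o)'\Sigma_o^{-1}(\bar{X}-\mu_o)}. \qquad (6.9.5)$$

Since $\lambda$ contains $S$ and $\bar{X}$ and these quantities are independently distributed, we can integrate out the part containing $S$ over a Wishart density having $m=n-1$ degrees of freedom and the part containing $\bar{X}$ over the density of $\bar{X}$. Thus, for $m=n-1$,

$$\frac{e^{\frac{nph}{2}}}{n^{\frac{nph}{2}}|\Sigma_o|^{\frac{nh}{2}}}\int_{S>O}\frac{|S|^{\frac{nh}{2}+\frac{m}{2}-\frac{p+1}{2}}\,e^{-\frac{(1+h)}{2}{\rm tr}(\Sigma_o^{-1}S)}}{2^{\frac{mp}{2}}\Gamma_p(\frac{m}{2})|\Sigma_o|^{\frac{m}{2}}}\,{\rm d}S=\frac{e^{\frac{nph}{2}}\,2^{\frac{nph}{2}}}{n^{\frac{nph}{2}}}\,\frac{\Gamma_p(\frac{n}{2}(1+h)-\frac{1}{2})}{\Gamma_p(\frac{n}{2}-\frac{1}{2})}\,(1+h)^{-[\frac{n}{2}(1+h)-\frac{1}{2}]p}. \qquad \text{(i)}$$

Under $H_o$, the integral over $\bar{X}$ gives

$$\int_{\bar{X}}\frac{n^{\frac{p}{2}}}{(2\pi)^{\frac{p}{2}}|\Sigma_o|^{\frac{1}{2}}}\,e^{-(1+h)\frac{n}{2}(\bar{X}-\mu_o)'\Sigma_o^{-1}(\bar{X}-\mu_o)}\,{\rm d}\bar{X}=(1+h)^{-\frac{p}{2}}. \qquad \text{(ii)}$$

From (i) and (ii), we have

$$E[\lambda^h|H_o]=\frac{e^{\frac{nph}{2}}\,2^{\frac{nph}{2}}}{n^{\frac{nph}{2}}}\,\frac{\Gamma_p(\frac{n}{2}(1+h)-\frac{1}{2})}{\Gamma_p(\frac{n-1}{2})}\,(1+h)^{-[\frac{n}{2}(1+h)]p}. \qquad (6.9.6)$$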


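The inversion of this expression is quite involved due to branch points. Let us examine the asymptotic case as $n\to\infty$. On expanding the gamma functions by making use of the version of Stirling's asymptotic approximation formula for gamma functions given in (6.5.14), namely $\Gamma(z+\eta)\approx\sqrt{2\pi}\,z^{z+\eta-\frac{1}{2}}e^{-z}$ for $|z|\to\infty$ and $\eta$ bounded, we have

$$\frac{\Gamma_p(\frac{n}{2}(1+h)-\frac{1}{2})}{\Gamma_p(\frac{n-1}{2})}=\prod_{j=1}^p\frac{\Gamma(\frac{n}{2}(1+h)-\frac{1}{2}-\frac{j-1}{2})}{\Gamma(\frac{n}{2}-\frac{1}{2}-\frac{j-1}{2})}\to\prod_{j=1}^p\frac{\sqrt{2\pi}\,[\frac{n}{2}(1+h)]^{\frac{n}{2}(1+h)-\frac{1}{2}-\frac{j}{2}}\,e^{-\frac{n}{2}(1+h)}}{\sqrt{2\pi}\,[\frac{n}{2}]^{\frac{n}{2}-\frac{1}{2}-\frac{j}{2}}\,e^{-\frac{n}{2}}}$$

$$=\Big(\frac{n}{2}\Big)^{\frac{nph}{2}}e^{-\frac{nph}{2}}\,(1+h)^{\frac{n}{2}(1+h)p-\frac{p}{2}-\frac{p(p+1)}{4}}. \qquad \text{(iii)}$$

Thus, as $n\to\infty$, it follows from (6.9.6) and (iii) that

$$E[\lambda^h|H_o]\to(1+h)^{-\frac{1}{2}(p+\frac{p(p+1)}{2})}, \qquad (6.9.7)$$

which implies that, asymptotically, $-2\ln\lambda$ has a real scalar chisquare distribution with $p+\frac{p(p+1)}{2}$ degrees of freedom in the real Gaussian case. Hence the following result: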


Theorem 6.9.1. Given a $N_p(\mu,\Sigma)$, $\Sigma>O$, population, consider the hypothesis $H_o:\mu=\mu_o,\ \Sigma=\Sigma_o$ where $\mu_o$ and $\Sigma_o$ are specified. Let $\lambda$ denote the $\lambda$-criterion for testing this hypothesis. Then, in the real case, $-2\ln\lambda\to\chi^2_{\delta}$ as $n\to\infty$ where $\delta=p+\frac{p(p+1)}{2}$ and, in the corresponding complex case, $-2\ln\tilde{\lambda}\to\chi^2_{\delta_1}$ as $n\to\infty$ where $\delta_1=2p+p(p+1)$, the chisquare variable remaining a real scalar chisquare random variable.


Note 6.9.1. In the real case, observe that the hypothesis $H_o:\mu=\mu_o,\ \Sigma=\Sigma_o$ imposes $p$ restrictions on the $\mu$ parameters and $\frac{p(p+1)}{2}$ restrictions on the $\Sigma$ parameters, for a total of $p+\frac{p(p+1)}{2}$ restrictions, which corresponds to the degrees of freedom for the asymptotic chisquare distribution in the real case. In the complex case, there are twice as many restrictions.
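The limit in (6.9.7) can be examined numerically by evaluating the exact moment (6.9.6) on the log scale for increasing $n$. The following is a rough sketch rather than a proof; the function names are illustrative, and the values of $p$ and $h$ are arbitrary.

```python
from math import lgamma, log, exp

def log_gamma_p_ratio(p, a, b):
    """log[Gamma_p(a) / Gamma_p(b)]; the powers of pi cancel in the ratio."""
    return sum(lgamma(a - (j - 1) / 2) - lgamma(b - (j - 1) / 2)
               for j in range(1, p + 1))

def exact_moment(h, n, p):
    """E[lambda^h | H_o] as in (6.9.6), evaluated on the log scale."""
    log_val = (n * p * h / 2) * (1 + log(2) - log(n))
    log_val += log_gamma_p_ratio(p, n * (1 + h) / 2 - 0.5, (n - 1) / 2)
    log_val -= (n * (1 + h) / 2) * p * log(1 + h)
    return exp(log_val)

def limit_moment(h, p):
    """(1 + h)^(-delta/2) with delta = p + p(p+1)/2, as in (6.9.7)."""
    delta = p + p * (p + 1) / 2
    return (1 + h) ** (-delta / 2)

p, h = 3, 0.8
for n in (20, 200, 2000, 20000):
    print(n, exact_moment(h, n, p), limit_moment(h, p))
```

Since $E[\lambda^h]=E[e^{-\frac{h}{2}(-2\ln\lambda)}]$, the limiting form $(1+h)^{-\delta/2}$ is the moment generating function of a $\chi^2_\delta$ variable evaluated at $-h/2$.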
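Example 6.9.1. Consider the real trivariate Gaussian distribution $N_3(\mu,\Sigma)$, $\Sigma>O$, and the hypothesis $H_o:\mu=\mu_o,\ \Sigma=\Sigma_o$ where $\mu_o$, $\Sigma_o$ and an observed sample of size 5 are as follows:

$$\mu_o=\begin{bmatrix}1\\-1\\1\end{bmatrix},\quad \Sigma_o=\begin{bmatrix}3&0&0\\0&4&-2\\0&-2&3\end{bmatrix}\ \Rightarrow\ |\Sigma_o|=24,\quad {\rm Cof}(\Sigma_o)=\begin{bmatrix}8&0&0\\0&9&6\\0&6&12\end{bmatrix},\quad \Sigma_o^{-1}=\frac{1}{24}\begin{bmatrix}8&0&0\\0&9&6\\0&6&12\end{bmatrix},$$

$$X_1=\begin{bmatrix}1\\1\\1\end{bmatrix},\ X_2=\begin{bmatrix}1\\0\\-1\end{bmatrix},\ X_3=\begin{bmatrix}-1\\1\\2\end{bmatrix},\ X_4=\begin{bmatrix}-2\\1\\2\end{bmatrix},\ X_5=\begin{bmatrix}2\\-1\\0\end{bmatrix}.$$

Now,

$$(X_1-\mu_o)'\Sigma_o^{-1}(X_1-\mu_o)=\frac{36}{24},\quad (X_2-\mu_o)'\Sigma_o^{-1}(X_2-\mu_o)=\frac{33}{24},$$

$$(X_3-\mu_o)'\Sigma_o^{-1}(X_3-\mu_o)=\frac{104}{24},\quad (X_4-\mu_o)'\Sigma_o^{-1}(X_4-\mu_o)=\frac{144}{24},$$

$$(X_5-\mu_o)'\Sigma_o^{-1}(X_5-\mu_o)=\frac{20}{24},$$

and

$$\sum_{j=1}^5(X_j-\mu_o)'\Sigma_o^{-1}(X_j-\mu_o)=\frac{1}{24}[36+33+104+144+20]=\frac{337}{24}=14.04.$$

Note that, in this example, $n=5$, $p=3$ and $np=15$. Letting the significance level of the test be $\alpha=0.05$, $H_o$ is not rejected since $14.04<\chi^2_{15,\,0.05}\approx 25$.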
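The arithmetic of Example 6.9.1 can be reproduced exactly with rational numbers. In this sketch the variable names are illustrative, and the critical value 24.996 is the tabulated upper 5% point of a chisquare with 15 degrees of freedom, entered by hand rather than computed.

```python
from fractions import Fraction

mu_o = [1, -1, 1]
# Sigma_o^{-1} = (1/24) * Cof(Sigma_o)
sigma_inv = [[Fraction(x, 24) for x in row]
             for row in [[8, 0, 0], [0, 9, 6], [0, 6, 12]]]
sample = [[1, 1, 1], [1, 0, -1], [-1, 1, 2], [-2, 1, 2], [2, -1, 0]]

def quad_form(x):
    """(x - mu_o)' Sigma_o^{-1} (x - mu_o) for one observation."""
    d = [xi - mi for xi, mi in zip(x, mu_o)]
    return sum(d[i] * sigma_inv[i][j] * d[j] for i in range(3) for j in range(3))

terms = [quad_form(x) for x in sample]
statistic = sum(terms)
critical = 24.996  # tabulated chi-square(15) upper 5% point, assumed from tables

print([t * 24 for t in terms])     # numerators over the common denominator 24
print(statistic, float(statistic))
print(statistic < critical)        # True means H_o is not rejected
```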


6.10. Testing Hypotheses on Linear Regression Models or Linear Hypotheses 


Let the $p\times 1$ real vector $X_j$ have an expected value $\mu$ and a covariance matrix $\Sigma>O$ for $j=1,\ldots,n$, and the $X_j$'s be independently distributed. Let $X_j$, $\mu$, $\Sigma$ be partitioned as follows where $x_{1j}$, $\mu_1$ and $\sigma_{11}$ are $1\times 1$, $\mu_{(2)}$, $\Sigma_{21}$ are $(p-1)\times 1$, $\Sigma_{12}=\Sigma_{21}'$ and $\Sigma_{22}$ is $(p-1)\times(p-1)$:

$$X_j=\begin{bmatrix}x_{1j}\\ X_{(2)j}\end{bmatrix},\quad \mu=\begin{bmatrix}\mu_1\\ \mu_{(2)}\end{bmatrix},\quad \Sigma=\begin{bmatrix}\sigma_{11}&\Sigma_{12}\\ \Sigma_{21}&\Sigma_{22}\end{bmatrix}.$$

If the conditional expectation of $x_{1j}$, given $X_{(2)j}$, is linear in $X_{(2)j}$, then omitting the subscript $j$ since the $X_j$'s are iid, it was established in Eq. (3.3.5) that

$$E[x_1|X_{(2)}]=\mu_1+\Sigma_{12}\Sigma_{22}^{-1}(X_{(2)}-\mu_{(2)}). \qquad (6.10.1)$$

When the regression is linear, the best linear predictor of $x_1$ in terms of $X_{(2)}$ will be of the form

$$E[x_1|X_{(2)}]-E(x_1)=\beta'(X_{(2)}-E(X_{(2)})),\quad \beta'=(\beta_2,\ldots,\beta_p). \qquad (6.10.2)$$

Then, by appealing to properties of the conditional expectation and conditional variance, it was shown in Chap. 3 that $\beta'=\Sigma_{12}\Sigma_{22}^{-1}$. Hypothesizing that $X_{(2)}$ is not random, or equivalently that the predictor function is a function of the preassigned values of $X_{(2)}$, amounts to testing whether $\Sigma_{12}\Sigma_{22}^{-1}=O$. Noting that $\Sigma_{22}>O$ since $\Sigma>O$, the null hypothesis thus reduces to $H_o:\Sigma_{12}=O$. If the original population $X$ is $p$-variate real Gaussian, this hypothesis is then equivalent to testing the independence of $x_1$ and $X_{(2)}$. Actually, this has already been discussed in Sect. 6.8.2 for the case of $k=2$, and is also tantamount to testing whether the population multiple correlation $\rho_{1.(2\ldots p)}=0$.


Assuming that the population is Gaussian and letting $u=\lambda^{\frac{2}{n}}$ where $\lambda$ is the lambda criterion, $u\sim$ type-1 beta$(\frac{n-p}{2},\ \frac{p-1}{2})$ under the null hypothesis; this was established in Theorem 6.8.1 for $p_1=1$ and $p_2=p-1$. Then, $v=\frac{u}{1-u}\sim$ type-2 beta$(\frac{n-p}{2},\ \frac{p-1}{2})$, that is, $v\sim\frac{n-p}{p-1}F_{n-p,\,p-1}$ or $\frac{(p-1)}{(n-p)}\frac{u}{1-u}\sim F_{n-p,\,p-1}$. Hence, in order to test $H_o:\beta=O$,

$$\text{reject } H_o \text{ if } F_{n-p,\,p-1}\ge F_{n-p,\,p-1,\,\alpha},\ \text{with } Pr\{F_{n-p,\,p-1}\ge F_{n-p,\,p-1,\,\alpha}\}=\alpha. \qquad (6.10.3)$$

The test statistic $u$ is of the form

$$u=\frac{|S|}{|S_{11}|\,|S_{22}|},\quad S\sim W_p(n-1,\Sigma),\ \Sigma>O,\quad \frac{(p-1)}{(n-p)}\frac{u}{1-u}\sim F_{n-p,\,p-1},$$

where, for the submatrices of $S$, $S_{11}$ is $1\times 1$ and $S_{22}$ is $(p-1)\times(p-1)$. Observe that the number of parameters being restricted by the hypothesis $\Sigma_{12}=O$ is $p_1p_2=1(p-1)=p-1$. Hence as $n\to\infty$, the null distribution of $-2\ln\lambda$ is a real scalar chisquare having $p-1$ degrees of freedom. Thus, the following result:


Theorem 6.10.1. Let the $p\times 1$ vector $X_j$ be partitioned into the subvectors $x_{1j}$ of order 1 and $X_{(2)j}$ of order $p-1$. Let the regression of $x_{1j}$ on $X_{(2)j}$ be linear in $X_{(2)j}$, that is, $E[x_{1j}|X_{(2)j}]-E(x_{1j})=\beta'(X_{(2)j}-E(X_{(2)j}))$. Consider the hypothesis $H_o:\beta=O$. Let $X_j\sim N_p(\mu,\Sigma)$, $\Sigma>O$, for $j=1,\ldots,n$, the $X_j$'s being independently distributed, and let $\lambda$ be the $\lambda$-criterion for testing this hypothesis. Then, as $n\to\infty$, $-2\ln\lambda\to\chi^2_{p-1}$.
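Example 6.10.1. Let the population be $N_3(\mu,\Sigma)$, $\Sigma>O$, and the observed sample of size $n=5$ be

$$X_1=\begin{bmatrix}1\\1\\1\end{bmatrix},\ X_2=\begin{bmatrix}-1\\1\\0\end{bmatrix},\ X_3=\begin{bmatrix}1\\1\\2\end{bmatrix},\ X_4=\begin{bmatrix}-1\\-1\\-2\end{bmatrix},\ X_5=\begin{bmatrix}1\\0\\1\end{bmatrix}.$$

The resulting sample average $\bar{X}$ and deviation vectors are then

$$\bar{X}=\frac{1}{5}[X_1+\cdots+X_5]=\frac{1}{5}\begin{bmatrix}1\\2\\2\end{bmatrix},\quad X_1-\bar{X}=\begin{bmatrix}1\\1\\1\end{bmatrix}-\frac{1}{5}\begin{bmatrix}1\\2\\2\end{bmatrix}=\frac{1}{5}\begin{bmatrix}4\\3\\3\end{bmatrix},$$

$$X_2-\bar{X}=\frac{1}{5}\begin{bmatrix}-6\\3\\-2\end{bmatrix},\quad X_3-\bar{X}=\frac{1}{5}\begin{bmatrix}4\\3\\8\end{bmatrix},\quad X_4-\bar{X}=\frac{1}{5}\begin{bmatrix}-6\\-7\\-12\end{bmatrix},\quad X_5-\bar{X}=\frac{1}{5}\begin{bmatrix}4\\-2\\3\end{bmatrix}.$$

Letting $\mathbf{X}=[X_1,\ldots,X_5]$, $\bar{\mathbf{X}}=[\bar{X},\ldots,\bar{X}]$ and $S=(\mathbf{X}-\bar{\mathbf{X}})(\mathbf{X}-\bar{\mathbf{X}})'$,

$$\mathbf{X}-\bar{\mathbf{X}}=\frac{1}{5}\begin{bmatrix}4&-6&4&-6&4\\3&3&3&-7&-2\\3&-2&8&-12&3\end{bmatrix},$$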

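$$S=\frac{1}{5^2}\begin{bmatrix}120&40&140\\40&80&105\\140&105&230\end{bmatrix}=\begin{bmatrix}s_{11}&S_{12}\\ S_{21}&S_{22}\end{bmatrix},$$

$$s_{11}=\frac{120}{25},\quad S_{12}=\frac{1}{25}[40,\ 140],\quad S_{22}=\frac{1}{25}\begin{bmatrix}80&105\\105&230\end{bmatrix},\quad S_{22}^{-1}=\frac{25}{7375}\begin{bmatrix}230&-105\\-105&80\end{bmatrix},$$

$$S_{12}S_{22}^{-1}S_{21}=\frac{1}{(25)^2}\,\frac{25}{7375}\,[40,\ 140]\begin{bmatrix}230&-105\\-105&80\end{bmatrix}\begin{bmatrix}40\\140\end{bmatrix}=\frac{760000}{(25)(7375)},$$

so that the test statistic is

$$u=\frac{s_{11}-S_{12}S_{22}^{-1}S_{21}}{s_{11}}=1-\frac{S_{12}S_{22}^{-1}S_{21}}{s_{11}}=1-r^2_{1.(2,3)}=1-\frac{760000}{(120)(7375)}=1-0.859=0.141,$$

$$v=\frac{(p-1)}{(n-p)}\frac{u}{1-u}=\Big(\frac{2}{2}\Big)\Big(\frac{0.141}{0.859}\Big)=0.164,\quad v\sim F_{n-p,\,p-1}.$$

Let us test $H_o$ at the significance level $\alpha=0.05$. In that case, the critical value which is available from $F$ tables is $F_{n-p,\,p-1,\,\alpha}=F_{2,\,2,\,0.05}=19$. Since the observed value of $v$ is $0.164<19$, $H_o$ is not rejected.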
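The numerical steps of Example 6.10.1 can be retraced from the matrix $25\,S$ alone. The sketch below keeps the common factor $1/25$ implicit, since it cancels in the ratio $S_{12}S_{22}^{-1}S_{21}/s_{11}$; the variable names are illustrative.

```python
from fractions import Fraction

# 25 * S as displayed in Example 6.10.1
S25 = [[120, 40, 140], [40, 80, 105], [140, 105, 230]]
n, p = 5, 3

s11 = S25[0][0]
a, b = S25[0][1], S25[0][2]                      # entries of 25 * S_12
d11, d12, d22 = S25[1][1], S25[1][2], S25[2][2]  # entries of 25 * S_22
det22 = d11 * d22 - d12 * d12                    # 7375

# [a, b] (25 S_22)^{-1} [a, b]' through the 2 x 2 adjugate
quad = Fraction(a * (d22 * a - d12 * b) + b * (-d12 * a + d11 * b), det22)

r2 = quad / s11                        # squared sample multiple correlation
u = 1 - r2                             # test statistic u
v = Fraction(p - 1, n - p) * u / (1 - u)

print(det22, quad * det22)             # 7375 and 760000
print(float(r2), float(u), float(v))   # approximately 0.859, 0.141, 0.164
```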


Note 6.10.1. Observe that

$$\frac{S_{12}S_{22}^{-1}S_{21}}{s_{11}}=r^2_{1.(2,3)}$$

where $r_{1.(2,3)}$ is the sample multiple correlation between the first component of $X_j\sim N_3(\mu,\Sigma)$, $\Sigma>O$, and the other two components of $X_j$. If the population covariance matrix $\Sigma$ is similarly partitioned, that is,

$$\Sigma=\begin{bmatrix}\sigma_{11}&\Sigma_{12}\\ \Sigma_{21}&\Sigma_{22}\end{bmatrix},\ \text{where } \sigma_{11} \text{ is } 1\times 1,\ \Sigma_{22} \text{ is } (p-1)\times(p-1),$$

then, the population multiple correlation coefficient is $\rho_{1.(2,3)}$ where

$$\rho^2_{1.(2,3)}=\frac{\Sigma_{12}\Sigma_{22}^{-1}\Sigma_{21}}{\sigma_{11}}.$$

Thus, if $\Sigma_{12}=\Sigma_{21}'=O$, $\rho_{1.(2,3)}=0$ and conversely since $\sigma_{11}>0$ and $\Sigma_{22}>O\Rightarrow\Sigma_{22}^{-1}>O$. The regression coefficient $\beta$ being equal to the transpose of $\Sigma_{12}\Sigma_{22}^{-1}$, $\Sigma_{12}=O$ also implies that the regression coefficient $\beta=O$ and conversely. Accordingly, the hypothesis that the regression coefficient vector $\beta=O$ is equivalent to hypothesizing that the population multiple correlation $\rho_{1.(2,\ldots,p)}=0$, which also implies the hypothesis that the two subvectors are independently distributed in the multivariate normal case, or that the covariance matrix $\Sigma_{12}=O$, the only difference being that the test on regression coefficients is in the conditional space whereas testing the independence of the subvectors or whether the population multiple correlation equals zero is carried out in the entire space. The numerical example included in this section also illustrates the main result presented in Sect. 6.8.1 in connection with testing whether a population multiple correlation coefficient is equal to zero.
6.10.1. A simple linear model


Consider a linear model of the following form where a real scalar variable $y$ is estimated by a linear function of pre-assigned real scalar variables $z_1,\ldots,z_q$:

$$y_j=\beta_o+\beta_1z_{1j}+\cdots+\beta_qz_{qj}+e_j,\quad j=1,\ldots,n, \qquad \text{(i)}$$

where $y_1,\ldots,y_n$ are $n$ observations on $y$, $z_{i1},z_{i2},\ldots,z_{in}$, $i=1,\ldots,q$, are preassigned values on $z_1,\ldots,z_q$, and $\beta_o,\beta_1,\ldots,\beta_q$ are unknown parameters. The random components $e_j$, $j=1,\ldots,n$, are the corresponding sum total contributions coming from all unknown factors. There are two possibilities with respect to this model: $\beta_o=0$ or $\beta_o\ne 0$. If $\beta_o=0$, $\beta_o$ is omitted in model (i) and we let $y_j=x_j$. If $\beta_o\ne 0$, the model is modified by taking $x_j=y_j-\bar{y}$ where $\bar{y}=\frac{1}{n}(y_1+\cdots+y_n)$, then becoming

$$x_j=y_j-\bar{y}=\beta_1(z_{1j}-\bar{z}_1)+\cdots+\beta_q(z_{qj}-\bar{z}_q)+\epsilon_j \qquad \text{(ii)}$$

for some error term $\epsilon_j$, where $\bar{z}_i=\frac{1}{n}(z_{i1}+\cdots+z_{in})$, $i=1,\ldots,q$. Letting $Z_j'=(z_{1j},\ldots,z_{qj})$ if $\beta_o=0$ and $Z_j'=(z_{1j}-\bar{z}_1,\ldots,z_{qj}-\bar{z}_q)$ otherwise, equation (ii) can be written in vector/matrix notation as follows:

$$\epsilon_j=(x_j-\beta_1z_{1j}-\cdots-\beta_qz_{qj})=(x_j-\beta'Z_j),\quad \beta'=(\beta_1,\ldots,\beta_q),$$

$$\sum_{j=1}^n\epsilon_j^2=\sum_{j=1}^n(x_j-\beta_1z_{1j}-\cdots-\beta_qz_{qj})^2=\sum_{j=1}^n(x_j-\beta'Z_j)^2.$$

Letting

$$X=\begin{bmatrix}x_1\\ \vdots\\ x_n\end{bmatrix},\quad \epsilon=\begin{bmatrix}\epsilon_1\\ \vdots\\ \epsilon_n\end{bmatrix}\ \text{and}\ Z=(Z_1,\ldots,Z_n)=\begin{bmatrix}z_{11}&\cdots&z_{1n}\\ \vdots&\ddots&\vdots\\ z_{q1}&\cdots&z_{qn}\end{bmatrix}, \qquad \text{(iii)}$$

$$\epsilon'=(X'-\beta'Z)\ \Rightarrow\ \sum_{j=1}^n\epsilon_j^2=\epsilon'\epsilon=(X'-\beta'Z)(X-Z'\beta). \qquad \text{(iv)}$$

The least squares minimum is thus available by differentiating $\epsilon'\epsilon$ with respect to $\beta$, equating the resulting expression to a null vector and solving, which will produce a single critical point that corresponds to the minimum as the maximum occurs at $+\infty$:
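$$\frac{\partial}{\partial\beta}\Big(\sum_{j=1}^n\epsilon_j^2\Big)=O\ \Rightarrow\ \sum_{j=1}^nx_jZ_j'-\hat{\beta}'\sum_{j=1}^nZ_jZ_j'=O\ \Rightarrow\ \hat{\beta}=\Big(\sum_{j=1}^nZ_jZ_j'\Big)^{-1}\Big(\sum_{j=1}^nx_jZ_j\Big); \qquad \text{(v)}$$

$$\frac{\partial}{\partial\beta}(\epsilon'\epsilon)=O\ \Rightarrow\ \hat{\beta}=(ZZ')^{-1}ZX, \qquad \text{(vi)}$$

that is,

$$\hat{\beta}=\Big(\sum_{j=1}^nZ_jZ_j'\Big)^{-1}\Big(\sum_{j=1}^nx_jZ_j\Big)=(ZZ')^{-1}ZX. \qquad (6.10.4)$$

Since the $z_{ij}$'s are preassigned quantities, it can be assumed without any loss of generality that $(ZZ')$ is nonsingular, and thereby that $(ZZ')^{-1}$ exists, so that the least squares minimum, usually denoted by $s^2$, is available by substituting $\hat{\beta}$ for $\beta$ in $\epsilon'\epsilon$. Then, at $\beta=\hat{\beta}$,

$$\epsilon'|_{\beta=\hat{\beta}}=X'-X'Z'(ZZ')^{-1}Z=X'[I-Z'(ZZ')^{-1}Z]\ \text{so that}$$

$$s^2=\epsilon'\epsilon|_{\beta=\hat{\beta}}=X'[I-Z'(ZZ')^{-1}Z]X \qquad (6.10.5)$$

where $I-Z'(ZZ')^{-1}Z$ is idempotent and of rank $(n-1)-q$. Observe that if $\beta_o\ne 0$ in (i) and we had proceeded without eliminating $\beta_o$, then $\beta$ would have been of order $(q+1)\times 1$ and $I-Z'(ZZ')^{-1}Z$, of rank $n-(q+1)=n-1-q$, whereas if $\beta_o\ne 0$ and we had eliminated $\beta_o$ from the model, then the rank of $I-Z'(ZZ')^{-1}Z$ would have been $(n-1)-q$, that is, unchanged, since $\sum_{j=1}^n(x_j-\bar{x})^2=X'[I-\frac{1}{n}JJ']X$, $J'=(1,\ldots,1)$, and the rank of $I-\frac{1}{n}JJ'$ is $n-1$.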


Some distributional assumptions on $\epsilon_j$ are required in order to test hypotheses on $\beta$. Let $\epsilon_j\sim N_1(0,\sigma^2)$, $\sigma^2>0$, $j=1,\ldots,n$, be independently distributed. Then $x_j\sim N_1(\beta'Z_j,\sigma^2)$, $j=1,\ldots,n$, are independently distributed but not identically distributed as the mean value depends on $j$. Under the normality assumption for the $\epsilon_j$'s, it can readily be seen that the least squares estimators of $\beta$ and $\sigma^2$ coincide with the maximum likelihood estimators. It can also be observed that $\sigma^2$ is estimated by $\frac{s^2}{n}$ where $n$ is the sample size. In this simple linear regression context, the parameter space is $\Omega=\{(\beta,\sigma^2)\,|\,\sigma^2>0\}$. Thus, under the normality assumption, the maximum of the likelihood function $L$ is given by

$$\max_{\Omega}L=\frac{e^{-\frac{n}{2}}\,n^{\frac{n}{2}}}{(2\pi)^{\frac{n}{2}}[s^2]^{\frac{n}{2}}}. \qquad \text{(vi)}$$

Under the hypothesis $H_o:\beta=O$ or $\beta_1=0=\cdots=\beta_q$, the least squares minimum, usually denoted as $s_o^2$, is $X'X$ and, assuming normality, the maximum of the likelihood function under $H_o$ is the following:

$$\max_{H_o}L=\frac{e^{-\frac{n}{2}}\,n^{\frac{n}{2}}}{(2\pi)^{\frac{n}{2}}[s_o^2]^{\frac{n}{2}}}. \qquad \text{(vii)}$$

Thus, the $\lambda$-criterion is

$$\lambda=\frac{\max_{H_o}L}{\max_{\Omega}L}=\Big[\frac{s^2}{s_o^2}\Big]^{\frac{n}{2}}\ \Rightarrow\ u\equiv\lambda^{\frac{2}{n}}=\frac{X'[I-Z'(ZZ')^{-1}Z]X}{X'X}$$

$$=\frac{X'[I-Z'(ZZ')^{-1}Z]X}{X'[I-Z'(ZZ')^{-1}Z]X+X'Z'(ZZ')^{-1}ZX}=\frac{1}{1+u_1} \qquad (6.10.6)$$

where

$$u_1=\frac{X'Z'(ZZ')^{-1}ZX}{X'[I-Z'(ZZ')^{-1}Z]X}=\frac{s_o^2-s^2}{s^2} \qquad \text{(viii)}$$

with the matrices $Z'(ZZ')^{-1}Z$ and $I-Z'(ZZ')^{-1}Z$ being idempotent, mutually orthogonal, and of ranks $q$ and $(n-1)-q$, respectively. We can interpret $s_o^2-s^2$ as the sum of squares due to the hypothesis and $s^2$ as the residual part. Under the normality assumption, $s_o^2-s^2$ and $s^2$ are independently distributed in light of independence of quadratic forms that was discussed in Sect. 3.4.1; moreover, their representations as quadratic forms in idempotent matrices of ranks $q$ and $(n-1)-q$ imply that $\frac{s_o^2-s^2}{\sigma^2}\sim\chi^2_q$ and $\frac{s^2}{\sigma^2}\sim\chi^2_{(n-1)-q}$. Accordingly, under the null hypothesis,

$$u_2=\frac{(s_o^2-s^2)/q}{s^2/[(n-1)-q]}\sim F_{q,\,n-1-q}, \qquad (6.10.7)$$

that is, an $F$-statistic having $q$ and $n-1-q$ degrees of freedom. Thus, we reject $H_o$ for small values of $\lambda$ or equivalently for large values of $u_2$ or large values of $F_{q,\,n-1-q}$. Hence, the following criterion:

$$\text{Reject } H_o \text{ if the observed value of } u_2\ge F_{q,\,n-1-q,\,\alpha},\quad Pr\{F_{q,\,n-1-q}\ge F_{q,\,n-1-q,\,\alpha}\}=\alpha. \qquad (6.10.8)$$

A detailed discussion of the real scalar variable case is provided in Mathai and Haubold (2017).
6.10.2. Hypotheses on individual parameters


Denoting the expected value of $(\cdot)$ by $E[(\cdot)]$, it follows from (6.10.4) that

$$E[\hat{\beta}]=(ZZ')^{-1}ZE(X)=(ZZ')^{-1}ZZ'\beta=\beta,\quad E[X]=Z'\beta,\ \text{and}$$

$${\rm Cov}(\hat{\beta})=(ZZ')^{-1}Z[{\rm Cov}(X)]Z'(ZZ')^{-1}=(ZZ')^{-1}Z(\sigma^2I)Z'(ZZ')^{-1}=\sigma^2(ZZ')^{-1}.$$

Under the normality assumption on $x_j$, we have $\hat{\beta}\sim N_q(\beta,\sigma^2(ZZ')^{-1})$. Letting the $(r,r)$-th diagonal element of $(ZZ')^{-1}$ be $b_{rr}$, then $\hat{\beta}_r$, the estimator of the $r$-th component of the parameter vector $\beta$, is distributed as $\hat{\beta}_r\sim N_1(\beta_r,\sigma^2b_{rr})$, so that

$$\frac{\hat{\beta}_r-\beta_r}{\hat{\sigma}\sqrt{b_{rr}}}\sim t_{n-1-q} \qquad (6.10.9)$$

where $t_{n-1-q}$ denotes a Student-$t$ distribution having $n-1-q$ degrees of freedom and $\hat{\sigma}^2=\frac{s^2}{n-1-q}$ is an unbiased estimator for $\sigma^2$. On writing $s^2$ in terms of $\epsilon$, it is easily seen that $E[s^2]=(n-1-q)\sigma^2$ where $s^2$ is the least squares minimum in the entire parameter space $\Omega$. Thus, one can test hypotheses on $\beta_r$ and construct confidence intervals for that parameter by means of the Student-$t$ statistic specified in (6.10.9) or its square $t^2_{n-1-q}$ which has an $F$ distribution having 1 and $n-1-q$ degrees of freedom, that is, $t^2_{n-1-q}=F_{1,\,n-1-q}$.


Example 6.10.2. Let us consider a linear model of the following form:

$$y_j=\beta_o+\beta_1z_{1j}+\beta_2z_{2j}+e_j,\quad j=1,\ldots,n,$$

where the $z_{ij}$'s are preassigned numbers of the variable $z_i$. Let us take $n=5$ and $q=2$, so that the sample is of size 5 and, excluding $\beta_o$, the model has two parameters. Let the observations on $y$ and the preassigned values on the $z_i$'s be the following:

$$\begin{aligned}1&=\beta_o+\beta_1(0)+\beta_2(1)+e_1\\ 2&=\beta_o+\beta_1(1)+\beta_2(-1)+e_2\\ 4&=\beta_o+\beta_1(-1)+\beta_2(2)+e_3\\ 6&=\beta_o+\beta_1(2)+\beta_2(-2)+e_4\\ 7&=\beta_o+\beta_1(-2)+\beta_2(5)+e_5.\end{aligned}$$

The averages on $y$, $z_1$ and $z_2$ are then

$$\bar{y}=\frac{1}{5}[1+2+4+6+7]=4,\quad \bar{z}_1=\frac{1}{5}[0+1+(-1)+2+(-2)]=0\ \text{and}\ \bar{z}_2=\frac{1}{5}[1+(-1)+2+(-2)+5]=1,$$

and, in terms of deviations, the model becomes

$$x_j=y_j-\bar{y}=\beta_1(z_{1j}-\bar{z}_1)+\beta_2(z_{2j}-\bar{z}_2)+\epsilon_j,\quad \epsilon_j=e_j-\bar{e}.$$

That is,

$$\begin{bmatrix}-3\\-2\\0\\2\\3\end{bmatrix}=\begin{bmatrix}0&0\\1&-2\\-1&1\\2&-3\\-2&4\end{bmatrix}\begin{bmatrix}\beta_1\\ \beta_2\end{bmatrix}+\begin{bmatrix}\epsilon_1\\ \epsilon_2\\ \epsilon_3\\ \epsilon_4\\ \epsilon_5\end{bmatrix}\ \Rightarrow\ X=Z'\beta+\epsilon.$$

When minimizing $\epsilon'\epsilon=(X-Z'\beta)'(X-Z'\beta)$, we determined that $\hat{\beta}$, the least squares estimate of $\beta$, the least squares minimum $s^2$ and $s_o^2-s^2$, corresponding to the sum of squares due to $\beta$, could be expressed as

$$\hat{\beta}=(ZZ')^{-1}ZX,\quad s_o^2-s^2=X'Z'(ZZ')^{-1}ZX,\quad s^2=X'[I-Z'(ZZ')^{-1}Z]X$$

and that

$$Z'(ZZ')^{-1}Z=[Z'(ZZ')^{-1}Z]^2,\quad I-Z'(ZZ')^{-1}Z=[I-Z'(ZZ')^{-1}Z]^2,\quad [I-Z'(ZZ')^{-1}Z][Z'(ZZ')^{-1}Z]=O.$$


Let us evaluate those quantities: 
|O 1—1 2-2 ‚_ | 10 —17 
Zea —2 1 -3 a Wea alae a 


| 
IZZ'| = 11, Cof(ZZ’) = Ё | zzyi-l D ul 


17 10|][0—2 1—3 4 
al -4 ENT 


zzy'z-l 30 17][0 1—1 2—2 
B 


1110 3 -7 4 6 
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—3 
1 fo а -13 9 8]|~2 
ô — ^—1 E. 2 "s 
= zx = | za um A 3 | 
3 
_ 1 (SOY! f 
11 [32] LJ’ 
0 0 
M 2 
0 —4 —13 9.8 
Ж(77/у)у!7=-—|—1 1 | | 
Е Ж еж ЖО: 
sv 4 
0 0 0 0 0 
КЖ M 
cm 0 1 6 —5 -2 
D 1—5 6 -2 
lo e ocu в | 
Then, 
о о 0 0 O][-3 
0 2 1 1 —4 ||—2 
lol у—1 1 
X'Z'(ZZ') ZX = qt [73 -2023]|]0 1 6 -5 -2 0 
0 1-5 6 -2 2 
bio ea 97 2D в || 3 
120 
== = Х'Х = 26, 
11 
120 166 
X'U 7'(77'у`\7]Х = 26 — — = —, withn = 5 апад = 2. 


11 117 
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The test statistics и and из and their observed values are the following: 


XT ЕХ 52 — 52 
uj] = = 
| XU — 7/(77')-17]Х 52 
120 
= — = 0.72; 
166 
(80 — 82)/9 р 
ил = ~ п—1— 
52001-4) ©* 4 
120/2 
= см 
166/2 
Letting the significance level be a = 0.05, the required tabulated critical value is 


Fg,n—1-q,a = F2,2,0.05 = 19. Since 0.72 < 19, the hypothesis Н, : В = О is not 
rejected. Thus, we will not proceed to test individual hypotheses on the regression coeffi- 
cients f, and f». For tests on general linear models, refer for instance to Mathai (1971). 
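The arithmetic of Example 6.10.2 can be re-checked with a short script. The following is only a sketch: it uses exact rational arithmetic from the Python standard library, and the helper names (`centered`, `dot`) are illustrative choices rather than notation from the text.

```python
from fractions import Fraction as Fr

# Data of Example 6.10.2: responses y and preassigned values z1, z2.
y = [1, 2, 4, 6, 7]
z1 = [0, 1, -1, 2, -2]
z2 = [1, -1, 2, -2, 5]
n, q = len(y), 2

def centered(v):
    """Deviations of v from its average, as exact fractions."""
    mean = Fr(sum(v), len(v))
    return [Fr(t) - mean for t in v]

def dot(u, v):
    """Inner product of two equal-length sequences."""
    return sum(a * b for a, b in zip(u, v))

x = centered(y)                       # X = (-3, -2, 0, 2, 3)'
Z = [centered(z1), centered(z2)]      # rows of the q x n matrix Z

# ZZ' (2 x 2) and its inverse via the cofactor formula.
a, b, d = dot(Z[0], Z[0]), dot(Z[0], Z[1]), dot(Z[1], Z[1])
det = a * d - b * b                   # |ZZ'|
inv = [[d / det, -b / det], [-b / det, a / det]]

# Least squares estimate beta_hat = (ZZ')^{-1} Z X.
zx = [dot(Z[0], x), dot(Z[1], x)]
beta_hat = [inv[0][0] * zx[0] + inv[0][1] * zx[1],
            inv[1][0] * zx[0] + inv[1][1] * zx[1]]

# Sum of squares due to beta and the least squares minimum s^2.
ss_beta = dot(beta_hat, zx)           # s_o^2 - s^2 = X'Z'(ZZ')^{-1}ZX
s2 = dot(x, x) - ss_beta              # s^2 = X'X - (s_o^2 - s^2)

# F-statistic of (6.10.7) with q and n-1-q degrees of freedom.
u2 = (ss_beta / q) / (s2 / (n - 1 - q))

print("|ZZ'| =", det)
print("beta_hat =", beta_hat)
print("s_o^2 - s^2 =", ss_beta, " s^2 =", s2)
print("u2 =", float(u2))
```

The exact fractions reproduce $|ZZ'| = 11$, $\hat{\beta} = \frac{1}{11}(50, 32)'$, $s_o^2 - s^2 = 120/11$ and $s^2 = 166/11$; the critical value 19 still has to be read from an $F$ table, as in the text.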


6.11. Problem Involving Two or More Independent Gaussian Populations 


Consider $k$ independent $p$-variate real normal populations $X_j \sim N_p(\mu_{(j)}, \Sigma)$, $\Sigma > O$,
$j = 1,\ldots,k$, having the same nonsingular covariance matrix $\Sigma$ but possibly different
mean values. We consider the problem of testing hypotheses on linear functions of the
mean values. Let $b = a_1\mu_{(1)} + \cdots + a_k\mu_{(k)}$ where $a_1,\ldots,a_k$ are real scalar constants,
and the null hypothesis be $H_o : b = b_o$ (given), which means that the $a_j$'s and $\mu_{(j)}$'s,
$j = 1,\ldots,k$, are all specified. It is also assumed that $\Sigma$ is known. Suppose that simple
random samples of sizes $n_1,\ldots,n_k$ from these $k$ independent normal populations can be
secured, and let the sample values be $X_{jq}$, $q = 1,\ldots,n_j$, where $X_{j1},\ldots,X_{jn_j}$ are iid as
$N_p(\mu_{(j)}, \Sigma)$, $\Sigma > O$. Let the sample averages be denoted by $\bar{X}_j = \frac{1}{n_j}\sum_{q=1}^{n_j}X_{jq}$, $j = 1,\ldots,k$.
Consider the test statistic $U_k = a_1\bar{X}_1 + \cdots + a_k\bar{X}_k$. Since the populations are
independent and $U_k$ is a linear function of independent vector normal variables, $U_k$ is
normally distributed with the mean value $b = a_1\mu_{(1)} + \cdots + a_k\mu_{(k)}$ and covariance matrix
$\frac{1}{n}\Sigma$, where $\frac{1}{n} = \left(\frac{a_1^2}{n_1} + \cdots + \frac{a_k^2}{n_k}\right)$ and so, $\sqrt{n}\,\Sigma^{-\frac{1}{2}}(U_k - b) \sim N_p(O, I)$. Then, under the
hypothesis $H_o : b = b_o$ (given), which is being tested against the alternative $H_1 : b \ne b_o$,
the test criterion is obtained by proceeding as was done in the single population case. Thus,
the test statistic is $z = n(U_k - b_o)'\Sigma^{-1}(U_k - b_o) \sim \chi^2_p$ and the criterion will be to reject
the null hypothesis for large values of $z$. Accordingly, the criterion is

Reject $H_o : b = b_o$ if the observed value of $n(U_k - b_o)'\Sigma^{-1}(U_k - b_o) \ge \chi^2_{p,\alpha}$,
with $Pr\{\chi^2_p \ge \chi^2_{p,\alpha}\} = \alpha$. (6.11.1)

In particular, suppose that we wish to test the hypothesis $H_o : \delta = \mu_{(1)} - \mu_{(2)} = \delta_o$, such
as $\delta_o = O$ as is often the case, against the natural alternative. In this case, when $\delta_o = O$,
the null hypothesis is that the mean value vectors are equal, that is, $\mu_{(1)} = \mu_{(2)}$, and the
test statistic is $z = n(\bar{X}_1 - \bar{X}_2)'\Sigma^{-1}(\bar{X}_1 - \bar{X}_2) \sim \chi^2_p$ with $\frac{1}{n} = \frac{1}{n_1} + \frac{1}{n_2}$, the test criterion
being

Reject $H_o : \mu_{(1)} - \mu_{(2)} = O$ if the observed value of $z \ge \chi^2_{p,\alpha}$,
with $Pr\{\chi^2_p \ge \chi^2_{p,\alpha}\} = \alpha$. (6.11.2)
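As a small illustration of criterion (6.11.2), the sketch below evaluates $z$ for the bivariate case $p = 2$, where the chisquare tail probability has the closed form $Pr\{\chi^2_2 \ge x\} = e^{-x/2}$ so that no statistical tables or external libraries are needed. The function name, the sample averages, the sample sizes and the covariance matrix are all hypothetical values chosen for the illustration; they do not come from the text.

```python
import math

def two_sample_chisquare_test(xbar1, xbar2, n1, n2, sigma, alpha=0.05):
    """Test H_o: mu_(1) = mu_(2) for p = 2 with a known covariance matrix.

    Returns (z, critical value, reject?) following (6.11.2), where
    z = n (xbar1 - xbar2)' Sigma^{-1} (xbar1 - xbar2) and 1/n = 1/n1 + 1/n2.
    """
    n = 1.0 / (1.0 / n1 + 1.0 / n2)
    d = [xbar1[0] - xbar2[0], xbar1[1] - xbar2[1]]
    # Inverse of the 2 x 2 matrix Sigma via the cofactor formula.
    (s11, s12), (s21, s22) = sigma
    det = s11 * s22 - s12 * s21
    inv = [[s22 / det, -s12 / det], [-s21 / det, s11 / det]]
    z = n * (d[0] * (inv[0][0] * d[0] + inv[0][1] * d[1])
             + d[1] * (inv[1][0] * d[0] + inv[1][1] * d[1]))
    # For p = 2, Pr{chi^2_2 >= x} = exp(-x/2), so the critical value is -2 ln(alpha).
    critical = -2.0 * math.log(alpha)
    return z, critical, z >= critical

# Hypothetical sample averages and a hypothetical known Sigma.
z, critical, reject = two_sample_chisquare_test(
    xbar1=(1.0, 2.0), xbar2=(0.0, 1.0), n1=10, n2=15,
    sigma=((2.0, 1.0), (1.0, 3.0)))
print(z, critical, reject)
```

For general $p$ the only change would be the critical value $\chi^2_{p,\alpha}$, which would have to be taken from a table or a statistical library.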


For a numerical example, the reader is referred to Example 6.2.3. One can also determine
the power of the test or the probability of rejecting $H_o : \delta = \mu_{(1)} - \mu_{(2)} = \delta_o$,
under an alternative hypothesis, in which case the distribution of $z$ is a noncentral
chisquare variable with $p$ degrees of freedom and non-centrality parameter
$\lambda = \frac{1}{2}\frac{n_1n_2}{n_1+n_2}(\delta - \delta_o)'\Sigma^{-1}(\delta - \delta_o)$, $\delta = \mu_{(1)} - \mu_{(2)}$, where $n_1$ and $n_2$ are the sample sizes.
Under the null hypothesis, the non-centrality parameter $\lambda$ is equal to zero. The power is
given by

Power $= Pr\{\text{reject } H_o \mid H_1\} = Pr\{\chi^2_p(\lambda) \ge \chi^2_{p,\alpha}\}$. (6.11.3)

When the population covariance matrices are identical and the common covariance matrix
is unknown, one can also construct a statistic for testing hypotheses on linear functions
of the mean value vectors by making use of steps parallel to those employed in the single
population case, with the resulting criterion being based on Hotelling's $T^2$ statistic for
testing $H_o : \mu_{(1)} = \mu_{(2)}$.


6.11.1. Equal but unknown covariance matrices 


Let us consider the same procedure as in Sect. 6.11 to test a hypothesis on a linear
function $b = a_1\mu_{(1)} + \cdots + a_k\mu_{(k)}$ where $a_1,\ldots,a_k$ are known real scalar constants
and $\mu_{(j)}$, $j = 1,\ldots,k$, are the population mean values. We wish to test the hypothesis
$H_o : b = b_o$ (given) in the sense that all the mean values $\mu_{(j)}$, $j = 1,\ldots,k$, and $a_1,\ldots,a_k$
are specified. Let $U_k = a_1\bar{X}_1 + \cdots + a_k\bar{X}_k$ as previously defined. Then, $E[U_k] = b$
and $\mathrm{Cov}(U_k) = \left(\frac{a_1^2}{n_1} + \cdots + \frac{a_k^2}{n_k}\right)\Sigma$, where $\frac{a_1^2}{n_1} + \cdots + \frac{a_k^2}{n_k} \equiv \frac{1}{n}$ for some symbol $n$.
The common covariance matrix $\Sigma$ has the MLE $\frac{1}{n_1+\cdots+n_k}(S_1 + \cdots + S_k)$ where $S_j$ is the
sample sum of products matrix for the $j$-th Gaussian population. It has been established
that $S = S_1 + \cdots + S_k$ has a Wishart distribution with $(n_1 - 1) + \cdots + (n_k - 1) = N - k$,
$N = n_1 + \cdots + n_k$, degrees of freedom, that is,

$$S \sim W_p(N - k, \Sigma),\quad \Sigma > O,\quad N = n_1 + \cdots + n_k. \tag{6.11.4}$$

Then, when $\Sigma$ is unknown, it follows from a derivation parallel to that provided in Sect. 6.3
for the single population case that

$$w \equiv n(U_k - b)'S^{-1}(U_k - b) \sim \text{type-2 beta}\left(\tfrac{p}{2},\ \tfrac{N-k-p}{2}\right), \tag{6.11.5}$$

or, $w$ has a real scalar type-2 beta distribution with the parameters $\left(\frac{p}{2}, \frac{N-k-p}{2}\right)$. Letting
$w = \frac{p}{N-k-p}F$, this $F$ is an $F$-statistic with $p$ and $N - k - p$ degrees of freedom.

Theorem 6.11.1. Let $U_k$, $n$, $N$, $b$, $S$ be as defined above. Then $w = n(U_k - b)'S^{-1}(U_k - b)$
has a real scalar type-2 beta distribution with the parameters $\left(\frac{p}{2}, \frac{N-k-p}{2}\right)$.
Letting $w = \frac{p}{N-k-p}F$, this $F$ is an $F$-statistic with $p$ and $N - k - p$ degrees of freedom.

Hence for testing the hypothesis $H_o : b = b_o$ (given), the criterion is the following:

Reject $H_o$ if the observed value of $F = \frac{N-k-p}{p}\,w \ge F_{p,\,N-k-p,\,\alpha}$, (6.11.6)
with $Pr\{F_{p,\,N-k-p} \ge F_{p,\,N-k-p,\,\alpha}\} = \alpha$. (6.11.7)

Note that by exploiting the connection between type-1 and type-2 real scalar beta random
variables, one can obtain a number of properties on this $F$-statistic.

This situation has already been covered in Theorem 6.3.4 for the case $k = 2$.

6.12. Equality of Covariance Matrices in Independent Gaussian Populations 


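Let $X_j \sim N_p(\mu_{(j)}, \Sigma_j)$, $\Sigma_j > O$, $j = 1,\ldots,k$, be independently distributed
real $p$-variate Gaussian populations. Consider simple random samples of sizes $n_1,\ldots,n_k$
from these $k$ populations, whose sample values, denoted by $X_{jq}$, $q = 1,\ldots,n_j$, are iid
as $X_j$, $j = 1,\ldots,k$. The sample sums of products matrices denoted by $S_1,\ldots,S_k$,
respectively, are independently distributed as Wishart matrix random variables with $n_j - 1$,
$j = 1,\ldots,k$, degrees of freedom. The joint density of all the sample values is then
given by

$$L = \prod_{j=1}^k L_j,\quad L_j = \frac{e^{-\frac{1}{2}\mathrm{tr}(\Sigma_j^{-1}S_j) - \frac{n_j}{2}(\bar{X}_j - \mu_{(j)})'\Sigma_j^{-1}(\bar{X}_j - \mu_{(j)})}}{(2\pi)^{\frac{n_jp}{2}}|\Sigma_j|^{\frac{n_j}{2}}}, \tag{6.12.1}$$

the MLE's of $\mu_{(j)}$ and $\Sigma_j$ being $\hat{\mu}_{(j)} = \bar{X}_j$ and $\hat{\Sigma}_j = \frac{1}{n_j}S_j$. The maximum of $L$ in the
entire parameter space $\Omega$ is

$$\max_{\Omega}L = \prod_{j=1}^k\max_{\Omega}L_j = \frac{\left[\prod_{j=1}^k n_j^{\frac{n_jp}{2}}\right]e^{-\frac{Np}{2}}}{(2\pi)^{\frac{Np}{2}}\prod_{j=1}^k|S_j|^{\frac{n_j}{2}}},\quad N = n_1 + \cdots + n_k. \tag{i}$$

Let us test the hypothesis of equality of covariance matrices:

$$H_o : \Sigma_1 = \Sigma_2 = \cdots = \Sigma_k = \Sigma$$

where $\Sigma$ is unknown. Under this null hypothesis, the MLE of $\mu_{(j)}$ is $\bar{X}_j$ and the MLE of
the common $\Sigma$ is $\frac{1}{N}(S_1 + \cdots + S_k) = \frac{1}{N}S$, $N = n_1 + \cdots + n_k$, $S = S_1 + \cdots + S_k$. Thus,
the maximum of $L$ under $H_o$ is

$$\max_{H_o}L = \frac{N^{\frac{Np}{2}}e^{-\frac{Np}{2}}}{(2\pi)^{\frac{Np}{2}}|S|^{\frac{N}{2}}}, \tag{ii}$$

and the $\lambda$-criterion is the following:

$$\lambda = \frac{N^{\frac{Np}{2}}\left[\prod_{j=1}^k|S_j|^{\frac{n_j}{2}}\right]}{|S|^{\frac{N}{2}}\left[\prod_{j=1}^k n_j^{\frac{n_jp}{2}}\right]}. \tag{6.12.2}$$

Let us consider the $h$-th moment of $\lambda$ for an arbitrary $h$. Letting $c = N^{\frac{Np}{2}}\big/\prod_{j=1}^k n_j^{\frac{n_jp}{2}}$,

$$\lambda^h = c^h\left\{\prod_{j=1}^k|S_j|^{\frac{n_jh}{2}}\right\}|S|^{-\frac{Nh}{2}}. \tag{6.12.3}$$

The factor causing a difficulty, namely $|S|^{-\frac{Nh}{2}}$, will be replaced by an equivalent integral.
Letting $Y > O$ be a real $p \times p$ positive definite matrix, we have the identity

$$|S|^{-\frac{Nh}{2}} = \frac{1}{\Gamma_p(\frac{Nh}{2})}\int_{Y>O}|Y|^{\frac{Nh}{2} - \frac{p+1}{2}}e^{-\mathrm{tr}(YS)}\,dY,\quad \Re\left(\tfrac{Nh}{2}\right) > \tfrac{p-1}{2},\ S > O, \tag{6.12.4}$$

where

$$\mathrm{tr}(YS) = \mathrm{tr}(YS_1) + \cdots + \mathrm{tr}(YS_k). \tag{iii}$$

Thus, once (6.12.4) is substituted in (6.12.3), $\lambda^h$ splits into products involving $S_j$, $j = 1,\ldots,k$;
this enables one to integrate out over the densities of $S_j$, which are Wishart
densities with $m_j = n_j - 1$ degrees of freedom. Noting that the exponent involving $S_j$ is
$-\frac{1}{2}\mathrm{tr}(\Sigma_j^{-1}S_j) - \mathrm{tr}(YS_j) = -\frac{1}{2}\mathrm{tr}[S_j(\Sigma_j^{-1} + 2Y)]$, the integral over the Wishart density of
$S_j$ gives the following:

$$\begin{aligned}
\prod_{j=1}^k&\frac{1}{2^{\frac{m_jp}{2}}\Gamma_p(\frac{m_j}{2})|\Sigma_j|^{\frac{m_j}{2}}}\int_{S_j>O}|S_j|^{\frac{m_j}{2} + \frac{n_jh}{2} - \frac{p+1}{2}}e^{-\frac{1}{2}\mathrm{tr}[S_j(\Sigma_j^{-1} + 2Y)]}\,dS_j\\
&= \prod_{j=1}^k 2^{\frac{n_jhp}{2}}\,\frac{\Gamma_p(\frac{m_j}{2} + \frac{n_jh}{2})}{\Gamma_p(\frac{m_j}{2})}\,\frac{|\Sigma_j^{-1} + 2Y|^{-(\frac{m_j}{2} + \frac{n_jh}{2})}}{|\Sigma_j|^{\frac{m_j}{2}}}\\
&= 2^{\frac{Nhp}{2}}\prod_{j=1}^k|\Sigma_j|^{\frac{n_jh}{2}}\,|I + 2\Sigma_jY|^{-(\frac{m_j}{2} + \frac{n_jh}{2})}\,\frac{\Gamma_p(\frac{m_j}{2} + \frac{n_jh}{2})}{\Gamma_p(\frac{m_j}{2})}.
\end{aligned} \tag{iv}$$

Thus, on substituting (iv) in $E[\lambda^h]$, we have

$$E[\lambda^h] = \frac{c^h\,2^{\frac{Nhp}{2}}}{\Gamma_p(\frac{Nh}{2})}\left\{\prod_{j=1}^k|\Sigma_j|^{\frac{n_jh}{2}}\frac{\Gamma_p(\frac{m_j}{2} + \frac{n_jh}{2})}{\Gamma_p(\frac{m_j}{2})}\right\}\int_{Y>O}|Y|^{\frac{Nh}{2} - \frac{p+1}{2}}\left\{\prod_{j=1}^k|I + 2\Sigma_jY|^{-(\frac{m_j}{2} + \frac{n_jh}{2})}\right\}dY, \tag{6.12.5}$$

which is the non-null $h$-th moment of $\lambda$. The $h$-th null moment is available when
$\Sigma_1 = \cdots = \Sigma_k = \Sigma$. In the null case,

$$\prod_{j=1}^k|\Sigma_j|^{\frac{n_jh}{2}}\,|I + 2\Sigma_jY|^{-(\frac{m_j}{2} + \frac{n_jh}{2})} = |\Sigma|^{\frac{Nh}{2}}\,|I + 2\Sigma Y|^{-(\frac{N-k}{2} + \frac{Nh}{2})}. \tag{v}$$

Then, substituting (v) in (6.12.5) and integrating out over $Y$ produces

$$E[\lambda^h|H_o] = c^h\,\frac{\Gamma_p(\frac{N-k}{2})}{\Gamma_p(\frac{N-k}{2} + \frac{Nh}{2})}\prod_{j=1}^k\frac{\Gamma_p(\frac{n_j-1}{2} + \frac{n_jh}{2})}{\Gamma_p(\frac{n_j-1}{2})}. \tag{6.12.6}$$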


Observe that when $h = 0$, $E[\lambda^h|H_o] = 1$. For $h = s - 1$ where $s$ is a complex parameter,
we have the Mellin transform of the density of $\lambda$, denoted by $f(\lambda)$, which can be expressed
as follows in terms of an H-function:

$$f(\lambda) = \frac{1}{c}\,\frac{\Gamma_p(\frac{N-k}{2})}{\prod_{j=1}^k\Gamma_p(\frac{n_j-1}{2})}\,H^{k,0}_{1,k}\left[\frac{\lambda}{c}\,\Big|\,{}^{(-\frac{k}{2},\,\frac{N}{2})}_{(-\frac{1}{2},\,\frac{n_j}{2}),\ j=1,\ldots,k}\right],\quad 0 < \lambda < 1, \tag{6.12.7}$$

and zero elsewhere, where the H-function is defined in Sect. 5.4.3, more details being
available from Mathai and Saxena (1978) and Mathai et al. (2010). Since the coefficients
of $\frac{h}{2}$ in the gammas, that is, $n_1,\ldots,n_k$ and $N$, are all positive integers, one can expand
all gammas by using the multiplication formula for gamma functions, and then, $f(\lambda)$ can
be expressed in terms of a G-function as well. It may be noted from (6.12.5) that for
obtaining the non-null moments, and thereby the non-null density, one has to integrate out
$Y$ in (6.12.5). This has not yet been worked out for a general $k$. For $k = 2$, one can obtain
a series form in terms of zonal polynomials for the integral in (6.12.5). The rather intricate
derivations are omitted.


6.12.1. Asymptotic behavior 


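We now investigate the asymptotic behavior of $-2\ln\lambda$ as $n_j \to \infty$, $j = 1,\ldots,k$,
$N = n_1 + \cdots + n_k$. On expanding the real matrix-variate gamma functions in the
gamma ratio involving $h$ in (6.12.6), we have

$$\frac{\prod_{j=1}^k\Gamma_p(\frac{n_j-1}{2} + \frac{n_jh}{2})}{\Gamma_p(\frac{N-k}{2} + \frac{Nh}{2})} \to \frac{\prod_{j=1}^k\left\{\prod_{i=1}^p\Gamma\left(\frac{n_j}{2}(1+h) - \frac{1}{2} - \frac{i-1}{2}\right)\right\}}{\prod_{i=1}^p\Gamma\left(\frac{N}{2}(1+h) - \frac{k}{2} - \frac{i-1}{2}\right)}, \tag{i}$$

excluding the factor containing $\pi$. Letting $\frac{n_j}{2}(1+h) \to \infty$, $j = 1,\ldots,k$, and $\frac{N}{2}(1+h) \to \infty$
as $n_j \to \infty$, $j = 1,\ldots,k$, with $N \to \infty$, we now express all the gamma functions in
(i) in terms of Stirling's asymptotic formula. For the numerator, we have

$$\begin{aligned}
\prod_{j=1}^k\prod_{i=1}^p\Gamma\left(\tfrac{n_j}{2}(1+h) - \tfrac{1}{2} - \tfrac{i-1}{2}\right)
&\to \prod_{j=1}^k\prod_{i=1}^p(2\pi)^{\frac{1}{2}}\left[\tfrac{n_j}{2}(1+h)\right]^{\frac{n_j}{2}(1+h) - \frac{1}{2} - \frac{i}{2}}e^{-\frac{n_j}{2}(1+h)}\\
&= \prod_{j=1}^k(2\pi)^{\frac{p}{2}}\left[\tfrac{n_j}{2}(1+h)\right]^{\frac{n_j}{2}(1+h)p - \frac{p}{2} - \frac{p(p+1)}{4}}e^{-\frac{n_j}{2}(1+h)p}\\
&= (2\pi)^{\frac{kp}{2}}\left\{\prod_{j=1}^k\left(\tfrac{n_j}{2}\right)^{\frac{n_j}{2}(1+h)p - \frac{p}{2} - \frac{p(p+1)}{4}}\right\}e^{-\frac{N}{2}(1+h)p}\,(1+h)^{\frac{N}{2}(1+h)p - \frac{kp}{2} - \frac{kp(p+1)}{4}},
\end{aligned} \tag{ii}$$

and the denominator in (i) has the following asymptotic representation:

$$\begin{aligned}
\prod_{i=1}^p\Gamma\left(\tfrac{N}{2}(1+h) - \tfrac{k}{2} - \tfrac{i-1}{2}\right)
&\to \prod_{i=1}^p(2\pi)^{\frac{1}{2}}\left[\tfrac{N}{2}(1+h)\right]^{\frac{N}{2}(1+h) - \frac{k}{2} - \frac{i}{2}}e^{-\frac{N}{2}(1+h)}\\
&= (2\pi)^{\frac{p}{2}}\left(\tfrac{N}{2}\right)^{\frac{N}{2}(1+h)p - \frac{kp}{2} - \frac{p(p+1)}{4}}e^{-\frac{N}{2}(1+h)p}\,(1+h)^{\frac{N}{2}(1+h)p - \frac{kp}{2} - \frac{p(p+1)}{4}}.
\end{aligned} \tag{iii}$$

Now, expanding the gammas in the constant part $\Gamma_p(\frac{N-k}{2})\big/\prod_{j=1}^k\Gamma_p(\frac{n_j-1}{2})$ and then taking
care of $c^h$, we see that the factors containing $\pi$, the $n_j$'s and $N$ disappear, leaving

$$(1+h)^{-(k-1)\frac{p(p+1)}{4}}.$$

Hence $-2\ln\lambda \to \chi^2_{(k-1)\frac{p(p+1)}{2}}$, and we have the following result:

Theorem 6.12.1. Consider the $\lambda$-criterion in (6.12.2) or the null density in (6.12.7).
When $n_j \to \infty$, $j = 1,\ldots,k$, the asymptotic null density of $-2\ln\lambda$ is a real scalar
chisquare with $(k-1)\frac{p(p+1)}{2}$ degrees of freedom.

Observe that the number of parameters restricted by the null hypothesis
$H_o : \Sigma_1 = \cdots = \Sigma_k = \Sigma$ where $\Sigma$ is unknown, is $(k-1)$ times the number of distinct parameters
in $\Sigma$, which is $\frac{p(p+1)}{2}$, and this coincides with the number of degrees of freedom of the
asymptotic chisquare distribution under $H_o$.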
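The limiting behavior stated in Theorem 6.12.1 can be checked numerically. The sketch below evaluates the null moment (6.12.6) on the log scale, expanding $\Gamma_p(a) = \pi^{p(p-1)/4}\prod_{i=1}^p\Gamma(a - \frac{i-1}{2})$, and compares it with $(1+h)^{-(k-1)p(p+1)/4}$. The function names and the sample sizes are illustrative assumptions, and the code is restricted to real $h \ge 0$ with every $n_j > p$.

```python
import math

def log_gamma_p(p, a):
    """ln Gamma_p(a), with Gamma_p(a) = pi^{p(p-1)/4} prod_{i=1}^p Gamma(a - (i-1)/2)."""
    return (p * (p - 1) / 4.0) * math.log(math.pi) + sum(
        math.lgamma(a - (i - 1) / 2.0) for i in range(1, p + 1))

def null_moment(h, sizes, p):
    """h-th null moment of lambda as given in (6.12.6), for real h >= 0."""
    N, k = sum(sizes), len(sizes)
    # ln c, where c = N^{Np/2} / prod_j n_j^{n_j p/2}.
    log_c = (N * p / 2.0) * math.log(N) - sum(
        (n * p / 2.0) * math.log(n) for n in sizes)
    log_m = (h * log_c + log_gamma_p(p, (N - k) / 2.0)
             - log_gamma_p(p, (N - k) / 2.0 + N * h / 2.0))
    for n in sizes:
        log_m += (log_gamma_p(p, (n - 1) / 2.0 + n * h / 2.0)
                  - log_gamma_p(p, (n - 1) / 2.0))
    return math.exp(log_m)

def asymptotic_moment(h, k, p):
    """Limiting form (1 + h)^{-(k-1)p(p+1)/4} of the null moment."""
    return (1.0 + h) ** (-(k - 1) * p * (p + 1) / 4.0)

# Hypothetical sample sizes for k = 2 bivariate (p = 2) populations.
for sizes in ([20, 30], [200, 300], [2000, 3000]):
    print(sizes, null_moment(1.0, sizes, 2), asymptotic_moment(1.0, 2, 2))
```

As the sample sizes grow, the exact moment should approach the limiting value, which is the moment structure of a chisquare variable with $(k-1)p(p+1)/2$ degrees of freedom.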


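6.13. Testing the Hypothesis that k Independent p-variate Real Gaussian Populations are Identical and Multivariate Analysis of Variance


Consider $k$ independent $p$-variate real Gaussian populations $X_{ij} \sim N_p(\mu_{(i)}, \Sigma_i)$,
$\Sigma_i > O$, $i = 1,\ldots,k$, and $j = 1,\ldots,n_i$, where the $p \times 1$ vector $X_{ij}$ is the $j$-th
sample value belonging to the $i$-th population, these samples (iid variables) being of sizes
$n_1,\ldots,n_k$ from these $k$ populations. The joint density of all the sample values, denoted
by $L$, can be expressed as follows:

$$\begin{aligned}
L = \prod_{i=1}^k L_i,\quad L_i &= \frac{e^{-\frac{1}{2}\sum_{j=1}^{n_i}(X_{ij} - \mu_{(i)})'\Sigma_i^{-1}(X_{ij} - \mu_{(i)})}}{(2\pi)^{\frac{n_ip}{2}}|\Sigma_i|^{\frac{n_i}{2}}}\\
&= \frac{e^{-\frac{1}{2}\mathrm{tr}(\Sigma_i^{-1}S_i) - \frac{n_i}{2}(\bar{X}_i - \mu_{(i)})'\Sigma_i^{-1}(\bar{X}_i - \mu_{(i)})}}{(2\pi)^{\frac{n_ip}{2}}|\Sigma_i|^{\frac{n_i}{2}}}
\end{aligned}$$

where $\bar{X}_i = \frac{1}{n_i}(X_{i1} + \cdots + X_{in_i})$, $i = 1,\ldots,k$, and $E[X_{ij}] = \mu_{(i)}$, $j = 1,\ldots,n_i$. Then,
letting $N = n_1 + \cdots + n_k$,

$$\max_{\Omega}L = \prod_{i=1}^k\max_{\Omega}L_i = \prod_{i=1}^k\frac{n_i^{\frac{n_ip}{2}}e^{-\frac{n_ip}{2}}}{(2\pi)^{\frac{n_ip}{2}}|S_i|^{\frac{n_i}{2}}} = \frac{\left\{\prod_{i=1}^k n_i^{\frac{n_ip}{2}}\right\}e^{-\frac{Np}{2}}}{(2\pi)^{\frac{Np}{2}}\prod_{i=1}^k|S_i|^{\frac{n_i}{2}}}.$$

Consider the hypothesis $H_o : \mu_{(1)} = \cdots = \mu_{(k)} = \mu$, $\Sigma_1 = \cdots = \Sigma_k = \Sigma$, where $\mu$
and $\Sigma$ are unknown. This corresponds to the hypothesis of equality of these $k$ populations.
Under $H_o$, the maximum likelihood estimator (MLE) of $\mu$, denoted by $\hat{\mu}$, is given by
$\hat{\mu} = \frac{1}{N}[n_1\bar{X}_1 + \cdots + n_k\bar{X}_k]$ where $N$ and $\bar{X}_i$ are as defined above. As for the common $\Sigma$,
its MLE is

$$\hat{\Sigma} = \frac{1}{N}\sum_{i=1}^k\sum_{j=1}^{n_i}(X_{ij} - \hat{\mu})(X_{ij} - \hat{\mu})' = \frac{1}{N}\left[S_1 + \cdots + S_k + \sum_{i=1}^k n_i(\bar{X}_i - \hat{\mu})(\bar{X}_i - \hat{\mu})'\right]$$

where $S_i$ is the sample sum of products matrix for the $i$-th sample, observing that

$$\begin{aligned}
\sum_{j=1}^{n_i}(X_{ij} - \hat{\mu})(X_{ij} - \hat{\mu})' &= \sum_{j=1}^{n_i}(X_{ij} - \bar{X}_i + \bar{X}_i - \hat{\mu})(X_{ij} - \bar{X}_i + \bar{X}_i - \hat{\mu})'\\
&= \sum_{j=1}^{n_i}(X_{ij} - \bar{X}_i)(X_{ij} - \bar{X}_i)' + \sum_{j=1}^{n_i}(\bar{X}_i - \hat{\mu})(\bar{X}_i - \hat{\mu})'\\
&= S_i + n_i(\bar{X}_i - \hat{\mu})(\bar{X}_i - \hat{\mu})'.
\end{aligned}$$

Hence the maximum of the likelihood function under $H_o$ is the following:

$$\max_{H_o}L = \frac{e^{-\frac{Np}{2}}N^{\frac{Np}{2}}}{(2\pi)^{\frac{Np}{2}}\left|S + \sum_{i=1}^k n_i(\bar{X}_i - \hat{\mu})(\bar{X}_i - \hat{\mu})'\right|^{\frac{N}{2}}}$$

where $S = S_1 + \cdots + S_k$. Therefore the $\lambda$-criterion is given by

$$\lambda = \frac{\max_{H_o}L}{\max_{\Omega}L} = \frac{\left\{\prod_{i=1}^k|S_i|^{\frac{n_i}{2}}\right\}N^{\frac{Np}{2}}}{\left\{\prod_{i=1}^k n_i^{\frac{n_ip}{2}}\right\}\left|S + \sum_{i=1}^k n_i(\bar{X}_i - \hat{\mu})(\bar{X}_i - \hat{\mu})'\right|^{\frac{N}{2}}}. \tag{6.13.1}$$


6.13.1. Conditional and marginal hypotheses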


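For convenience, we may split $\lambda$ into the product $\lambda_1\lambda_2$ where $\lambda_1$ is the $\lambda$-criterion for
the conditional hypothesis $H_{o1} : \mu_{(1)} = \cdots = \mu_{(k)} = \mu$ given that $\Sigma_1 = \cdots = \Sigma_k = \Sigma$
and $\lambda_2$ is the $\lambda$-criterion for the marginal hypothesis $H_{o2} : \Sigma_1 = \cdots = \Sigma_k = \Sigma$ where
$\mu$ and $\Sigma$ are unknown. The conditional hypothesis $H_{o1}$ is actually the null hypothesis
usually being made when establishing the multivariate analysis of variance (MANOVA)
procedure. We will only consider $H_{o1}$ since the marginal hypothesis $H_{o2}$ has already been
discussed in Sect. 6.12. When the $\Sigma_i$'s are assumed to be equal, the common $\Sigma$ is estimated
by the MLE $\frac{1}{N}(S_1 + \cdots + S_k)$ where $S_i$ is the sample sum of products matrix in
the $i$-th population. The common $\mu$ is estimated by $\frac{1}{N}(n_1\bar{X}_1 + \cdots + n_k\bar{X}_k)$. Accordingly,
the $\lambda$-criterion for this conditional hypothesis is the following:

$$\lambda_1 = \frac{|S|^{\frac{N}{2}}}{\left|S + \sum_{i=1}^k n_i(\bar{X}_i - \hat{\mu})(\bar{X}_i - \hat{\mu})'\right|^{\frac{N}{2}}} \tag{6.13.2}$$

where $S = S_1 + \cdots + S_k$, $\hat{\mu} = \frac{1}{N}(n_1\bar{X}_1 + \cdots + n_k\bar{X}_k)$, $N = n_1 + \cdots + n_k$. Note
that the $S_i$'s are independently Wishart distributed with $n_i - 1$ degrees of freedom, that is,
$S_i \overset{ind}{\sim} W_p(n_i - 1, \Sigma)$, $i = 1,\ldots,k$, and hence $S \sim W_p(N - k, \Sigma)$. Let

$$Q = \sum_{i=1}^k n_i(\bar{X}_i - \hat{\mu})(\bar{X}_i - \hat{\mu})'.$$

Since $Q$ only contains sample averages and the sample averages and the sample sum of
products matrices are independently distributed, $Q$ and $S$ are independently distributed.
Moreover, since we can write $\bar{X}_i - \hat{\mu}$ as $(\bar{X}_i - \mu) - (\hat{\mu} - \mu)$, where $\mu$ is the common true
mean value vector, without any loss of generality we can deem the $\bar{X}_i$'s to be independently
$N_p(O, \frac{1}{n_i}\Sigma)$ distributed, $i = 1,\ldots,k$, and letting $Y_i = \sqrt{n_i}\,\bar{X}_i$, one has $Y_i \overset{iid}{\sim} N_p(O, \Sigma)$
under the hypothesis $H_{o1}$. Now, observe that

$$\begin{aligned}
\bar{X}_i - \hat{\mu} &= \bar{X}_i - \frac{1}{N}(n_1\bar{X}_1 + \cdots + n_k\bar{X}_k)\\
&= -\frac{n_1}{N}\bar{X}_1 - \cdots - \frac{n_{i-1}}{N}\bar{X}_{i-1} + \left(1 - \frac{n_i}{N}\right)\bar{X}_i - \frac{n_{i+1}}{N}\bar{X}_{i+1} - \cdots - \frac{n_k}{N}\bar{X}_k,
\end{aligned} \tag{i}$$

$$\begin{aligned}
Q &= \sum_{i=1}^k n_i(\bar{X}_i - \hat{\mu})(\bar{X}_i - \hat{\mu})'\\
&= \sum_{i=1}^k n_i\bar{X}_i\bar{X}_i' - N\hat{\mu}\hat{\mu}'\\
&= \sum_{i=1}^k Y_iY_i' - \frac{1}{N}(\sqrt{n_1}\,Y_1 + \cdots + \sqrt{n_k}\,Y_k)(\sqrt{n_1}\,Y_1 + \cdots + \sqrt{n_k}\,Y_k)',
\end{aligned} \tag{ii}$$

where $\sqrt{n_1}\,Y_1 + \cdots + \sqrt{n_k}\,Y_k = (Y_1,\ldots,Y_k)DJ$ with $J$ being the $k \times 1$ vector of unities,
$J' = (1,\ldots,1)$, and $D = \mathrm{diag}(\sqrt{n_1},\ldots,\sqrt{n_k})$. Thus, we can express $Q$ as follows:

$$Q = (Y_1,\ldots,Y_k)\left[I - \frac{1}{N}DJJ'D\right](Y_1,\ldots,Y_k)'. \tag{iii}$$

Let $B = \frac{1}{N}DJJ'D$ and $A = I - B$. Then, observing that $J'D^2J = N$, both $B$ and
$A$ are idempotent matrices, where $B$ is of rank 1 since the trace of $B$ or equivalently
the trace of $\frac{1}{N}J'D^2J$ is equal to one, so that the trace of $A$, which is also its rank, is
$k - 1$. Then, there exists an orthonormal matrix $P$, $PP' = I_k$, $P'P = I_k$, such that

$$A = P'\begin{bmatrix}I_{k-1} & O\\ O' & 0\end{bmatrix}P$$

where $O$ is a $(k-1) \times 1$ null vector, $O'$ being its transpose. Letting
$(U_1,\ldots,U_k) = (Y_1,\ldots,Y_k)P'$, the $U_i$'s are still independently $N_p(O, \Sigma)$ distributed
under $H_{o1}$, so that

$$\begin{aligned}
Q &= (U_1,\ldots,U_k)\begin{bmatrix}I_{k-1} & O\\ O' & 0\end{bmatrix}(U_1,\ldots,U_k)' = (U_1,\ldots,U_{k-1},O)(U_1,\ldots,U_{k-1},O)'\\
&= \sum_{i=1}^{k-1}U_iU_i' \sim W_p(k-1, \Sigma).
\end{aligned} \tag{iv}$$

Thus, $S + \sum_{i=1}^k n_i(\bar{X}_i - \hat{\mu})(\bar{X}_i - \hat{\mu})' \sim W_p(N-1, \Sigma)$, which clearly is not independently
distributed of $S$, referring to the ratio in (6.13.2).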
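The idempotency and rank claims about $B = \frac{1}{N}DJJ'D$ and $A = I - B$ are easy to confirm on a concrete case. In the sketch below the sample sizes are hypothetical and deliberately chosen as perfect squares, so that $D = \mathrm{diag}(\sqrt{n_1},\ldots,\sqrt{n_k})$ has integer entries and exact rational arithmetic can be used; the helper names are illustrative only.

```python
from fractions import Fraction as Fr

# Hypothetical sample sizes, chosen as perfect squares so that sqrt(n_i) is exact.
sizes = [4, 9, 16]
d = [2, 3, 4]                      # square roots of the sample sizes
N, k = sum(sizes), len(sizes)

# B = (1/N) D J J' D has (i, j)-th element sqrt(n_i n_j)/N, and A = I - B.
B = [[Fr(d[i] * d[j], N) for j in range(k)] for i in range(k)]
A = [[Fr(int(i == j)) - B[i][j] for j in range(k)] for i in range(k)]

def matmul(P, Q):
    """Product of two matrices stored as lists of rows."""
    return [[sum(P[i][m] * Q[m][j] for m in range(len(Q)))
             for j in range(len(Q[0]))] for i in range(len(P))]

def trace(P):
    """Sum of the diagonal elements of a square matrix."""
    return sum(P[i][i] for i in range(len(P)))

print(matmul(B, B) == B, matmul(A, A) == A)   # idempotency of B and A
print(trace(B), trace(A))                     # traces, equal to the ranks 1 and k - 1
```

Since the trace of an idempotent matrix equals its rank, the printed traces give the ranks directly, in agreement with the argument leading to (iv).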


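6.13.2. Arbitrary moments of $\lambda_1$

Given (6.13.2), we have

$$\lambda_1 = \frac{|S|^{\frac{N}{2}}}{|S + Q|^{\frac{N}{2}}}\ \Rightarrow\ \lambda_1^h = |S|^{\frac{Nh}{2}}|S + Q|^{-\frac{Nh}{2}} \tag{v}$$

where $|S + Q|^{-\frac{Nh}{2}}$ will be replaced by the equivalent integral

$$|S + Q|^{-\frac{Nh}{2}} = \frac{1}{\Gamma_p(\frac{Nh}{2})}\int_{T>O}|T|^{\frac{Nh}{2} - \frac{p+1}{2}}e^{-\mathrm{tr}((S+Q)T)}\,dT,\quad \Re\left(\tfrac{Nh}{2}\right) > \tfrac{p-1}{2}, \tag{vi}$$

with the $p \times p$ matrix $T > O$. Hence, the $h$-th moment of $\lambda_1$, for arbitrary $h$, is the
following expected value:

$$E[\lambda_1^h|H_{o1}] = E\{|S|^{\frac{Nh}{2}}|S + Q|^{-\frac{Nh}{2}}\}. \tag{vii}$$

We now evaluate (vii) by integrating out over the Wishart density of $S$ and over the joint
multinormal density for $U_1,\ldots,U_{k-1}$:

$$\begin{aligned}
E[\lambda_1^h|H_{o1}] = \frac{1}{\Gamma_p(\frac{Nh}{2})}\int_{T>O}|T|^{\frac{Nh}{2} - \frac{p+1}{2}}
&\times\int_{S>O}\frac{|S|^{\frac{Nh}{2} + \frac{N-k}{2} - \frac{p+1}{2}}e^{-\frac{1}{2}\mathrm{tr}(\Sigma^{-1}S) - \mathrm{tr}(ST)}}{2^{\frac{(N-k)p}{2}}\Gamma_p(\frac{N-k}{2})|\Sigma|^{\frac{N-k}{2}}}\,dS\\
&\times\int_{U_1,\ldots,U_{k-1}}\frac{e^{-\frac{1}{2}\sum_{i=1}^{k-1}U_i'\Sigma^{-1}U_i - \sum_{i=1}^{k-1}\mathrm{tr}(TU_iU_i')}}{(2\pi)^{\frac{(k-1)p}{2}}|\Sigma|^{\frac{k-1}{2}}}\,dU_1\wedge\ldots\wedge dU_{k-1}\wedge dT.
\end{aligned}$$

The integral over $S$ is evaluated as follows:

$$\int_{S>O}\frac{|S|^{\frac{Nh}{2} + \frac{N-k}{2} - \frac{p+1}{2}}e^{-\frac{1}{2}\mathrm{tr}[S(\Sigma^{-1} + 2T)]}}{2^{\frac{(N-k)p}{2}}\Gamma_p(\frac{N-k}{2})|\Sigma|^{\frac{N-k}{2}}}\,dS
= 2^{\frac{Nhp}{2}}\,\frac{\Gamma_p(\frac{N-k}{2} + \frac{Nh}{2})}{\Gamma_p(\frac{N-k}{2})}\,|\Sigma|^{\frac{Nh}{2}}\,|I + 2\Sigma T|^{-(\frac{N-k}{2} + \frac{Nh}{2})} \tag{viii}$$

for $I + 2\Sigma T > O$, $\Re\left(\frac{N-k}{2} + \frac{Nh}{2}\right) > \frac{p-1}{2}$. The integral over $U_1,\ldots,U_{k-1}$ is the following,
denoted by $\delta$:

$$\delta = \int_{U_1,\ldots,U_{k-1}}\frac{e^{-\frac{1}{2}\sum_{i=1}^{k-1}U_i'\Sigma^{-1}U_i - \sum_{i=1}^{k-1}\mathrm{tr}(TU_iU_i')}}{(2\pi)^{\frac{(k-1)p}{2}}|\Sigma|^{\frac{k-1}{2}}}\,dU_1\wedge\ldots\wedge dU_{k-1}$$

where

$$\sum_{i=1}^{k-1}\mathrm{tr}(TU_iU_i') = \sum_{i=1}^{k-1}\mathrm{tr}(U_i'TU_i) = \sum_{i=1}^{k-1}U_i'TU_i$$

since $U_i'TU_i$ is scalar; thus, the exponent becomes $-\frac{1}{2}\sum_{i=1}^{k-1}U_i'[\Sigma^{-1} + 2T]U_i$ and the
integral simplifies to

$$\delta = |I + 2\Sigma T|^{-\frac{k-1}{2}},\quad I + 2\Sigma T > O. \tag{ix}$$

Now the integral over $T$ is the following:

$$\frac{1}{\Gamma_p(\frac{Nh}{2})}\int_{T>O}|T|^{\frac{Nh}{2} - \frac{p+1}{2}}\,|I + 2\Sigma T|^{-(\frac{N-k}{2} + \frac{Nh}{2} + \frac{k-1}{2})}\,dT
= \frac{\Gamma_p(\frac{N-k}{2} + \frac{k-1}{2})}{\Gamma_p(\frac{N-k}{2} + \frac{Nh}{2} + \frac{k-1}{2})}\,2^{-\frac{Nhp}{2}}\,|\Sigma|^{-\frac{Nh}{2}},\quad \Re\left(\tfrac{N-k}{2} + \tfrac{Nh}{2}\right) > \tfrac{p-1}{2}. \tag{x}$$

Therefore,

$$E[\lambda_1^h|H_{o1}] = \frac{\Gamma_p(\frac{N-k}{2} + \frac{Nh}{2})}{\Gamma_p(\frac{N-k}{2})}\,\frac{\Gamma_p(\frac{N-k}{2} + \frac{k-1}{2})}{\Gamma_p(\frac{N-k}{2} + \frac{Nh}{2} + \frac{k-1}{2})} \tag{6.13.3}$$

for $\Re\left(\frac{N-k}{2} + \frac{Nh}{2}\right) > \frac{p-1}{2}$.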


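6.13.3. The asymptotic distribution of $-2\ln\lambda_1$


An asymptotic distribution of $-2\ln\lambda_1$, as $N \to \infty$, can be derived from (6.13.3). First,
on expanding the real matrix-variate gamma functions in (6.13.3), we obtain the following
representation of the $h$-th null moment of $\lambda_1$:

$$E[\lambda_1^h|H_{o1}] = \left\{\prod_{j=1}^p\frac{\Gamma\left(\frac{N-k}{2} + \frac{k-1}{2} - \frac{j-1}{2}\right)}{\Gamma\left(\frac{N-k}{2} - \frac{j-1}{2}\right)}\right\}\left\{\prod_{j=1}^p\frac{\Gamma\left(\frac{N}{2}(1+h) - \frac{k}{2} - \frac{j-1}{2}\right)}{\Gamma\left(\frac{N}{2}(1+h) - \frac{k}{2} + \frac{k-1}{2} - \frac{j-1}{2}\right)}\right\}. \tag{xi}$$

Let us now express all the gamma functions in terms of Stirling's asymptotic formula by
taking $\frac{N}{2} \to \infty$ in the constant part and $\frac{N}{2}(1+h) \to \infty$ in the part containing $h$. Then,

$$\frac{\Gamma\left(\frac{N}{2}(1+h) - \frac{k}{2} - \frac{j-1}{2}\right)}{\Gamma\left(\frac{N}{2}(1+h) - \frac{k}{2} + \frac{k-1}{2} - \frac{j-1}{2}\right)} \to \frac{(2\pi)^{\frac{1}{2}}\left[\frac{N}{2}(1+h)\right]^{\frac{N}{2}(1+h) - \frac{k}{2} - \frac{j}{2}}e^{-\frac{N}{2}(1+h)}}{(2\pi)^{\frac{1}{2}}\left[\frac{N}{2}(1+h)\right]^{\frac{N}{2}(1+h) - \frac{k}{2} + \frac{k-1}{2} - \frac{j}{2}}e^{-\frac{N}{2}(1+h)}} = \frac{1}{\left[\frac{N}{2}(1+h)\right]^{\frac{k-1}{2}}}, \tag{xii}$$

$$\frac{\Gamma\left(\frac{N-k}{2} + \frac{k-1}{2} - \frac{j-1}{2}\right)}{\Gamma\left(\frac{N-k}{2} - \frac{j-1}{2}\right)} \to \left(\frac{N}{2}\right)^{\frac{k-1}{2}}. \tag{xiii}$$

Hence,

$$E[\lambda_1^h|H_{o1}] \to (1+h)^{-\frac{(k-1)p}{2}}\ \text{ as } N \to \infty. \tag{6.13.4}$$

Thus, the following result:

Theorem 6.13.1. For the test statistic $\lambda_1$ given in (6.13.2), $-2\ln\lambda_1 \to \chi^2_{(k-1)p}$, that is,
$-2\ln\lambda_1$ tends to a real scalar chisquare variable having $(k-1)p$ degrees of freedom as
$N \to \infty$ with $N = n_1 + \cdots + n_k$, $n_j$ being the sample size of the $j$-th $p$-variate real
Gaussian population.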
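To make the MANOVA criterion concrete, the following sketch computes $S$, $Q$, the ratio $\lambda_1^{2/N} = |S|/|S + Q|$ and the statistic $-2\ln\lambda_1 = -N\ln(|S|/|S+Q|)$ for a small made-up data set with $k = 3$ and $p = 2$. The data, the helper names and the use of the asymptotic chisquare approximation with such small samples are all assumptions made purely for illustration; with $N = 12$ the approximation of Theorem 6.13.1 should not be relied upon in practice.

```python
import math
from fractions import Fraction as Fr

# Hypothetical data: k = 3 samples of size 4 from bivariate (p = 2) populations.
samples = [
    [(1, 2), (2, 1), (3, 3), (2, 2)],
    [(4, 5), (5, 4), (6, 6), (5, 5)],
    [(2, 6), (3, 5), (4, 7), (3, 6)],
]
k, p = len(samples), 2
N = sum(len(s) for s in samples)

def mean(points):
    """Componentwise average of a list of p-dimensional points."""
    return [Fr(sum(pt[c] for pt in points), len(points)) for c in range(p)]

def outer_sum(points, center, weight=1):
    """weight * sum over points of (pt - center)(pt - center)' as a p x p matrix."""
    M = [[Fr(0)] * p for _ in range(p)]
    for pt in points:
        dev = [Fr(pt[c]) - center[c] for c in range(p)]
        for r in range(p):
            for c in range(p):
                M[r][c] += weight * dev[r] * dev[c]
    return M

def add(P, Q):
    """Elementwise sum of two p x p matrices."""
    return [[P[r][c] + Q[r][c] for c in range(p)] for r in range(p)]

def det2(M):
    """Determinant of a 2 x 2 matrix."""
    return M[0][0] * M[1][1] - M[0][1] * M[1][0]

grand = mean([pt for s in samples for pt in s])          # mu_hat
S = [[Fr(0)] * p for _ in range(p)]                      # S = S_1 + ... + S_k
Q = [[Fr(0)] * p for _ in range(p)]                      # sum of n_i (Xbar_i - mu_hat)(...)'
for s in samples:
    S = add(S, outer_sum(s, mean(s)))
    Q = add(Q, outer_sum([mean(s)], grand, weight=len(s)))

ratio = det2(S) / det2(add(S, Q))                        # lambda_1^{2/N}
stat = -N * math.log(float(ratio))                       # -2 ln(lambda_1)
df = (k - 1) * p
# Here df = 4, and the chisquare tail for 4 degrees of freedom is exp(-x/2)(1 + x/2).
p_value = math.exp(-stat / 2) * (1 + stat / 2)
print(det2(S), det2(add(S, Q)), stat, df, p_value)
```

For these data $|S| = 27$ and $|S + Q| = 691$, so the ratio is far from one and the hypothesis of equal mean vectors would be rejected at the usual levels.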


Under the marginal hypothesis $H_{o2} : \Sigma_1 = \cdots = \Sigma_k = \Sigma$ where $\Sigma$ is unknown,
the $\lambda$-criterion is denoted by $\lambda_2$, and its $h$-th moment, which is available from (6.12.6) of
Sect. 6.12, is given by

$$E[\lambda_2^h|H_{o2}] = c^h\,\frac{\Gamma_p(\frac{N-k}{2})}{\Gamma_p(\frac{N-k}{2} + \frac{Nh}{2})}\prod_{j=1}^k\frac{\Gamma_p(\frac{n_j-1}{2} + \frac{n_jh}{2})}{\Gamma_p(\frac{n_j-1}{2})} \tag{6.13.5}$$

for $\Re\left(\frac{n_j-1}{2} + \frac{n_jh}{2}\right) > \frac{p-1}{2}$, $j = 1,\ldots,k$. Hence the $h$-th null moment of the $\lambda$ criterion
for testing the hypothesis $H_o$ of equality of the $k$ independent $p$-variate real Gaussian
populations is the following:

$$E[\lambda^h|H_o] = c^h\,\frac{\Gamma_p(\frac{N-1}{2})}{\Gamma_p(\frac{N-1}{2} + \frac{Nh}{2})}\prod_{j=1}^k\frac{\Gamma_p(\frac{n_j-1}{2} + \frac{n_jh}{2})}{\Gamma_p(\frac{n_j-1}{2})} \tag{6.13.6}$$

for $\Re\left(\frac{n_j-1}{2} + \frac{n_jh}{2}\right) > \frac{p-1}{2}$, $j = 1,\ldots,k$, $N = n_1 + \cdots + n_k$, where $c$ is the constant
associated with the $h$-th moment of $\lambda_2$. Combining Theorems 6.13.1 and 6.12.1,
the asymptotic distribution of $-2\ln\lambda$ of (6.13.6) is a real scalar chisquare with
$(k-1)p + (k-1)\frac{p(p+1)}{2}$ degrees of freedom. Thus, the following result:

Theorem 6.13.2. For the $\lambda$-criterion for testing the hypothesis of equality of $k$ independent
$p$-variate real Gaussian populations, $-2\ln\lambda \to \chi^2_{\nu}$, $\nu = (k-1)p + (k-1)\frac{p(p+1)}{2}$,
as $n_j \to \infty$, $j = 1,\ldots,k$.

Note 6.13.1. Observe that for the conditional hypothesis $H_{o1}$ in (6.13.2), the degrees of
freedom of the asymptotic chisquare distribution of $-2\ln\lambda_1$ is $(k-1)p$, which is also
the number of parameters restricted by the hypothesis $H_{o1}$. For the hypothesis $H_{o2}$, the
corresponding degrees of freedom of the asymptotic chisquare distribution of $-2\ln\lambda_2$ is
$(k-1)\frac{p(p+1)}{2}$, which as well is the number of parameters restricted by the hypothesis $H_{o2}$.
For the asymptotic chisquare distribution of $-2\ln\lambda$ under the hypothesis $H_o$ of equality of $k$
independent $p$-variate Gaussian populations, the degrees of freedom is the sum of these
two quantities, that is, $(k-1)p + (k-1)\frac{p(p+1)}{2} = p(k-1)\frac{(p+3)}{2}$, which also coincides
with the number of parameters restricted under the hypothesis $H_o$.
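The parameter counts in Note 6.13.1 can be tabulated with a few lines of code. This is merely a bookkeeping sketch; the function name is an illustrative choice.

```python
def degrees_of_freedom(k, p):
    """Asymptotic chisquare degrees of freedom for H_o1, H_o2 and H_o (Note 6.13.1)."""
    df_means = (k - 1) * p                    # conditional hypothesis H_o1
    df_covs = (k - 1) * p * (p + 1) // 2      # marginal hypothesis H_o2
    return df_means, df_covs, df_means + df_covs

for k, p in [(2, 2), (3, 2), (4, 5)]:
    d1, d2, total = degrees_of_freedom(k, p)
    # The total should agree with the closed form p(k - 1)(p + 3)/2.
    print(k, p, d1, d2, total, total == p * (k - 1) * (p + 3) // 2)
```

Integer division is exact here because $p(p+1)$ and $p(p+3)$ are always even.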


Exercises 


6.1. Derive the $\lambda$-criteria for the following tests in a real univariate Gaussian population
$N_1(\mu, \sigma^2)$, assuming that a simple random sample of size $n$, namely $x_1,\ldots,x_n$, which
are iid as $N_1(\mu, \sigma^2)$, is available: (1): $\mu = \mu_o$ (given), $\sigma^2$ is known; (2): $\mu = \mu_o$, $\sigma^2$
unknown; (3): $\sigma^2 = \sigma_o^2$ (given). You may also refer to Mathai and Haubold (2017).

In all the following problems, it is assumed that a simple random sample of size n is 
available. The alternative hypotheses are the natural alternatives. 


6.2. Repeat Exercise 6.1 for the corresponding complex Gaussian. 


6.3. Construct the $\lambda$-criteria in the complex case for the tests discussed in Sects. 6.2-6.4.


6.4. In the real $p$-variate Gaussian case, consider the hypotheses (1): $\Sigma$ is diagonal or
the individual components are independently distributed; (2): The diagonal elements are
equal, given that $\Sigma$ is diagonal (which is a conditional test). Construct the $\lambda$-criterion in
each case.


6.5. Repeat Exercise 6.4 for the complex Gaussian case. 


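6.6. Let the population be real $p$-variate Gaussian $N_p(\mu, \Sigma)$, $\Sigma = (\sigma_{ij}) > O$,
$\mu' = (\mu_1,\ldots,\mu_p)$. Consider the following tests and compute the $\lambda$-criteria: (1): $\sigma_{11} = \cdots = \sigma_{pp} = \sigma^2$,
$\sigma_{ij} = \nu$ for all $i$ and $j$, $i \ne j$. That is, all the variances are equal and all the
covariances are equal; (2): In addition to (1), $\mu_1 = \mu_2 = \cdots = \mu_p$ or all the mean values
are equal. Construct the $\lambda$-criterion in each case. The first one is known as the $L_{vc}$ criterion
and the second one is known as the $L_{mvc}$ criterion. Repeat the same exercise for the complex
case. Some distributional aspects are examined in Mathai (1970b) and numerical tables
are available in Mathai and Katiyar (1979b).

6.7. Let the population be real $p$-variate Gaussian $N_p(\mu, \Sigma)$, $\Sigma > O$. Consider the
hypothesis (1):

$$\Sigma = \begin{bmatrix}a & b & \cdots & b\\ b & a & \cdots & b\\ \vdots & \vdots & \ddots & \vdots\\ b & b & \cdots & a\end{bmatrix},\quad a \ne 0,\ b \ne 0,\ a \ne b.$$

(2):

$$\Sigma = \begin{bmatrix}\Sigma_{11} & \Sigma_{12}\\ \Sigma_{21} & \Sigma_{22}\end{bmatrix},\quad \Sigma_{11} = \begin{bmatrix}a_1 & b_1 & \cdots & b_1\\ b_1 & a_1 & \cdots & b_1\\ \vdots & \vdots & \ddots & \vdots\\ b_1 & b_1 & \cdots & a_1\end{bmatrix},\quad \Sigma_{22} = \begin{bmatrix}a_2 & b_2 & \cdots & b_2\\ b_2 & a_2 & \cdots & b_2\\ \vdots & \vdots & \ddots & \vdots\\ b_2 & b_2 & \cdots & a_2\end{bmatrix}$$

where $a_1 \ne 0$, $a_2 \ne 0$, $b_1 \ne 0$, $b_2 \ne 0$, $a_1 \ne b_1$, $a_2 \ne b_2$, $\Sigma_{12} = \Sigma_{21}'$ and all the
elements in $\Sigma_{12}$ and $\Sigma_{21}$ are each equal to $c \ne 0$. Construct the $\lambda$-criterion in each case.
(These are hypotheses on patterned matrices).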


6.8. Repeat Exercise 6.7 for the complex case. 


6.9. Consider $k$ independent real $p$-variate Gaussian populations with different parameters,
distributed as $N_p(M_j, \Sigma_j)$, $\Sigma_j > O$, $M_j' = (\mu_{1j},\ldots,\mu_{pj})$, $j = 1,\ldots,k$. Construct
the $\lambda$-criterion for testing the hypothesis $\Sigma_1 = \cdots = \Sigma_k$ or the covariance matrices
are equal. Assume that simple random samples of sizes $n_1,\ldots,n_k$ are available from these
$k$ populations.


6.10. Repeat Exercise 6.9 for the complex case. 


6.11. For the second part of Exercise 6.6, which is also known as Wilks' $L_{mvc}$ criterion,
show that if $u = \lambda^{\frac{2}{n}}$ where $\lambda$ is the likelihood ratio criterion and $n$ is the sample size, then

$$u = \frac{|S|}{[s + (p-1)s_1]\left[s - s_1 + \frac{n}{p-1}\sum_{j=1}^p(\bar{x}_j - \bar{x})^2\right]^{p-1}} \tag{i}$$

where $S = (s_{ij})$ is the sample sum of products matrix, $s = \frac{1}{p}\sum_{i=1}^p s_{ii}$,
$s_1 = \frac{1}{p(p-1)}\sum_{i \ne j}s_{ij}$, $\bar{x} = \frac{1}{p}\sum_{i=1}^p\bar{x}_i$, $\bar{x}_i = \frac{1}{n}\sum_{k=1}^n x_{ik}$. For the statistic $u$ in (i), show
that the $h$-th null moment or the $h$-th moment when the null hypothesis is true, is given by
the following: 


вина [] PoE AAD ш 
ae | - 
ag О ens PC eS 


(i) 


Write down the conditions for the existence of the moment in (ii). [For the null and non-null
distributions of Wilks' $L_{mvc}$ criterion, see Mathai (1978).]




6.12. Let the $(p+q) \times 1$ vector $X$ have a $(p+q)$-variate nonsingular real Gaussian
distribution, $X \sim N_{p+q}(\mu, \Sigma)$, $\Sigma > O$. Let

$$\Sigma = \begin{bmatrix}\Sigma_{11} & \Sigma_{12}\\ \Sigma_{21} & \Sigma_{22}\end{bmatrix}$$

where $\Sigma_{11}$ is $p \times p$ with all its diagonal elements equal to $\sigma_{aa}$ and all other elements equal
to $\sigma_{aa'}$, $\Sigma_{12}$ has all elements equal to $\sigma_{ab}$, $\Sigma_{22}$ has all diagonal elements equal to $\sigma_{bb}$ and all
other elements equal to $\sigma_{bb'}$ where $\sigma_{aa}$, $\sigma_{aa'}$, $\sigma_{bb}$, $\sigma_{bb'}$ are unknown. Then, $\Sigma$ is known as
bipolar. Let $\lambda$ be the likelihood ratio criterion for testing the hypothesis that $\Sigma$ is bipolar.
Then show that the $h$-th null moment is the following:


гүне 0га неш 
Г[(р — DG + УГ — DG + 553] 


ED^|IH,] = Up = D" 0 = 0*1] 


where $n$ is the sample size. Write down the conditions for the existence of this $h$-th null
moment.


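6.13. Let $X$ be an $m \times n$ real matrix having the matrix-variate Gaussian density

$$f(X) = \frac{1}{(2\pi)^{\frac{mn}{2}}|\Sigma|^{\frac{n}{2}}}\,e^{-\frac{1}{2}\mathrm{tr}[\Sigma^{-1}(X-M)(X-M)']},\quad \Sigma > O.$$

Letting $S = XX'$, $S$ is a non-central Wishart matrix. Derive the density of $S$ and show that
this density, denoted by $f_s(S)$, is the following:

$$f_s(S) = \frac{|S|^{\frac{n}{2} - \frac{m+1}{2}}}{\Gamma_m(\frac{n}{2})|2\Sigma|^{\frac{n}{2}}}\,e^{-\mathrm{tr}(\Omega) - \frac{1}{2}\mathrm{tr}(\Sigma^{-1}S)}\ {}_0F_1\left(\,;\tfrac{n}{2};\tfrac{1}{2}\Sigma^{-1}\Omega S\right)$$

where $\Omega = \frac{1}{2}MM'\Sigma^{-1}$ is the non-centrality parameter and ${}_0F_1$ is a Bessel function of
matrix argument.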


6.14. Show that the $h$-th moment of the determinant of $S$, the non-central Wishart matrix
specified in Exercise 6.13, is given by

$$E[|S|^h] = \frac{\Gamma_m(h + \frac{n}{2})}{\Gamma_m(\frac{n}{2})}\,|2\Sigma|^h\,e^{-\mathrm{tr}(\Omega)}\ {}_1F_1\left(h + \tfrac{n}{2};\tfrac{n}{2};\Omega\right)$$

where ${}_1F_1$ is a hypergeometric function of matrix argument and $\Omega$ is the non-centrality
parameter defined in Exercise 6.13.


6.15. Letting v = Ал in Eq. (6.3.10), show that, under the null hypothesis Ho, v is 
distributed as a real scalar type-1 beta with the parameters (2, 1) and that nA ОСА is 
real scalar type-2 beta distributed with the parameters (4, 1—1). 


6.16. Show that for an arbitrary Л, the h-th null moment of the test statistic А specified in 
Eq. (6.3.10) is 


n—l nh n 
rege) DOO yyy no 
FOL push 

С» (2+5) 


E[A^|H,] = 
Chapter 7
Rectangular Matrix-Variate Distributions


7.1. Introduction 


Thus far, we have primarily been dealing with distributions involving real positive
definite or Hermitian positive definite matrices. We have already considered rectangular
matrices in the matrix-variate Gaussian case. In this chapter, we will examine rectangular
matrix-variate gamma and beta distributions and also consider to some extent other types
of distributions. We will begin with the rectangular matrix-variate real gamma distribution,
a version of which was discussed in connection with the pathway model introduced in
Mathai (2005). The notations will remain as previously specified. Lower-case letters such
as $x, y, z$ will denote real scalar variables, whether mathematical or random. Capital letters
such as $X, Y$ will be used for matrix-variate variables, whether square or rectangular. In
the complex domain, a tilde will be placed above the corresponding scalar and matrix
variables; for instance, we will write $\tilde{x}, \tilde{y}, \tilde{X}, \tilde{Y}$. Constant matrices will be denoted by
upper-case letters such as $A, B, C$. A tilde will not be utilized for constant matrices except
for stressing the point that the constant matrix is in the complex domain. When $X$ is a
$p \times p$ real positive definite matrix, then $A < X < B$ will imply that the constant matrices
$A$ and $B$ are positive definite, that is, $A > O$, $B > O$, and further that $X > O$, $X - A > O$,
$B - X > O$. Real positive definite matrices will be assumed to be symmetric. The
corresponding notation for a $p \times p$ Hermitian positive definite matrix is $A < \tilde{X} < B$.
The determinant of a square matrix $A$ will be denoted by $|A|$ or $\det(A)$ whereas, in the
complex case, the absolute value or modulus of the determinant of $A$ will be denoted as
$|\det(A)|$. When matrices are square, their order will be taken as being $p \times p$ unless specified
otherwise. Whenever $A$ is a real $p \times q$, $q \ge p$, rectangular matrix of full rank $p$, $AA'$ is
positive definite, a prime denoting the transpose. When $A$ is in the complex domain, then
$AA^*$ is Hermitian positive definite where $A^*$ indicates the complex conjugate transpose
of $A$. Note that all positive definite complex matrices are necessarily Hermitian. As well,
$dX$ will denote the wedge product of all differentials in the matrix $X$. If $X = (x_{ij})$ is a
real $p \times q$ matrix, then $dX = \wedge_{i=1}^p\wedge_{j=1}^q dx_{ij}$. Whenever $X = (x_{ij}) = X'$ is a $p \times p$ real
symmetric matrix, $dX = \wedge_{i \ge j}dx_{ij} = \wedge_{i \le j}dx_{ij}$, that is, the wedge product of the $\frac{p(p+1)}{2}$
distinct differentials. As for the complex matrix $\tilde{X} = X_1 + iX_2$, $i = \sqrt{(-1)}$, where $X_1$
and $X_2$ are real, $d\tilde{X} = dX_1 \wedge dX_2$.


7.2. Rectangular Matrix-Variate Gamma Density, Real Case 


The most commonly utilized real gamma type distributions are the gamma, generalized
gamma and Wishart in Statistics and the Maxwell-Boltzmann and Rayleigh in Physics. The
first author has previously introduced real and complex matrix-variate analogues of the
gamma, Maxwell-Boltzmann, Rayleigh and Wishart densities where the matrices are $p \times p$
real positive definite or Hermitian positive definite. For the generalized gamma density
in the real scalar case, a matrix-variate analogue can be written down but the associated
properties cannot be studied owing to the problem of making a transformation of the type
$Y = X^{\delta}$ for $\delta \ne \pm 1$; additionally, when $X$ is real positive definite or Hermitian positive
definite, the Jacobians will produce awkward forms that cannot be easily handled, see
Mathai (1997) for an illustration wherein $\delta = 2$ and the matrix $X$ is real and symmetric.
Thus, we will provide extensions of the gamma, Wishart, Maxwell-Boltzmann and Rayleigh
densities to the rectangular matrix-variate cases for $\delta = 1$, in both the real and complex
domains.

The Maxwell-Boltzmann and Rayleigh densities are associated with numerous problems
occurring in Physics. A multivariate analogue as well as a rectangular matrix-variate
analogue of these densities may become useful in extending the usual theories giving rise
to these univariate densities, to multivariate and matrix-variate settings. It will be shown
that, as was explained in Mathai (1999), this problem is also connected to the volumes
of parallelotopes determined by $p$ linearly independent random points in the Euclidean
$n$-space, $n \ge p$. Structural decompositions of the resulting random determinants and pathway
extensions to gamma, Wishart, Maxwell-Boltzmann and Rayleigh densities will also
be considered.

In the current nuclear reaction-rate theory, the basic distribution being assumed for the 
relative velocity of reacting particles is the Maxwell-Boltzmann. One of the forms of this 
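density for the real scalar positive variable case is

$$f_1(x) = \frac{4}{\sqrt{\pi}}\,\beta^{\frac{3}{2}}\,x^{2}\,e^{-\beta x^{2}},\quad 0 \le x < \infty,\ \beta > 0, \tag{7.2.1}$$

and $f_1(x) = 0$ elsewhere. The Raleigh density is given by

$$f_2(x) = \frac{x}{\alpha^{2}}\,e^{-\frac{x^{2}}{2\alpha^{2}}},\quad 0 \le x < \infty,\ \alpha > 0, \tag{7.2.2}$$

and $f_2 = 0$ elsewhere, and the three-parameter generalized gamma density has the form

$$f_3(x) = \frac{\delta\, b^{\frac{\alpha}{\delta}}}{\Gamma(\frac{\alpha}{\delta})}\,x^{\alpha-1}\,e^{-b x^{\delta}},\quad x \ge 0,\ b > 0,\ \alpha > 0,\ \delta > 0, \tag{7.2.3}$$

and $f_3 = 0$ elsewhere. Observe that (7.2.1) and (7.2.2) are special cases of (7.2.3). For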
derivations of a reaction-rate probability integral based on Maxwell-Boltzmann velocity 
density, the reader is referred to Mathai and Haubold (1988). Various basic results associ- 
ated with the Maxwell-Boltzmann distribution are provided in Barnes et al. (1982), Critch- 
field (1972), Fowler (1984), and Pais (1986), among others. The Maxwell-Boltzmann and 
Raleigh densities have been extended to the real positive definite matrix-variate and the 
real rectangular matrix-variate cases in Mathai and Princy (2017). These results will be in- 
cluded in this section, along with extensions of the gamma and Wishart densities to the real 
and complex rectangular matrix-variate cases. Extensions of the gamma and Wishart den- 
sities to the real positive definite and complex Hermitian positive definite matrix-variate 
cases have already been discussed in Chap. 5. The Jacobians that are needed and will be
frequently utilized in our discussion are already provided in Chaps. 1 and 4, further details
being available from Mathai (1997). The previously defined real matrix-variate gamma
$\Gamma_p(\alpha)$ and complex matrix-variate gamma $\tilde{\Gamma}_p(\alpha)$ functions will also be utilized in this
chapter.
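
The remark that (7.2.1) and (7.2.2) are special cases of (7.2.3) can be checked numerically. The following is a minimal sketch, assuming the parameter choices $\alpha = 3$, $\delta = 2$, $b = \beta$ for the Maxwell-Boltzmann case and $\alpha = 2$, $\delta = 2$, $b = 1/(2\alpha^2)$ for the Raleigh case; the function names and the midpoint integrator are illustrative and not part of the text.

```python
import math

def f1(x, beta):
    """Maxwell-Boltzmann density (7.2.1) for x >= 0."""
    return 4.0 / math.sqrt(math.pi) * beta ** 1.5 * x ** 2 * math.exp(-beta * x ** 2)

def f2(x, alpha):
    """Raleigh density (7.2.2) for x >= 0."""
    return x / alpha ** 2 * math.exp(-x ** 2 / (2.0 * alpha ** 2))

def f3(x, a, b, delta):
    """Generalized gamma density (7.2.3); `a` plays the role of alpha in (7.2.3)."""
    const = delta * b ** (a / delta) / math.gamma(a / delta)
    return const * x ** (a - 1.0) * math.exp(-b * x ** delta)

def integrate(f, lo, hi, n=20000):
    """Composite midpoint rule on [lo, hi]."""
    h = (hi - lo) / n
    return h * sum(f(lo + (k + 0.5) * h) for k in range(n))

beta, alpha = 1.7, 0.8          # arbitrary test values
xs = [0.1, 0.5, 1.0, 2.3]
# (7.2.1) is (7.2.3) with alpha = 3, delta = 2, b = beta
mb_gap = max(abs(f1(x, beta) - f3(x, 3.0, beta, 2.0)) for x in xs)
# (7.2.2) is (7.2.3) with alpha = 2, delta = 2, b = 1/(2 alpha^2)
ra_gap = max(abs(f2(x, alpha) - f3(x, 2.0, 1.0 / (2.0 * alpha ** 2), 2.0)) for x in xs)
# (7.2.3) should integrate to one for any admissible parameters
total_f3 = integrate(lambda x: f3(x, 2.5, 1.3, 1.5), 0.0, 12.0)
```

Both gaps should vanish up to rounding, and `total_f3` should be close to one.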


7.2.1. Extension of the gamma density to the real rectangular matrix-variate case 


Consider a $p \times q$, $q \ge p$, real matrix $X$ of full rank $p$, whose rows are thus linearly
independent, and a real-valued scalar function $f(XX')$ whose integral over $X$ is convergent,
that is, $\int_X f(XX')\,dX < \infty$. Letting $S = XX'$, $S$ will be symmetric as well as real
positive definite, meaning that $Y'SY > 0$ for every $p \times 1$ non-null vector $Y$.
Then, $S = (s_{ij})$ will involve only $p(p+1)/2$ differential elements, that is,
$dS = \wedge_{i \ge j}\,ds_{ij}$, whereas $dX$ will contain $pq$ differential elements $dx_{ij}$. As has previously
been explained in Chap. 4, the connection between $dX$ and $dS$ can be established
via a sequence of two or three matrix transformations.

Let $X = (x_{ij})$ be a $p \times q$, $q \ge p$, real matrix of rank $p$ where the $x_{ij}$'s are distinct
real scalar variables. Let $A$ be a $p \times p$ real positive definite constant matrix and $B$ be a
$q \times q$ real positive definite constant matrix, $A^{\frac{1}{2}}$ and $B^{\frac{1}{2}}$ denoting the respective positive
definite square roots of the positive definite matrices $A$ and $B$. We will now determine the
value of $c$ that satisfies the following integral equation:

$$\frac{1}{c} = \int_X |AXBX'|^{\gamma}\,e^{-\mathrm{tr}(AXBX')}\,dX. \tag{i}$$
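Note that $\mathrm{tr}(AXBX') = \mathrm{tr}(A^{\frac{1}{2}}XBX'A^{\frac{1}{2}})$. Letting $Y = A^{\frac{1}{2}}XB^{\frac{1}{2}}$, it follows from
Theorem 1.7.4 that $dY = |A|^{\frac{q}{2}}|B|^{\frac{p}{2}}\,dX$. Thus,

$$\frac{1}{c} = |A|^{-\frac{q}{2}}|B|^{-\frac{p}{2}}\int_Y |YY'|^{\gamma}\,e^{-\mathrm{tr}(YY')}\,dY. \tag{ii}$$

Letting $S = YY'$, we note that $S$ is a $p \times p$ real positive definite matrix, and on applying
Theorem 4.2.3, we have $dY = \frac{\pi^{\frac{qp}{2}}}{\Gamma_p(\frac{q}{2})}\,|S|^{\frac{q}{2}-\frac{p+1}{2}}\,dS$ where $\Gamma_p(\cdot)$ is the real matrix-variate
gamma function. Thus,

$$\frac{1}{c} = |A|^{-\frac{q}{2}}|B|^{-\frac{p}{2}}\,\frac{\pi^{\frac{qp}{2}}}{\Gamma_p(\frac{q}{2})}\int_{S>O} |S|^{\gamma+\frac{q}{2}-\frac{p+1}{2}}\,e^{-\mathrm{tr}(S)}\,dS,\quad A > O,\ B > O, \tag{iii}$$

the integral being a real matrix-variate gamma integral given by $\Gamma_p(\gamma + \frac{q}{2})$ for $\Re(\gamma + \frac{q}{2}) > \frac{p-1}{2}$,
where $\Re(\cdot)$ is the real part of $(\cdot)$, so that

$$c = \frac{|A|^{\frac{q}{2}}|B|^{\frac{p}{2}}\,\Gamma_p(\frac{q}{2})}{\pi^{\frac{qp}{2}}\,\Gamma_p(\gamma+\frac{q}{2})}\quad\text{for } \Re\Big(\gamma+\frac{q}{2}\Big) > \frac{p-1}{2},\ A > O,\ B > O. \tag{7.2.4}$$

Let

$$f_4(X) = c\,|AXBX'|^{\gamma}\,e^{-\mathrm{tr}(AXBX')} \tag{7.2.5}$$

for $A > O$, $B > O$, $\Re(\gamma + \frac{q}{2}) > \frac{p-1}{2}$, $X = (x_{ij})$, $-\infty < x_{ij} < \infty$, $i = 1,\ldots,p$, $j = 1,\ldots,q$,
where $c$ is as specified in (7.2.4). Then, $f_4(X)$ is a statistical density that will be
referred to as the rectangular real matrix-variate gamma density with shape parameter $\gamma$
and scale parameter matrices $A > O$ and $B > O$. Although the parameters are usually
real in a statistical density, the above conditions apply to the general complex case.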
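
For numerical work, the normalizing constant in (7.2.4) is conveniently evaluated on the log scale. The sketch below assumes the standard definition $\Gamma_p(\alpha) = \pi^{p(p-1)/4}\prod_{j=1}^{p}\Gamma(\alpha - \frac{j-1}{2})$ of the real matrix-variate gamma function and takes the determinants $|A|$ and $|B|$ as inputs; the helper names `log_gamma_p` and `log_c` are illustrative only.

```python
import math

def log_gamma_p(alpha, p):
    """log of the real matrix-variate gamma function,
    Gamma_p(alpha) = pi^{p(p-1)/4} * prod_{j=1}^{p} Gamma(alpha - (j-1)/2)."""
    if alpha <= (p - 1) / 2.0:
        raise ValueError("need alpha > (p - 1)/2")
    out = p * (p - 1) / 4.0 * math.log(math.pi)
    for j in range(1, p + 1):
        out += math.lgamma(alpha - (j - 1) / 2.0)
    return out

def log_c(det_a, det_b, p, q, gamma):
    """log of the normalizing constant c in (7.2.4), given |A| and |B|."""
    return (q / 2.0 * math.log(det_a) + p / 2.0 * math.log(det_b)
            + log_gamma_p(q / 2.0, p)
            - q * p / 2.0 * math.log(math.pi)
            - log_gamma_p(gamma + q / 2.0, p))

# Scalar check: p = q = 1, gamma = 1, A = 1, B = beta gives c = 2 sqrt(beta) / sqrt(pi)
beta = 1.7
c_scalar = math.exp(log_c(1.0, beta, p=1, q=1, gamma=1.0))
c_expected = 2.0 * math.sqrt(beta) / math.sqrt(math.pi)
```

Working with logarithms avoids overflow in the gamma functions when $q$ or $\gamma$ is large.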

For $p = 1$, $q = 1$, $\gamma = 1$, $A = 1$ and $B = \beta > 0$, we have $|AXBX'| = \beta x^{2}$ and

$$\frac{|A|^{\frac{q}{2}}|B|^{\frac{p}{2}}\,\Gamma_p(\frac{q}{2})}{\pi^{\frac{qp}{2}}\,\Gamma_p(\gamma+\frac{q}{2})} = \frac{\beta^{\frac{1}{2}}\,\Gamma(\frac{1}{2})}{\pi^{\frac{1}{2}}\,\Gamma(\frac{3}{2})} = \frac{2\sqrt{\beta}}{\sqrt{\pi}},$$

so that the constant multiplying $x^{2}e^{-\beta x^{2}}$ in (7.2.5) is $\frac{2}{\sqrt{\pi}}\beta^{\frac{3}{2}}$ for $-\infty < x < \infty$. Note that
when the support of $f(x)$ is restricted to the interval $0 \le x < \infty$, the normalizing constant
will be multiplied by 2, $f(x)$ being a symmetric function. Then, for this particular case, $f_4(X)$
in (7.2.5) agrees with the Maxwell-Boltzmann density for the real scalar positive variable $x$
whose density is given in (7.2.1). Accordingly, when $\gamma = 1$, (7.2.5) with $c$ as specified in
(7.2.4) will be referred to as the real rectangular matrix-variate Maxwell-Boltzmann density.
Observe that for $\gamma = 0$, (7.2.5) is the real rectangular matrix-variate Gaussian density that
was considered in Chap. 4. In the Raleigh case, letting $p = 1$, $q = 1$, $A = 1$, $B = \frac{1}{2\alpha^{2}}$ and
$\gamma = \frac{1}{2}$,

$$|AXBX'|^{\gamma} = \Big(\frac{x^{2}}{2\alpha^{2}}\Big)^{\frac{1}{2}} = \frac{|x|}{\sqrt{2}\,|\alpha|},$$

which gives

$$f_5(x) = \frac{|x|}{2\alpha^{2}}\,e^{-\frac{x^{2}}{2\alpha^{2}}},\ -\infty < x < \infty\quad\text{or}\quad f_5(x) = \frac{x}{\alpha^{2}}\,e^{-\frac{x^{2}}{2\alpha^{2}}},\ 0 \le x < \infty,$$

for $\alpha > 0$, and $f_5 = 0$ elsewhere where $|x|$ denotes the absolute value of $x$, which is
the real positive scalar variable case of the Raleigh density given in (7.2.2). Accordingly,
(7.2.5) with $c$ as specified in (7.2.4) wherein $\gamma = \frac{1}{2}$ will be called the real rectangular
matrix-variate Raleigh density.

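From (7.2.5), which is the density for $X = (x_{ij})$, $p \times q$, $q \ge p$ of rank $p$, with
$-\infty < x_{ij} < \infty$, $i = 1,\ldots,p$, $j = 1,\ldots,q$, we obtain the following density for
$Y = A^{\frac{1}{2}}XB^{\frac{1}{2}}$:

$$f_6(Y)\,dY = \frac{\Gamma_p(\frac{q}{2})}{\pi^{\frac{qp}{2}}\,\Gamma_p(\gamma+\frac{q}{2})}\,|YY'|^{\gamma}\,e^{-\mathrm{tr}(YY')}\,dY \tag{7.2.6}$$

for $\gamma + \frac{q}{2} > \frac{p-1}{2}$, and $f_6 = 0$ elsewhere. We will refer to (7.2.6) as the standard form of
the real rectangular matrix-variate gamma density. The density of $S = YY'$ is then

$$f_7(S)\,dS = \frac{1}{\Gamma_p(\gamma+\frac{q}{2})}\,|S|^{\gamma+\frac{q}{2}-\frac{p+1}{2}}\,e^{-\mathrm{tr}(S)}\,dS \tag{7.2.7}$$

for $S > O$, $\gamma + \frac{q}{2} > \frac{p-1}{2}$, and $f_7 = 0$ elsewhere.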


Example 7.2.1. Specify the distribution of $u = \mathrm{tr}(A^{\frac{1}{2}}XBX'A^{\frac{1}{2}})$, the exponent of the
density given in (7.2.5).


Solution 7.2.1. Let us determine the moment generating function (mgf) of $u$ with parameter
$t$. That is,

$$M_u(t) = E[e^{tu}] = E\big[e^{t\,\mathrm{tr}(A^{\frac{1}{2}}XBX'A^{\frac{1}{2}})}\big] = c\int_X |A^{\frac{1}{2}}XBX'A^{\frac{1}{2}}|^{\gamma}\,e^{-(1-t)\,\mathrm{tr}(A^{\frac{1}{2}}XBX'A^{\frac{1}{2}})}\,dX$$

where $c$ is given in (7.2.4). Let us make the following transformations: $Y = A^{\frac{1}{2}}XB^{\frac{1}{2}}$, $S = YY'$.
Then, all factors, except $\Gamma_p(\gamma + \frac{q}{2})$, are canceled and the mgf becomes

$$M_u(t) = \frac{1}{\Gamma_p(\gamma+\frac{q}{2})}\int_{S>O} |S|^{\gamma+\frac{q}{2}-\frac{p+1}{2}}\,e^{-(1-t)\,\mathrm{tr}(S)}\,dS$$

for $1 - t > 0$. On making the transformation $(1 - t)S = S_1$ and then integrating out $S_1$,
we obtain the following representation of the moment generating function:

$$M_u(t) = (1 - t)^{-p(\gamma+\frac{q}{2})},\quad 1 - t > 0,$$

which happens to be the mgf of a real scalar gamma random variable with the parameters
$(\alpha = p(\gamma + \frac{q}{2}),\ \beta = 1)$, which, owing to the uniqueness of the mgf, is the distribution of
$u$.


Example 7.2.2. Let $U_1 = A^{\frac{1}{2}}XBX'A^{\frac{1}{2}}$, $U_2 = XBX'$, $U_3 = B^{\frac{1}{2}}X'AXB^{\frac{1}{2}}$, $U_4 = X'AX$.
Determine the corresponding densities when they exist.


Solution 7.2.2. Let us examine the exponent in the density (7.2.5). By making use of the
commutative property of trace, one can write

$$\mathrm{tr}(A^{\frac{1}{2}}XBX'A^{\frac{1}{2}}) = \mathrm{tr}[A(XBX')] = \mathrm{tr}(B^{\frac{1}{2}}X'AXB^{\frac{1}{2}}) = \mathrm{tr}[B(X'AX)].$$

Observe that the exponent depends on the matrix $A^{\frac{1}{2}}XBX'A^{\frac{1}{2}}$, which is symmetric and
positive definite, and that the functional part of the density also involves its determinant.
Thus, the structure is that of a real matrix-variate gamma density; however, (7.2.5) gives the
density of $X$. Hence, one has to reach $U_1$ from $X$ and derive the density of $U_1$. Consider
the transformation $Y = A^{\frac{1}{2}}XB^{\frac{1}{2}}$. This will bring $X$ to $Y$. Now, let $S = YY' = U_1$
so that the matrix $U_1$ has the real matrix-variate gamma distribution specified in (7.2.7),
that is, $U_1$ is a real matrix-variate gamma variable with shape parameter $\gamma + \frac{q}{2}$ and scale
parameter matrix $I$. Next, consider $U_2$. Let us obtain the density of $U_2$ from the density
(7.2.5) for $X$. Proceeding as above while leaving $A$ out of the transformation (equivalently,
taking $A = I$ therein), (7.2.7) will become the following density, denoted by $f_{U_2}(U_2)$:

$$f_{U_2}(U_2)\,dU_2 = \frac{|A|^{\gamma+\frac{q}{2}}}{\Gamma_p(\gamma+\frac{q}{2})}\,|U_2|^{\gamma+\frac{q}{2}-\frac{p+1}{2}}\,e^{-\mathrm{tr}(AU_2)}\,dU_2,$$

which shows that $U_2$ is a real matrix-variate gamma variable with shape parameter $\gamma + \frac{q}{2}$
and scale parameter matrix $A$. With respect to $U_3$ and $U_4$, when $q > p$, one has the positive
semi-definite factor $X'AX$, which is $q \times q$ of rank $p$ and whose determinant is therefore zero;
hence, in this singular case, the densities do not exist for $U_3$ and $U_4$. However, when $q = p$,
$U_3$ has a real matrix-variate gamma distribution with shape parameter $\gamma + \frac{p}{2}$ and scale
parameter matrix $I$ and $U_4$ has a real matrix-variate gamma distribution with shape
parameter $\gamma + \frac{p}{2}$ and scale parameter matrix $B$, observing that when $q = p$ both $U_3$ and
$U_4$ are $q \times q$ and positive definite. This completes the solution.


The above findings are stated as a theorem:


Theorem 7.2.1. Let $X = (x_{ij})$ be a real full rank $p \times q$ matrix, $q \ge p$, having the
density specified in (7.2.5). Let $U_1$, $U_2$, $U_3$ and $U_4$ be as defined in Example 7.2.2. Then,
$U_1$ is a real matrix-variate gamma variable with scale parameter matrix $I$ and shape
parameter $\gamma + \frac{q}{2}$; $U_2$ is a real matrix-variate gamma variable with shape parameter $\gamma + \frac{q}{2}$
and scale parameter matrix $A$; $U_3$ and $U_4$ are singular and do not have densities when
$q > p$; however, when $q = p$, $U_3$ is real matrix-variate gamma distributed with
shape parameter $\gamma + \frac{p}{2}$ and scale parameter matrix $I$, and $U_4$ is real matrix-variate
gamma distributed with shape parameter $\gamma + \frac{p}{2}$ and scale parameter matrix $B$. Further,

$$|I_p - A^{\frac{1}{2}}XBX'A^{\frac{1}{2}}| = |I_q - B^{\frac{1}{2}}X'AXB^{\frac{1}{2}}|.$$

Proof: All the results, except the last one, were obtained in Solution 7.2.2. Hence, we
shall only consider the last part of the theorem. Observe that when $q > p$, $|A^{\frac{1}{2}}XBX'A^{\frac{1}{2}}| > 0$,
the matrix being positive definite, whereas $|B^{\frac{1}{2}}X'AXB^{\frac{1}{2}}| = 0$, the matrix being
positive semi-definite. The equality is established by noting that in accordance with results
previously stated in Sect. 1.3, the determinant of the following partitioned matrix has two
representations:

$$\begin{vmatrix} I_p & A^{\frac{1}{2}}XB^{\frac{1}{2}} \\ B^{\frac{1}{2}}X'A^{\frac{1}{2}} & I_q \end{vmatrix} = \begin{cases} |I_p|\,|I_q - (B^{\frac{1}{2}}X'A^{\frac{1}{2}})\,I_p^{-1}\,(A^{\frac{1}{2}}XB^{\frac{1}{2}})| = |I_q - B^{\frac{1}{2}}X'AXB^{\frac{1}{2}}| \\[4pt] |I_q|\,|I_p - (A^{\frac{1}{2}}XB^{\frac{1}{2}})\,I_q^{-1}\,(B^{\frac{1}{2}}X'A^{\frac{1}{2}})| = |I_p - A^{\frac{1}{2}}XBX'A^{\frac{1}{2}}|. \end{cases}$$
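
The determinant identity at the end of Theorem 7.2.1 can be illustrated numerically. The sketch below is only a sanity check under stated assumptions: a randomly generated $2 \times 3$ matrix `u` stands in for $A^{\frac{1}{2}}XB^{\frac{1}{2}}$, and the small linear algebra helpers are written out in plain Python for self-containment.

```python
import random

def matmul(a, b):
    """Product of two matrices stored as lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]

def transpose(a):
    return [list(row) for row in zip(*a)]

def det(m):
    """Determinant by Gaussian elimination with partial pivoting."""
    a = [row[:] for row in m]
    n, sign, out = len(a), 1.0, 1.0
    for c in range(n):
        piv = max(range(c, n), key=lambda r: abs(a[r][c]))
        if a[piv][c] == 0.0:
            return 0.0
        if piv != c:
            a[c], a[piv] = a[piv], a[c]
            sign = -sign
        out *= a[c][c]
        for r in range(c + 1, n):
            f = a[r][c] / a[c][c]
            for k in range(c, n):
                a[r][k] -= f * a[c][k]
    return sign * out

def identity_minus(m):
    n = len(m)
    return [[(1.0 if i == j else 0.0) - m[i][j] for j in range(n)] for i in range(n)]

random.seed(0)
p, q = 2, 3
u = [[random.uniform(-1.0, 1.0) for _ in range(q)] for _ in range(p)]  # stands in for A^{1/2} X B^{1/2}
lhs = det(identity_minus(matmul(u, transpose(u))))   # |I_p - U U'|
rhs = det(identity_minus(matmul(transpose(u), u)))   # |I_q - U' U|
singular = det(matmul(transpose(u), u))              # |U'U|, zero when q > p
```

The two determinants `lhs` and `rhs` should agree up to rounding, while `singular` should be numerically zero, in line with the non-existence of densities for $U_3$ and $U_4$ when $q > p$.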


7.2.2. Multivariate gamma and Maxwell-Boltzmann densities, real case 


Multivariate usually means a collection of scalar variables, real or complex. Many real
scalar variable cases corresponding to (7.2.1) or a multivariate analogue thereof can be
obtained from (7.2.5) by taking $p = 1$ and $A = b > 0$. Note that in this case, $X$ is $1 \times q$,
that is, $X = (x_1, \ldots, x_q)$, and $XBX'$ is a positive definite quadratic form of the type

$$XBX' = (x_1, \ldots, x_q)\,B \begin{pmatrix} x_1 \\ \vdots \\ x_q \end{pmatrix}.$$

Thus, the density appearing in (7.2.5) becomes

$$f_8(X)\,dX = \frac{b^{\gamma+\frac{q}{2}}\,|B|^{\frac{1}{2}}\,\Gamma(\frac{q}{2})}{\pi^{\frac{q}{2}}\,\Gamma(\gamma+\frac{q}{2})}\,[XBX']^{\gamma}\,e^{-b\,(XBX')}\,dX \tag{7.2.8}$$

for $X = (x_1, \ldots, x_q)$, $-\infty < x_j < \infty$, $j = 1,\ldots,q$, $B = B' > O$, $b > 0$, and $f_8 = 0$
elsewhere. Then, the density of $Y = B^{\frac{1}{2}}X'$ is given by

$$f_9(Y)\,dY = b^{\gamma+\frac{q}{2}}\,\frac{\Gamma(\frac{q}{2})}{\pi^{\frac{q}{2}}\,\Gamma(\gamma+\frac{q}{2})}\,(y_1^{2}+\cdots+y_q^{2})^{\gamma}\,e^{-b\,(y_1^{2}+\cdots+y_q^{2})}\,dY \tag{7.2.9}$$

where $Y' = (y_1, \ldots, y_q)$, $-\infty < y_j < \infty$, $j = 1,\ldots,q$, $b > 0$, $\gamma + \frac{q}{2} > 0$, and
$f_9 = 0$ elsewhere. We will take (7.2.8) as the multivariate gamma as well as multivariate
Maxwell-Boltzmann density, and (7.2.9) as the standard multivariate gamma as well as
standard multivariate Maxwell-Boltzmann density.

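How can we show that (7.2.9) is a statistical density? One way consists of writing
$f_9(Y)dY$ as $f_9(S)dS$, applying Theorem 4.2.3 of Chap. 4 and writing $dY$ in terms of $dS$
for $p = 1$. This will yield the result. Another way is to integrate out the variables $y_1, \ldots, y_q$
from $f_9(Y)dY$, which can be achieved via a general polar coordinate transformation such
as the following: Consider the variables $y_1, \ldots, y_q$, $-\infty < y_j < \infty$, $j = 1,\ldots,q$, and
the transformation,

$$\begin{aligned} y_1 &= r\sin\theta_1 \\ y_j &= r\cos\theta_1\cos\theta_2\cdots\cos\theta_{j-1}\sin\theta_j,\quad j = 2, 3, \ldots, q-1, \\ y_q &= r\cos\theta_1\cos\theta_2\cdots\cos\theta_{q-1}, \end{aligned}$$

for $-\frac{\pi}{2} < \theta_j \le \frac{\pi}{2}$, $j = 1,\ldots,q-2$; $-\pi < \theta_{q-1} \le \pi$, which was discussed in Mathai
(1997). Its Jacobian is then given by

$$dy_1 \wedge \ldots \wedge dy_q = r^{q-1}\Big\{\prod_{j=1}^{q-1} |\cos\theta_j|^{q-j-1}\Big\}\,dr \wedge d\theta_1 \wedge \ldots \wedge d\theta_{q-1}. \tag{7.2.10}$$

Under this transformation, $y_1^{2} + \cdots + y_q^{2} = r^{2}$. Hence, integrating over $r$, we have

$$\int_0^{\infty} (r^{2})^{\gamma}\,r^{q-1}\,e^{-b r^{2}}\,dr = \frac{1}{2}\,b^{-(\gamma+\frac{q}{2})}\,\Gamma\Big(\gamma+\frac{q}{2}\Big),\quad \gamma+\frac{q}{2} > 0. \tag{7.2.11}$$

Note that the $\theta_j$'s are present only in the Jacobian elements. There are formulae giving the
integral over each differential element. We will integrate the $\theta_j$'s one by one. Integrating
over $\theta_1$ gives

$$\int_{-\frac{\pi}{2}}^{\frac{\pi}{2}} (\cos\theta_1)^{q-2}\,d\theta_1 = 2\int_0^{\frac{\pi}{2}} (\cos\theta_1)^{q-2}\,d\theta_1 = 2\int_0^{1} z^{q-2}(1-z^{2})^{-\frac{1}{2}}\,dz = \frac{\Gamma(\frac{1}{2})\,\Gamma(\frac{q-1}{2})}{\Gamma(\frac{q}{2})},\quad q > 1.$$

The integrals over $\theta_2, \theta_3, \ldots, \theta_{q-2}$ can be similarly evaluated as

$$\frac{\Gamma(\frac{1}{2})\,\Gamma(\frac{q-2}{2})}{\Gamma(\frac{q-1}{2})},\ \frac{\Gamma(\frac{1}{2})\,\Gamma(\frac{q-3}{2})}{\Gamma(\frac{q-2}{2})},\ \ldots,\ \frac{\Gamma(\frac{1}{2})\,\Gamma(1)}{\Gamma(\frac{3}{2})}$$

for $q > p - 1$, the last integral $\int_{-\pi}^{\pi} d\theta_{q-1}$ giving $2\pi$. On taking the product, several gamma
functions cancel out, leaving

$$\frac{2\pi\,[\Gamma(\frac{1}{2})]^{q-2}}{\Gamma(\frac{q}{2})} = \frac{2\pi^{\frac{q}{2}}}{\Gamma(\frac{q}{2})}. \tag{7.2.12}$$

It follows from (7.2.11) and (7.2.12) that (7.2.9) is indeed a density which will be referred
to as the standard real multivariate gamma or standard real Maxwell-Boltzmann density.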
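
The cancellation leading to (7.2.12), and the conclusion that (7.2.9) integrates to one, can be reproduced numerically. The following is a small sketch with illustrative function names; it multiplies the angular integrals one by one, compares the product with the closed form, and then combines the constant of (7.2.9) with (7.2.11) and (7.2.12).

```python
import math

def angular_product(q):
    """Product of the integrals over theta_1, ..., theta_{q-1} for q >= 2."""
    total = 2.0 * math.pi                      # integral over theta_{q-1}
    for j in range(1, q - 1):                  # theta_1, ..., theta_{q-2}
        # integral of |cos(theta_j)|^{q-j-1} over (-pi/2, pi/2)
        total *= (math.gamma(0.5) * math.gamma((q - j) / 2.0)
                  / math.gamma((q - j + 1) / 2.0))
    return total

def closed_form(q):
    """Right-hand side of (7.2.12)."""
    return 2.0 * math.pi ** (q / 2.0) / math.gamma(q / 2.0)

def total_mass_f9(q, b, gamma):
    """Constant of (7.2.9) times the radial integral (7.2.11) times (7.2.12)."""
    const = (b ** (gamma + q / 2.0) * math.gamma(q / 2.0)
             / (math.pi ** (q / 2.0) * math.gamma(gamma + q / 2.0)))
    radial = 0.5 * b ** (-(gamma + q / 2.0)) * math.gamma(gamma + q / 2.0)
    return const * radial * angular_product(q)

gaps = [abs(angular_product(q) - closed_form(q)) for q in range(2, 9)]
mass = total_mass_f9(q=5, b=2.0, gamma=1.0)
```

For $q = 3$, for instance, both sides of (7.2.12) equal $4\pi$, the surface area of the unit sphere.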


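Example 7.2.3. Write down the densities specified in (7.2.8) and (7.2.9) explicitly if

$$B = \begin{bmatrix} 3 & -1 & 0 \\ -1 & 2 & 1 \\ 0 & 1 & 1 \end{bmatrix},\ b = 2 \text{ and } \gamma = 2.$$


Solution 7.2.3. Let us evaluate the normalizing constant in (7.2.8). Since in this case,
$|B| = 2$,

$$c_8 = \frac{b^{\gamma+\frac{q}{2}}\,|B|^{\frac{1}{2}}\,\Gamma(\frac{q}{2})}{\pi^{\frac{q}{2}}\,\Gamma(\gamma+\frac{q}{2})} = \frac{2^{2+\frac{3}{2}}\,2^{\frac{1}{2}}\,\Gamma(\frac{3}{2})}{\pi^{\frac{3}{2}}\,\Gamma(2+\frac{3}{2})} = \frac{2^{6}}{15\,\pi^{\frac{3}{2}}}. \tag{i}$$

The normalizing constant in (7.2.9), which will be denoted by $c_9$, is the same as $c_8$ excluding
$|B|^{\frac{1}{2}} = 2^{\frac{1}{2}}$. Thus,

$$c_9 = \frac{2^{\frac{11}{2}}}{15\,\pi^{\frac{3}{2}}}. \tag{ii}$$

Note that for $X = [x_1, x_2, x_3]$, $XBX' = 3x_1^{2} + 2x_2^{2} + x_3^{2} - 2x_1x_2 + 2x_2x_3$ and
$Y'Y = y_1^{2} + y_2^{2} + y_3^{2}$. Hence the densities $f_8(X)$ and $f_9(Y)$ are the following, where $c_8$ and $c_9$ are
given in (i) and (ii):

$$f_8(X) = c_8\,[3x_1^{2} + 2x_2^{2} + x_3^{2} - 2x_1x_2 + 2x_2x_3]^{2}\,e^{-2[3x_1^{2} + 2x_2^{2} + x_3^{2} - 2x_1x_2 + 2x_2x_3]}$$

for $-\infty < x_j < \infty$, $j = 1, 2, 3$, and

$$f_9(Y) = c_9\,[y_1^{2} + y_2^{2} + y_3^{2}]^{2}\,e^{-2[y_1^{2} + y_2^{2} + y_3^{2}]},\quad \text{for } -\infty < y_j < \infty,\ j = 1, 2, 3.$$

This completes the computations.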
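
The arithmetic of Example 7.2.3 can be replayed in a few lines. This is a sketch only; `det3`, `quad_form` and `f8` are illustrative helper names, and the constants are computed directly from the expressions in (7.2.8) and (7.2.9).

```python
import math

B = [[3.0, -1.0, 0.0],
     [-1.0, 2.0, 1.0],
     [0.0, 1.0, 1.0]]
b, gamma, q = 2.0, 2.0, 3

def det3(m):
    """Determinant of a 3 x 3 matrix by cofactor expansion along the first row."""
    return (m[0][0] * (m[1][1] * m[2][2] - m[1][2] * m[2][1])
            - m[0][1] * (m[1][0] * m[2][2] - m[1][2] * m[2][0])
            + m[0][2] * (m[1][0] * m[2][1] - m[1][1] * m[2][0]))

def quad_form(x):
    """X B X' for a row vector x = [x1, x2, x3]."""
    return sum(x[i] * B[i][j] * x[j] for i in range(3) for j in range(3))

det_B = det3(B)
# c9 is the constant in (7.2.9); c8 additionally carries |B|^{1/2}
c9 = (b ** (gamma + q / 2.0) * math.gamma(q / 2.0)
      / (math.pi ** (q / 2.0) * math.gamma(gamma + q / 2.0)))
c8 = c9 * math.sqrt(det_B)

def f8(x):
    """Density (7.2.8) for the parameters of Example 7.2.3."""
    s = quad_form(x)
    return c8 * s ** gamma * math.exp(-b * s)
```

Here `c8` should agree with $2^{6}/(15\pi^{3/2})$ and `c9` with $2^{11/2}/(15\pi^{3/2})$.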

7.2.3. Some properties of the rectangular matrix-variate gamma density 


For the real rectangular matrix-variate gamma and Maxwell-Boltzmann distribution
whose density is specified in (7.2.5), what might be the $h$-th moment of the determinant
$|AXBX'|$ for an arbitrary $h$? This statistical quantity can be evaluated by looking at the
normalizing constant $c$ given in (7.2.4) since the integrand used to evaluate $E[|AXBX'|^{h}]$,
where $E$ denotes the expected value, is nothing but the density of $X$ wherein $\gamma$ is replaced
by $\gamma + h$. Hence we have

$$E[|AXBX'|^{h}] = \frac{\Gamma_p(\gamma+\frac{q}{2}+h)}{\Gamma_p(\gamma+\frac{q}{2})},\quad \Re(h) > -\gamma - \frac{q}{2} + \frac{p-1}{2}. \tag{7.2.13}$$

In many calculations involving the Maxwell-Boltzmann density for the real scalar variable
case $x$, one has to integrate a function of $x$, say $\nu(x)$, over the Maxwell-Boltzmann density,
as can be seen for example in equations (4.1) and (4.2) of Mathai and Haubold (1988) in
connection with a certain reaction-rate probability integral. Thus, the expression appearing
in (7.2.13) corresponds to the integral of a power function over the Maxwell-Boltzmann
density.

This arbitrary $h$-th moment expression also reveals an interesting point. By expanding
the matrix-variate gamma functions, we have the following:

$$\frac{\Gamma_p(\gamma+\frac{q}{2}+h)}{\Gamma_p(\gamma+\frac{q}{2})} = \prod_{j=1}^{p} \frac{\Gamma(\gamma+\frac{q}{2}-\frac{j-1}{2}+h)}{\Gamma(\gamma+\frac{q}{2}-\frac{j-1}{2})} = \prod_{j=1}^{p} E(t_j^{h})$$

where $t_j$ is a real scalar gamma random variable with parameter $(\gamma + \frac{q}{2} - \frac{j-1}{2},\ 1)$,
$j = 1,\ldots,p$, whose density is

$$g_{(j)}(t_j) = \frac{1}{\Gamma(\gamma+\frac{q}{2}-\frac{j-1}{2})}\,t_j^{\gamma+\frac{q}{2}-\frac{j-1}{2}-1}\,e^{-t_j},\quad t_j \ge 0,\ \gamma+\frac{q}{2}-\frac{j-1}{2} > 0, \tag{7.2.14}$$

and zero elsewhere. Thus structurally,

$$|AXBX'| = t_1 t_2 \cdots t_p \tag{7.2.15}$$

where $t_1, \ldots, t_p$ are independently distributed real scalar gamma random variables with $t_j$
having the gamma density given in (7.2.14) for $j = 1,\ldots,p$.
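
The factorization behind (7.2.15) is easy to confirm numerically: the ratio of matrix-variate gamma functions in (7.2.13) should coincide with the product of the $h$-th moments of the scalar gamma variables in (7.2.14). The sketch below assumes the standard product formula for $\Gamma_p(\cdot)$; the function names are illustrative.

```python
import math

def log_gamma_p(alpha, p):
    """log Gamma_p(alpha) = log( pi^{p(p-1)/4} prod_j Gamma(alpha - (j-1)/2) )."""
    return (p * (p - 1) / 4.0 * math.log(math.pi)
            + sum(math.lgamma(alpha - (j - 1) / 2.0) for j in range(1, p + 1)))

def moment_det(h, p, q, gamma):
    """E[|AXBX'|^h] as given in (7.2.13)."""
    a = gamma + q / 2.0
    return math.exp(log_gamma_p(a + h, p) - log_gamma_p(a, p))

def moment_product(h, p, q, gamma):
    """prod_j E[t_j^h], t_j being gamma with shape gamma + q/2 - (j-1)/2 and unit scale."""
    out = 1.0
    for j in range(1, p + 1):
        shape = gamma + q / 2.0 - (j - 1) / 2.0
        out *= math.gamma(shape + h) / math.gamma(shape)
    return out

p, q, gamma = 3, 4, 1.0
m1 = moment_det(1.0, p, q, gamma)            # first moment of the determinant
m1_product = moment_product(1.0, p, q, gamma)
m_half = moment_det(0.5, p, q, gamma)        # a fractional moment, h = 1/2
m_half_product = moment_product(0.5, p, q, gamma)
```

With these values the shapes are $3$, $2.5$ and $2$, so the first moment is their product, $15$.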


7.2.4. Connection to the volume of a random parallelotope


First, observe that $|AXBX'| = |(A^{\frac{1}{2}}XB^{\frac{1}{2}})(A^{\frac{1}{2}}XB^{\frac{1}{2}})'| = |UU'|$ where $U = A^{\frac{1}{2}}XB^{\frac{1}{2}}$.
Then, note that $U$ is $p \times q$, $q \ge p$, and of full rank $p$, and that the $p$ linearly independent
rows of $U$, taken in the order, will then create a convex hull and a parallelotope in the
$q$-dimensional Euclidean space. The $p$ rows of $U$ represent $p$ linearly independent vectors
in the Euclidean $q$-space as well as $p$ points in the same space. In light of (7.2.14), these
random points are gamma distributed, that is, the joint density of the $p$ vectors or the $p$
random points is the real rectangular matrix-variate density given in (7.2.5), and the volume
content of the parallelotope created by these $p$ random points is $|AXBX'|^{\frac{1}{2}}$. Accordingly,
(7.2.13) represents the $(2h)$-th moment of the random volume of the $p$-parallelotope generated
by the $p$ linearly independent rows of $A^{\frac{1}{2}}XB^{\frac{1}{2}}$. The geometrical probability problems
considered in the literature usually pertain to random volumes generated by independently
distributed isotropic random points, isotropic meaning that their associated density is
invariant with respect to orthonormal transformations or rotations of the coordinate axes. For
instance, the density given in (7.2.9) constitutes an example of isotropic form. The
distributions of random geometrical configurations are further discussed in Chap. 4 of Mathai
(1999).


7.2.5. Pathway to real matrix-variate gamma and Maxwell-Boltzmann densities 


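Consider a model of the following form for a $p \times q$, $q \ge p$, matrix $X$ of full rank $p$:

$$f_{10}(X) = c_{10}\,|AXBX'|^{\gamma}\,|I - a(1-\alpha)A^{\frac{1}{2}}XBX'A^{\frac{1}{2}}|^{\frac{\eta}{1-\alpha}},\quad \alpha < 1, \tag{7.2.16}$$

for $A > O$, $B > O$, $a > 0$, $\eta > 0$, $I - a(1-\alpha)A^{\frac{1}{2}}XBX'A^{\frac{1}{2}} > O$ (positive definite),
and $f_{10}(X) = 0$ elsewhere. It will be determined later that the parameter $\gamma$ is subject to
the condition $\gamma + \frac{q}{2} > \frac{p-1}{2}$. When $\alpha > 1$, we let $1 - \alpha = -(\alpha - 1)$, $\alpha > 1$, so that the
model specified in (7.2.16) shifts to the model

$$f_{11}(X) = c_{11}\,|AXBX'|^{\gamma}\,|I + a(\alpha-1)A^{\frac{1}{2}}XBX'A^{\frac{1}{2}}|^{-\frac{\eta}{\alpha-1}},\quad \alpha > 1 \tag{7.2.17}$$

for $\eta > 0$, $a > 0$, $A > O$, $B > O$, and $f_{11}(X) = 0$ elsewhere. Observe that $A^{\frac{1}{2}}XBX'A^{\frac{1}{2}}$
is symmetric as well as positive definite when $X$ is of full rank $p$ and $A > O$, $B > O$.
For this model, the condition $\frac{\eta}{\alpha-1} - \gamma - \frac{q}{2} > \frac{p-1}{2}$ is required in addition to that
applying to the parameter $\gamma$ in (7.2.16). Note that when $f_{10}(X)$ and $f_{11}(X)$ are taken as
statistical densities, $c_{10}$ and $c_{11}$ are the associated normalizing constants. Proceeding as in
the evaluation of $c$ in (7.2.4), we obtain the following representations for $c_{10}$ and $c_{11}$:

$$c_{10} = |A|^{\frac{q}{2}}|B|^{\frac{p}{2}}\,[a(1-\alpha)]^{p(\gamma+\frac{q}{2})}\,\frac{\Gamma_p(\frac{q}{2})\,\Gamma_p(\gamma+\frac{q}{2}+\frac{\eta}{1-\alpha}+\frac{p+1}{2})}{\pi^{\frac{qp}{2}}\,\Gamma_p(\gamma+\frac{q}{2})\,\Gamma_p(\frac{\eta}{1-\alpha}+\frac{p+1}{2})} \tag{7.2.18}$$

for $\eta > 0$, $\alpha < 1$, $a > 0$, $A > O$, $B > O$, $\gamma + \frac{q}{2} > \frac{p-1}{2}$, and

$$c_{11} = |A|^{\frac{q}{2}}|B|^{\frac{p}{2}}\,[a(\alpha-1)]^{p(\gamma+\frac{q}{2})}\,\frac{\Gamma_p(\frac{q}{2})\,\Gamma_p(\frac{\eta}{\alpha-1})}{\pi^{\frac{qp}{2}}\,\Gamma_p(\gamma+\frac{q}{2})\,\Gamma_p(\frac{\eta}{\alpha-1}-\gamma-\frac{q}{2})} \tag{7.2.19}$$

for $\alpha > 1$, $\eta > 0$, $a > 0$, $A > O$, $B > O$, $\gamma + \frac{q}{2} > \frac{p-1}{2}$, $\frac{\eta}{\alpha-1} - \gamma - \frac{q}{2} > \frac{p-1}{2}$. When
$\alpha \to 1_{-}$ in (7.2.18) and $\alpha \to 1_{+}$ in (7.2.19), the models (7.2.16) and (7.2.17) converge
to the real rectangular matrix-variate gamma or Maxwell-Boltzmann density specified in
(7.2.5). This can be established by applying the following lemmas.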


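Lemma 7.2.1.

$$\lim_{\alpha \to 1_{-}} |I - a(1-\alpha)A^{\frac{1}{2}}XBX'A^{\frac{1}{2}}|^{\frac{\eta}{1-\alpha}} = e^{-a\eta\,\mathrm{tr}(AXBX')}$$

and

$$\lim_{\alpha \to 1_{+}} |I + a(\alpha-1)A^{\frac{1}{2}}XBX'A^{\frac{1}{2}}|^{-\frac{\eta}{\alpha-1}} = e^{-a\eta\,\mathrm{tr}(AXBX')}. \tag{7.2.20}$$

Proof: Letting $\lambda_1, \ldots, \lambda_p$ be the eigenvalues of the symmetric matrix $A^{\frac{1}{2}}XBX'A^{\frac{1}{2}}$, we
have

$$|I - a(1-\alpha)A^{\frac{1}{2}}XBX'A^{\frac{1}{2}}|^{\frac{\eta}{1-\alpha}} = \prod_{j=1}^{p} [1 - a(1-\alpha)\lambda_j]^{\frac{\eta}{1-\alpha}}.$$

However, since

$$\lim_{\alpha \to 1_{-}} [1 - a(1-\alpha)\lambda_j]^{\frac{\eta}{1-\alpha}} = e^{-a\eta\lambda_j},$$

the product gives the sum of the eigenvalues, that is, $\mathrm{tr}(A^{\frac{1}{2}}XBX'A^{\frac{1}{2}})$ in the exponent,
hence the result. The same result can be similarly obtained for the case $\alpha > 1$. We can
also show that the normalizing constants $c_{10}$ and $c_{11}$ reduce to the normalizing constant
in (7.2.4). This can be achieved by making use of an asymptotic expansion of gamma
functions, namely,

$$\Gamma(z + \delta) \approx \sqrt{2\pi}\,z^{z+\delta-\frac{1}{2}}\,e^{-z}\quad\text{for } |z| \to \infty,\ \delta \text{ bounded}. \tag{7.2.21}$$

This first term approximation is also known as Stirling's formula.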
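
Both ingredients used here, the eigenvalue form of the limit in Lemma 7.2.1 and the first term approximation (7.2.21), can be illustrated numerically. The sketch below uses hypothetical eigenvalues in place of those of $A^{\frac{1}{2}}XBX'A^{\frac{1}{2}}$ and arbitrary values of $a$ and $\eta$; all names are illustrative.

```python
import math

def pathway_det_power(eigenvalues, a, eta, alpha):
    """|I - a(1 - alpha) M|^{eta/(1 - alpha)} computed from the eigenvalues of M.
    For alpha > 1 this equals |I + a(alpha - 1) M|^{-eta/(alpha - 1)}."""
    out = 1.0
    for lam in eigenvalues:
        base = 1.0 - a * (1.0 - alpha) * lam
        if base <= 0.0:
            raise ValueError("I - a(1 - alpha) M is not positive definite")
        out *= base ** (eta / (1.0 - alpha))
    return out

def stirling_ratio(z, delta):
    """Gamma(z + delta) divided by the first term approximation in (7.2.21)."""
    log_approx = 0.5 * math.log(2.0 * math.pi) + (z + delta - 0.5) * math.log(z) - z
    return math.exp(math.lgamma(z + delta) - log_approx)

a, eta = 0.7, 1.5
eigenvalues = [0.4, 1.1, 2.5]                    # hypothetical eigenvalues, for illustration
limit = math.exp(-a * eta * sum(eigenvalues))    # e^{-a eta tr(.)}
from_below = pathway_det_power(eigenvalues, a, eta, 1.0 - 1e-6)
from_above = pathway_det_power(eigenvalues, a, eta, 1.0 + 1e-6)
ratio = stirling_ratio(1.0e4, 0.5)
```

As $\alpha$ approaches 1 from either side, both `from_below` and `from_above` should approach `limit`, and `ratio` should be close to one for large $z$.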
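Lemma 7.2.2.

$$\lim_{\alpha \to 1_{-}} [a(1-\alpha)]^{p(\gamma+\frac{q}{2})}\,\frac{\Gamma_p(\gamma+\frac{q}{2}+\frac{\eta}{1-\alpha}+\frac{p+1}{2})}{\Gamma_p(\frac{\eta}{1-\alpha}+\frac{p+1}{2})} = (a\eta)^{p(\gamma+\frac{q}{2})}$$

and

$$\lim_{\alpha \to 1_{+}} [a(\alpha-1)]^{p(\gamma+\frac{q}{2})}\,\frac{\Gamma_p(\frac{\eta}{\alpha-1})}{\Gamma_p(\frac{\eta}{\alpha-1}-\gamma-\frac{q}{2})} = (a\eta)^{p(\gamma+\frac{q}{2})}. \tag{7.2.22}$$

Proof: On expanding $\Gamma_p(\cdot)$ using its definition, for $\alpha > 1$, we have

$$[a(\alpha-1)]^{p(\gamma+\frac{q}{2})}\,\frac{\Gamma_p(\frac{\eta}{\alpha-1})}{\Gamma_p(\frac{\eta}{\alpha-1}-\gamma-\frac{q}{2})} = [a(\alpha-1)]^{p(\gamma+\frac{q}{2})} \prod_{j=1}^{p} \frac{\Gamma(\frac{\eta}{\alpha-1}-\frac{j-1}{2})}{\Gamma(\frac{\eta}{\alpha-1}-\gamma-\frac{q}{2}-\frac{j-1}{2})}.$$

Now, on applying Stirling's formula as given in (7.2.21) to each of the gamma functions
by taking $z = \frac{\eta}{\alpha-1} \to \infty$ when $\alpha \to 1_{+}$, it is seen that the right-hand side of the above
equality reduces to $(a\eta)^{p(\gamma+\frac{q}{2})}$. The result can be similarly established for the case $\alpha < 1$.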
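
The two limits in (7.2.22) can be probed numerically by evaluating the left-hand sides for $\alpha$ very close to 1. This is a rough check rather than a proof; it assumes the standard product formula for $\Gamma_p(\cdot)$, and the parameter values are arbitrary.

```python
import math

def log_gamma_p(alpha, p):
    """log Gamma_p(alpha) = log( pi^{p(p-1)/4} prod_j Gamma(alpha - (j-1)/2) )."""
    return (p * (p - 1) / 4.0 * math.log(math.pi)
            + sum(math.lgamma(alpha - (j - 1) / 2.0) for j in range(1, p + 1)))

def lhs_alpha_below_one(alpha, a, eta, p, q, gamma):
    """Left-hand side of the first limit in (7.2.22), alpha < 1."""
    s = gamma + q / 2.0
    z = eta / (1.0 - alpha) + (p + 1) / 2.0
    return math.exp(p * s * math.log(a * (1.0 - alpha))
                    + log_gamma_p(s + z, p) - log_gamma_p(z, p))

def lhs_alpha_above_one(alpha, a, eta, p, q, gamma):
    """Left-hand side of the second limit in (7.2.22), alpha > 1."""
    s = gamma + q / 2.0
    z = eta / (alpha - 1.0)
    return math.exp(p * s * math.log(a * (alpha - 1.0))
                    + log_gamma_p(z, p) - log_gamma_p(z - s, p))

a, eta, p, q, gamma = 0.7, 1.5, 2, 3, 1.0
limit = (a * eta) ** (p * (gamma + q / 2.0))
below = lhs_alpha_below_one(1.0 - 1e-5, a, eta, p, q, gamma)
above = lhs_alpha_above_one(1.0 + 1e-5, a, eta, p, q, gamma)
```

Both `below` and `above` should be within a small relative error of `limit`, the error shrinking as $\alpha$ moves closer to 1.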

This shows that $c_{10}$ and $c_{11}$ of (7.2.18) and (7.2.19) converge to the normalizing constant
in (7.2.4) with $A$ replaced by $a\eta A$, the factor $(a\eta)^{p\gamma}$ being absorbed into $|a\eta AXBX'|^{\gamma}$.
This means that the models specified in (7.2.16), (7.2.17), and (7.2.5) are
all available from either (7.2.16) or (7.2.17) via the pathway parameter $\alpha$. Accordingly,
the combined model, either (7.2.16) or (7.2.17), is referred to as the pathway generalized
real rectangular matrix-variate gamma density. The Maxwell-Boltzmann case corresponds
to $\gamma = 1$ and the Raleigh case, to $\gamma = \frac{1}{2}$. If either of the Maxwell-Boltzmann or Raleigh
densities is the ideal or stable density in a physical system, then these stable densities as
well as the unstable neighborhoods, described through the pathway parameter $\alpha < 1$ and
$\alpha > 1$, and the transitional stages, are given by (7.2.16) or (7.2.17). The original pathway
model was introduced in Mathai (2005).

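For addressing other problems occurring in physical situations, one may have to integrate
functions of $X$ over the densities (7.2.16), (7.2.17) or (7.2.5). Consequently, we
will evaluate an arbitrary $h$-th moment of $|AXBX'|$ in the models (7.2.16) and (7.2.17).
For example, let us determine the $h$-th moment of $|AXBX'|$ with respect to the model
specified in (7.2.16):

$$E[|AXBX'|^{h}] = c_{10}\int_X |AXBX'|^{\gamma+h}\,|I - a(1-\alpha)A^{\frac{1}{2}}XBX'A^{\frac{1}{2}}|^{\frac{\eta}{1-\alpha}}\,dX.$$

Note that the only change in the integrand, as compared to (7.2.16), is that $\gamma$ is replaced
by $\gamma + h$. Hence the result is available from the normalizing constant $c_{10}$, and the answer
is the following:

$$E[|AXBX'|^{h}] = [a(1-\alpha)]^{-ph}\,\frac{\Gamma_p(\gamma+\frac{q}{2}+h)}{\Gamma_p(\gamma+\frac{q}{2})}\,\frac{\Gamma_p(\gamma+\frac{q}{2}+\frac{\eta}{1-\alpha}+\frac{p+1}{2})}{\Gamma_p(\gamma+\frac{q}{2}+\frac{\eta}{1-\alpha}+\frac{p+1}{2}+h)} \tag{7.2.23}$$

for $\Re(\gamma + \frac{q}{2} + h) > \frac{p-1}{2}$, $a > 0$, $\alpha < 1$. Therefore

$$\begin{aligned} E[|a(1-\alpha)AXBX'|^{h}] &= \frac{\Gamma_p(\gamma+\frac{q}{2}+h)}{\Gamma_p(\gamma+\frac{q}{2})}\,\frac{\Gamma_p(\gamma+\frac{q}{2}+\frac{\eta}{1-\alpha}+\frac{p+1}{2})}{\Gamma_p(\gamma+\frac{q}{2}+\frac{\eta}{1-\alpha}+\frac{p+1}{2}+h)} \\ &= \prod_{j=1}^{p} \frac{\Gamma(\gamma+\frac{q}{2}-\frac{j-1}{2}+h)}{\Gamma(\gamma+\frac{q}{2}-\frac{j-1}{2})}\,\frac{\Gamma(\gamma+\frac{q}{2}+\frac{\eta}{1-\alpha}+\frac{p+1}{2}-\frac{j-1}{2})}{\Gamma(\gamma+\frac{q}{2}+\frac{\eta}{1-\alpha}+\frac{p+1}{2}-\frac{j-1}{2}+h)} \\ &= \prod_{j=1}^{p} E(y_j^{h}) \end{aligned} \tag{7.2.24}$$

where $y_j$ is a real scalar type-1 beta random variable with the parameters
$(\gamma + \frac{q}{2} - \frac{j-1}{2},\ \frac{\eta}{1-\alpha} + \frac{p+1}{2})$, $j = 1,\ldots,p$, the $y_j$'s being mutually independently distributed.
Hence we have the structural relationship

$$|a(1-\alpha)AXBX'| = y_1 \cdots y_p. \tag{7.2.25}$$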
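Proceeding the same way for the model (7.2.17), we have

$$E[|AXBX'|^{h}] = [a(\alpha-1)]^{-ph}\,\frac{\Gamma_p(\gamma+\frac{q}{2}+h)}{\Gamma_p(\gamma+\frac{q}{2})}\,\frac{\Gamma_p(\frac{\eta}{\alpha-1}-\gamma-\frac{q}{2}-h)}{\Gamma_p(\frac{\eta}{\alpha-1}-\gamma-\frac{q}{2})} \tag{7.2.26}$$

for $\Re(\gamma + \frac{q}{2} + h) > \frac{p-1}{2}$, $\Re(\frac{\eta}{\alpha-1} - \gamma - \frac{q}{2} - h) > \frac{p-1}{2}$, or
$-(\gamma + \frac{q}{2}) + \frac{p-1}{2} < \Re(h) < \frac{\eta}{\alpha-1} - \gamma - \frac{q}{2} - \frac{p-1}{2}$. Thus,

$$E[|a(\alpha-1)AXBX'|^{h}] = \prod_{j=1}^{p} \frac{\Gamma(\gamma+\frac{q}{2}-\frac{j-1}{2}+h)}{\Gamma(\gamma+\frac{q}{2}-\frac{j-1}{2})}\,\frac{\Gamma(\frac{\eta}{\alpha-1}-\gamma-\frac{q}{2}-\frac{j-1}{2}-h)}{\Gamma(\frac{\eta}{\alpha-1}-\gamma-\frac{q}{2}-\frac{j-1}{2})} = \prod_{j=1}^{p} E(z_j^{h}) \tag{7.2.27}$$

where $z_j$ is a real scalar type-2 beta random variable with the parameters
$(\gamma + \frac{q}{2} - \frac{j-1}{2},\ \frac{\eta}{\alpha-1} - \gamma - \frac{q}{2} - \frac{j-1}{2})$ for $j = 1,\ldots,p$, the $z_j$'s being mutually independently
distributed. Thus, for $\alpha > 1$, we have the structural representation

$$|a(\alpha-1)AXBX'| = z_1 \cdots z_p. \tag{7.2.28}$$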

As previously explained, one can consider the $p$ linearly independent rows of $A^{\frac{1}{2}}XB^{\frac{1}{2}}$
as $p$ vectors in the Euclidean $q$-space. Then, these $p$ vectors are jointly distributed as
rectangular matrix-variate type-2 beta, and $E[|AXBX'|^{h}] = E[(|AXBX'|^{\frac{1}{2}})^{2h}]$ is the $(2h)$-th
moment of the volume of the random parallelotope generated by these $p$ $q$-vectors for
$q > p$. In this case, the random points will be called type-2 beta distributed random points.

The real Maxwell-Boltzmann case will correspond to $\gamma = 1$ and the Raleigh case, to
$\gamma = \frac{1}{2}$, and all the above extensions and properties will apply to both of these distributions.


7.2.6. Multivariate gamma and Maxwell-Boltzmann densities, pathway model 


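Consider the density given in (7.2.16) for the case $p = 1$. In this instance, the $p \times p$
constant matrix $A$ is $1 \times 1$ and we shall let $A = b > 0$, a positive real scalar quantity.
Then for $\alpha < 1$, (7.2.16) reduces to the following where $X$ is $1 \times q$ of the form
$X = (x_1, \ldots, x_q)$, $-\infty < x_j < \infty$, $j = 1,\ldots,q$:

$$f_{12}(X) = b^{\gamma+\frac{q}{2}}\,|B|^{\frac{1}{2}}\,[a(1-\alpha)]^{\gamma+\frac{q}{2}}\,\frac{\Gamma(\frac{q}{2})\,\Gamma(\gamma+\frac{q}{2}+\frac{\eta}{1-\alpha}+1)}{\pi^{\frac{q}{2}}\,\Gamma(\gamma+\frac{q}{2})\,\Gamma(\frac{\eta}{1-\alpha}+1)}\,[XBX']^{\gamma}\,[1 - a(1-\alpha)\,b\,XBX']^{\frac{\eta}{1-\alpha}} \tag{7.2.29}$$

for $b > 0$, $B = B' > O$, $a > 0$, $\eta > 0$, $\gamma + \frac{q}{2} > 0$, $-\infty < x_j < \infty$, $j = 1,\ldots,q$,
$1 - a(1-\alpha)\,b\,XBX' > 0$, $\alpha < 1$, and $f_{12} = 0$ elsewhere. Note that

$$XBX' = (x_1, \ldots, x_q)\,B \begin{pmatrix} x_1 \\ \vdots \\ x_q \end{pmatrix}$$

is a real quadratic form whose associated matrix $B$ is positive definite. Letting the $1 \times q$
vector $Y = XB^{\frac{1}{2}}$, the density of $Y$ when $\alpha < 1$ is given by

$$f_{13}(Y)\,dY = b^{\gamma+\frac{q}{2}}\,[a(1-\alpha)]^{\gamma+\frac{q}{2}}\,\frac{\Gamma(\frac{q}{2})\,\Gamma(\gamma+\frac{q}{2}+\frac{\eta}{1-\alpha}+1)}{\pi^{\frac{q}{2}}\,\Gamma(\gamma+\frac{q}{2})\,\Gamma(\frac{\eta}{1-\alpha}+1)}\,[(y_1^{2}+\cdots+y_q^{2})]^{\gamma}\,[1 - a(1-\alpha)\,b\,(y_1^{2}+\cdots+y_q^{2})]^{\frac{\eta}{1-\alpha}}\,dY \tag{7.2.30}$$

for $b > 0$, $\gamma + \frac{q}{2} > 0$, $\eta > 0$, $-\infty < y_j < \infty$, $j = 1,\ldots,q$,
$1 - a(1-\alpha)\,b\,(y_1^{2}+\cdots+y_q^{2}) > 0$, and $f_{13} = 0$ elsewhere, which will be taken as the standard form of the
real multivariate gamma density in its pathway generalized form, and for $\gamma = 1$, it will
be the real pathway generalized form of the Maxwell-Boltzmann density in the standard
multivariate case. For $\alpha > 1$, the corresponding standard form of the real multivariate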
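gamma and Maxwell-Boltzmann densities is given by

$$f_{14}(Y)\,dY = b^{\gamma+\frac{q}{2}}\,[a(\alpha-1)]^{\gamma+\frac{q}{2}}\,\frac{\Gamma(\frac{q}{2})\,\Gamma(\frac{\eta}{\alpha-1})}{\pi^{\frac{q}{2}}\,\Gamma(\gamma+\frac{q}{2})\,\Gamma(\frac{\eta}{\alpha-1}-\gamma-\frac{q}{2})}\,[(y_1^{2}+\cdots+y_q^{2})]^{\gamma}\,[1 + a(\alpha-1)\,b\,(y_1^{2}+\cdots+y_q^{2})]^{-\frac{\eta}{\alpha-1}}\,dY \tag{7.2.31}$$

for $b > 0$, $\gamma + \frac{q}{2} > 0$, $a > 0$, $\eta > 0$, $\frac{\eta}{\alpha-1} - \gamma - \frac{q}{2} > 0$, $-\infty < y_j < \infty$, $j = 1,\ldots,q$,
and $f_{14} = 0$ elsewhere. This will be taken as the pathway generalized real multivariate
gamma density for $\alpha > 1$, and for $\gamma = 1$, it will be the standard form of the real pathway
extended Maxwell-Boltzmann density for $\alpha > 1$. Note that when $\alpha \to 1_{-}$ in (7.2.30) and
$\alpha \to 1_{+}$ in (7.2.31), we have

$$f_{15}(Y)\,dY = b^{\gamma+\frac{q}{2}}\,(a\eta)^{\gamma+\frac{q}{2}}\,\frac{\Gamma(\frac{q}{2})}{\pi^{\frac{q}{2}}\,\Gamma(\gamma+\frac{q}{2})}\,[(y_1^{2}+\cdots+y_q^{2})]^{\gamma}\,e^{-a\eta b\,(y_1^{2}+\cdots+y_q^{2})}\,dY \tag{7.2.32}$$

for $b > 0$, $a > 0$, $\eta > 0$, $\gamma + \frac{q}{2} > 0$, and $f_{15} = 0$ elsewhere, which for $\gamma = 1$, is the real
multivariate Maxwell-Boltzmann density in the standard form. From (7.2.30), (7.2.31), and
thereby from (7.2.32), one can obtain the density of $u = y_1^{2} + \cdots + y_q^{2}$ either by using the
general polar coordinate transformation or the transformation of variables technique, that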
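is, going from $dY$ to $dS$ with $S = YY'$, $Y$ being $1 \times q$. Then, the density of $u$ for the case
$\alpha < 1$ is

$$f_{16}(u) = b^{\gamma+\frac{q}{2}}\,[a(1-\alpha)]^{\gamma+\frac{q}{2}}\,\frac{\Gamma(\gamma+\frac{q}{2}+\frac{\eta}{1-\alpha}+1)}{\Gamma(\gamma+\frac{q}{2})\,\Gamma(\frac{\eta}{1-\alpha}+1)}\,u^{\gamma+\frac{q}{2}-1}\,[1 - a(1-\alpha)\,b\,u]^{\frac{\eta}{1-\alpha}},\quad \alpha < 1, \tag{7.2.33}$$

for $b > 0$, $a > 0$, $\eta > 0$, $\alpha < 1$, $\gamma + \frac{q}{2} > 0$, $u \ge 0$, $1 - a(1-\alpha)\,b\,u > 0$, and $f_{16} = 0$
elsewhere, the density of $u$ for $\alpha > 1$ being

$$f_{17}(u) = b^{\gamma+\frac{q}{2}}\,[a(\alpha-1)]^{\gamma+\frac{q}{2}}\,\frac{\Gamma(\frac{\eta}{\alpha-1})}{\Gamma(\gamma+\frac{q}{2})\,\Gamma(\frac{\eta}{\alpha-1}-\gamma-\frac{q}{2})}\,u^{\gamma+\frac{q}{2}-1}\,[1 + a(\alpha-1)\,b\,u]^{-\frac{\eta}{\alpha-1}},\quad \alpha > 1, \tag{7.2.34}$$

for $b > 0$, $a > 0$, $\eta > 0$, $\gamma + \frac{q}{2} > 0$, $\frac{\eta}{\alpha-1} - \gamma - \frac{q}{2} > 0$, $u \ge 0$, and $f_{17} = 0$ elsewhere.
Observe that as $\alpha \to 1$, both (7.2.33) and (7.2.34) converge to the form

$$f_{18}(u) = \frac{(a\eta b)^{\gamma+\frac{q}{2}}}{\Gamma(\gamma+\frac{q}{2})}\,u^{\gamma+\frac{q}{2}-1}\,e^{-a b \eta\,u} \tag{7.2.35}$$

for $a > 0$, $b > 0$, $\eta > 0$, $u \ge 0$, and $f_{18} = 0$ elsewhere. For $\gamma = \frac{1}{2}$, we have the
corresponding Raleigh cases.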
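
As a plausibility check on (7.2.33) and (7.2.35), one can integrate both densities numerically and compare them pointwise for $\alpha$ close to 1. The sketch below uses a plain midpoint rule and arbitrary parameter values; the function names are illustrative.

```python
import math

def f16(u, a, b, eta, alpha, gamma, q):
    """Density (7.2.33) of u = y_1^2 + ... + y_q^2 for alpha < 1."""
    s = gamma + q / 2.0
    k = eta / (1.0 - alpha)
    base = 1.0 - a * (1.0 - alpha) * b * u
    if u <= 0.0 or base <= 0.0:
        return 0.0
    log_const = (s * math.log(a * b * (1.0 - alpha))
                 + math.lgamma(s + k + 1.0) - math.lgamma(s) - math.lgamma(k + 1.0))
    return math.exp(log_const + (s - 1.0) * math.log(u) + k * math.log(base))

def f18(u, a, b, eta, gamma, q):
    """Limiting gamma density (7.2.35)."""
    s = gamma + q / 2.0
    if u <= 0.0:
        return 0.0
    return math.exp(s * math.log(a * eta * b) - math.lgamma(s)
                    + (s - 1.0) * math.log(u) - a * b * eta * u)

def midpoint(f, lo, hi, n=100000):
    """Composite midpoint rule on [lo, hi]."""
    h = (hi - lo) / n
    return h * sum(f(lo + (i + 0.5) * h) for i in range(n))

a, b, eta, gamma, q = 0.7, 2.0, 1.5, 1.0, 3
alpha = 0.6
upper = 1.0 / (a * (1.0 - alpha) * b)          # right end of the support of f16
mass16 = midpoint(lambda u: f16(u, a, b, eta, alpha, gamma, q), 0.0, upper)
mass18 = midpoint(lambda u: f18(u, a, b, eta, gamma, q), 0.0, 40.0)
near_limit_gap = abs(f16(0.8, a, b, eta, 1.0 - 1e-6, gamma, q)
                     - f18(0.8, a, b, eta, gamma, q))
```

Both masses should be close to one, and `near_limit_gap` should be small, reflecting the convergence of (7.2.33) to (7.2.35) as $\alpha \to 1_{-}$.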
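Letting $\gamma = 1$ and $q = 1$ in (7.2.32), and taking $a\eta = 1$, we have

$$\begin{aligned} f_{19}(y_1) &= \frac{b^{\frac{3}{2}}\,\Gamma(\frac{1}{2})}{\pi^{\frac{1}{2}}\,\Gamma(\frac{3}{2})}\,y_1^{2}\,e^{-b y_1^{2}} = \frac{2\,b^{\frac{3}{2}}}{\sqrt{\pi}}\,y_1^{2}\,e^{-b y_1^{2}},\quad -\infty < y_1 < \infty,\ b > 0 \\ &= \frac{4\,b^{\frac{3}{2}}}{\sqrt{\pi}}\,y_1^{2}\,e^{-b y_1^{2}},\quad 0 \le y_1 < \infty,\ b > 0, \end{aligned} \tag{7.2.36}$$

and $f_{19} = 0$ elsewhere. This is the real Maxwell-Boltzmann case. For the Raleigh case,
we let $\gamma = \frac{1}{2}$ and $p = 1$, $q = 1$ in (7.2.32), again with $a\eta = 1$, which results in the following density:

$$\begin{aligned} f_{20}(y_1) &= b\,(y_1^{2})^{\frac{1}{2}}\,e^{-b y_1^{2}},\quad -\infty < y_1 < \infty,\ b > 0 \\ &= 2b\,|y_1|\,e^{-b y_1^{2}},\quad 0 \le y_1 < \infty,\ b > 0, \end{aligned} \tag{7.2.37}$$

and $f_{20} = 0$ elsewhere.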


7.2.7. Concluding remarks


There exist natural phenomena that are suspected to involve an underlying distribution
which is not Maxwell-Boltzmann but may be some deviation therefrom. In such instances,
it is preferable to model the collected data by means of the pathway extended model
previously specified for $p = 1$, $q = 1$ (real scalar case), $p = 1$ (real multivariate case)
and the general matrix-variate case. The pathway parameter $\alpha$ will capture the
Maxwell-Boltzmann case, the neighboring models described by the pathway model for $\alpha < 1$ and
for $\alpha > 1$ and the transitional stages when moving from one family of functions to another,
and thus, to all three different families of functions. Incidentally, for $\gamma = 0$, one has the
rectangular matrix-variate Gaussian density given in (7.2.5) and its pathway extension in
(7.2.16) and (7.2.17) or the general extensions in the standard forms in (7.2.30), (7.2.31),
and (7.2.32) wherein $\gamma = 0$. The structures in (7.2.24), (7.2.27), and (7.2.28) suggest that
the corresponding densities can also be written in terms of G- and H-functions. For the
theory and applications of the G- and H-functions, the reader is referred to Mathai (1993)
and Mathai et al. (2010), respectively. The complex analogues of some matrix-variate
distributions, including the matrix-variate Gaussian, were introduced in Mathai and Provost
(2006). Certain bivariate distributions are discussed in Balakrishnan and Lai (2009) and
some general methods of generating real multivariate distributions are presented in
Marshall and Olkin (1967).


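Example 7.2.4. Let $X = (x_{ij})$ be a real $p \times q$, $q \ge p$, matrix of rank $p$, where the $x_{ij}$'s
are distinct real scalar variables. Let the constant matrices $A = b > 0$ be $1 \times 1$ and $B > O$
be $q \times q$. Consider the following generalized multivariate Maxwell-Boltzmann density

$$f(X) = c\,|AXBX'|^{\gamma}\,e^{-[\mathrm{tr}(AXBX')]^{\delta}}$$

for $\delta > 0$, $A = b > 0$, $X = [x_1, \ldots, x_q]$. Evaluate $c$ if $f(X)$ is a density.


Solution 7.2.4. Since $X$ is $1 \times q$, $|AXBX'| = b\,[XBX']$ where $XBX'$ is a real quadratic
form. For $f(X)$ to be a density, we must have

$$1 = \int_X f(X)\,dX = c\,b^{\gamma}\int_X [XBX']^{\gamma}\,e^{-[b\,(XBX')]^{\delta}}\,dX. \tag{i}$$

Let us make the transformations $Y = XB^{\frac{1}{2}}$ and $s = YY'$. Then (i) reduces to the following:

$$1 = c\,b^{\gamma}\,\frac{\pi^{\frac{q}{2}}}{\Gamma(\frac{q}{2})\,|B|^{\frac{1}{2}}}\int_0^{\infty} s^{\gamma+\frac{q}{2}-1}\,e^{-(b s)^{\delta}}\,ds. \tag{ii}$$

Letting $t = b^{\delta}s^{\delta}$, $b > 0$, $s > 0$ $\Rightarrow$ $ds = \frac{1}{\delta\,b}\,t^{\frac{1}{\delta}-1}\,dt$, (ii) becomes

$$1 = c\,\frac{\pi^{\frac{q}{2}}}{\delta\,b^{\frac{q}{2}}\,\Gamma(\frac{q}{2})\,|B|^{\frac{1}{2}}}\int_0^{\infty} t^{\frac{1}{\delta}(\gamma+\frac{q}{2})-1}\,e^{-t}\,dt.$$

Hence,

$$c = \frac{\delta\,b^{\frac{q}{2}}\,\Gamma(\frac{q}{2})\,|B|^{\frac{1}{2}}}{\pi^{\frac{q}{2}}\,\Gamma\big(\frac{\gamma+\frac{q}{2}}{\delta}\big)}.$$

No additional conditions are required other than $\gamma > 0$, $\delta > 0$, $q > 0$, $B > O$. This
completes the solution.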
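
The constant obtained in Solution 7.2.4 can be checked numerically in the simplest setting $q = 1$, where $B = \beta > 0$ is a scalar and $\mathrm{tr}(AXBX') = b\beta x^{2}$. The sketch below, with illustrative names and arbitrary parameter values, integrates the resulting density over the real line and also compares the case $\delta = 1$ with the constant in (7.2.8).

```python
import math

def c_example(b, det_b, q, gamma, delta):
    """Normalizing constant obtained in Solution 7.2.4, given |B| = det_b."""
    return (delta * b ** (q / 2.0) * math.gamma(q / 2.0) * math.sqrt(det_b)
            / (math.pi ** (q / 2.0) * math.gamma((gamma + q / 2.0) / delta)))

def density_q1(x, b, beta, gamma, delta):
    """f(X) for q = 1, where B = beta > 0 is a scalar and X = x."""
    w = b * beta * x * x                      # |AXBX'| = tr(AXBX') = b * beta * x^2
    return c_example(b, beta, 1, gamma, delta) * w ** gamma * math.exp(-w ** delta)

def midpoint(f, lo, hi, n=200000):
    """Composite midpoint rule on [lo, hi]."""
    h = (hi - lo) / n
    return h * sum(f(lo + (i + 0.5) * h) for i in range(n))

b, beta, gamma, delta = 2.0, 1.5, 1.0, 1.5
mass = midpoint(lambda x: density_q1(x, b, beta, gamma, delta), -6.0, 6.0)

# For delta = 1 the constant, multiplied by b^gamma, should reduce to that of (7.2.8)
q, det_b = 3, 1.5
c_delta_one = c_example(b, det_b, q, gamma, 1.0) * b ** gamma
c_7_2_8 = (b ** (gamma + q / 2.0) * math.sqrt(det_b) * math.gamma(q / 2.0)
           / (math.pi ** (q / 2.0) * math.gamma(gamma + q / 2.0)))
```

The value of `mass` should be close to one, the integrand being negligible outside $[-6, 6]$ for these parameter values.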


7.2a. Complex Matrix-Variate Gamma and Maxwell-Boltzmann Densities


The matrix-variate gamma density in the real positive definite matrix case was defined
in equation (5.2.4) of Sect. 5.2. The corresponding matrix-variate gamma density in the
complex domain was given in Eq. (5.2a.4). Those distributions will be extended to the
rectangular matrix-variate cases in this section. A particular case of the rectangular
matrix-variate gamma in the complex domain will be called the Maxwell-Boltzmann density in
the complex matrix-variate case. Let $\tilde{X} = (\tilde{x}_{ij})$ be a $p \times q$, $q \ge p$, rectangular matrix of
rank $p$ in the complex domain whose elements $\tilde{x}_{ij}$ are distinct scalar complex variables.
Let $|\det(\cdot)|$ denote the absolute value of the determinant of $(\cdot)$. Let $A$ of order $p \times p$ and
$B$ of order $q \times q$ be real positive definite or Hermitian positive definite constant matrices.
The conjugate transpose of $\tilde{X}$ will be denoted by $\tilde{X}^{*}$. Consider the function:

$$\tilde{f}(\tilde{X})\,d\tilde{X} = \tilde{c}\,|\det(A\tilde{X}B\tilde{X}^{*})|^{\gamma}\,e^{-\mathrm{tr}(A\tilde{X}B\tilde{X}^{*})}\,d\tilde{X} \tag{7.2a.1}$$

for $A > O$, $B > O$, $\Re(\gamma + q) > p - 1$ where $\tilde{c}$ is the normalizing constant so that $\tilde{f}(\tilde{X})$
is a statistical density. One can evaluate $\tilde{c}$ by proceeding as was done in the real case. Let

$$\tilde{Y} = A^{\frac{1}{2}}\tilde{X}B^{\frac{1}{2}} \ \Rightarrow\ d\tilde{Y} = |\det(A)|^{q}\,|\det(B)|^{p}\,d\tilde{X},$$

the Jacobian of this matrix transformation being provided in Chap. 1 or Mathai (1997).
Then, $\tilde{f}(\tilde{X})$ becomes

$$\tilde{f}_1(\tilde{Y})\,d\tilde{Y} = \tilde{c}\,|\det(A)|^{-q}\,|\det(B)|^{-p}\,|\det(\tilde{Y}\tilde{Y}^{*})|^{\gamma}\,e^{-\mathrm{tr}(\tilde{Y}\tilde{Y}^{*})}\,d\tilde{Y}. \tag{7.2a.2}$$

Now, letting

$$\tilde{S} = \tilde{Y}\tilde{Y}^{*} \ \Rightarrow\ d\tilde{Y} = \frac{\pi^{qp}}{\tilde{\Gamma}_p(q)}\,|\det(\tilde{S})|^{q-p}\,d\tilde{S}$$

by applying Result 4.2a.3 where $\tilde{\Gamma}_p(q)$ is the complex matrix-variate gamma function, $\tilde{f}_1$
changes to

$$\tilde{f}_2(\tilde{S})\,d\tilde{S} = \tilde{c}\,|\det(A)|^{-q}\,|\det(B)|^{-p}\,\frac{\pi^{qp}}{\tilde{\Gamma}_p(q)}\,|\det(\tilde{S})|^{\gamma+q-p}\,e^{-\mathrm{tr}(\tilde{S})}\,d\tilde{S}. \tag{7.2a.3}$$

Finally, integrating out $\tilde{S}$ by making use of a complex matrix-variate gamma integral, we
have $\tilde{\Gamma}_p(\gamma + q)$ for $\Re(\gamma + q) > p - 1$. Hence, the normalizing constant $\tilde{c}$ is the following:

$$\tilde{c} = \frac{|\det(A)|^{q}\,|\det(B)|^{p}\,\tilde{\Gamma}_p(q)}{\pi^{qp}\,\tilde{\Gamma}_p(\gamma+q)},\quad \Re(\gamma+q) > p-1,\ A > O,\ B > O. \tag{7.2a.4}$$


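Example 7.2a.1. Evaluate the normalizing constant in the density in (7.2a.1) if $\gamma = 2$,
$q = 3$, $p = 2$,

$$A = \begin{bmatrix} 3 & 1+i \\ 1-i & 2 \end{bmatrix},\qquad B = \begin{bmatrix} 3 & 0 & i \\ 0 & 2 & 1+i \\ -i & 1-i & 2 \end{bmatrix}.$$

Solution 7.2a.1. Note that $A$ and $B$ are both Hermitian matrices since $A = A^*$ and
$B = B^*$. The leading minors of $A$ are $|(3)| = 3 > 0$ and $|A| = (3)(2) - (1+i)(1-i) = 4 > 0$,
and hence, $A > O$ (positive definite). The leading minors of $B$ are $|(3)| = 3 > 0$,
$\begin{vmatrix} 3 & 0 \\ 0 & 2 \end{vmatrix} = 6 > 0$, and

$$|B| = 3\begin{vmatrix} 2 & 1+i \\ 1-i & 2 \end{vmatrix} + i\begin{vmatrix} 0 & 2 \\ -i & 1-i \end{vmatrix} = 3(4-2) + i(2i) = 4 > 0.$$

Hence $B > O$ and $|B| = 4$. The normalizing constant is

$$\tilde c = \frac{|\det(A)|^{q}\,|\det(B)|^{p}}{\pi^{qp}}\,\frac{\tilde\Gamma_p(q)}{\tilde\Gamma_p(\gamma+q)} = \frac{(4^3)(4^2)}{\pi^{6}}\,\frac{\tilde\Gamma_2(3)}{\tilde\Gamma_2(5)} = \frac{4^{5}\,\pi\,\Gamma(3)\Gamma(2)}{\pi^{6}\,\pi\,\Gamma(5)\Gamma(4)} = \frac{2^{7}}{3^{2}\pi^{6}}.$$

This completes the computations.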
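
The arithmetic in Example 7.2a.1 can be cross-checked numerically. The following is only a sketch using the Python standard library; the helper names `det` and `complex_gamma_p` are illustrative and not part of the text, and `complex_gamma_p` assumes the usual product form $\tilde\Gamma_p(\alpha) = \pi^{p(p-1)/2}\,\Gamma(\alpha)\Gamma(\alpha-1)\cdots\Gamma(\alpha-p+1)$ for real $\alpha > p-1$.

```python
import math

def det(m):
    """Determinant of a small square matrix by Laplace expansion along the first row."""
    n = len(m)
    if n == 1:
        return m[0][0]
    total = 0
    for j in range(n):
        minor = [row[:j] + row[j + 1:] for row in m[1:]]
        total += (-1) ** j * m[0][j] * det(minor)
    return total

def complex_gamma_p(p, alpha):
    """Complex matrix-variate gamma for real alpha > p - 1 (assumed product form)."""
    value = math.pi ** (p * (p - 1) / 2)
    for j in range(p):
        value *= math.gamma(alpha - j)
    return value

# Data of Example 7.2a.1.
A = [[3, 1 + 1j], [1 - 1j, 2]]
B = [[3, 0, 1j], [0, 2, 1 + 1j], [-1j, 1 - 1j, 2]]
p, q, gam = 2, 3, 2

det_A = abs(det(A))
det_B = abs(det(B))

# Normalizing constant (7.2a.4).
c = (det_A ** q * det_B ** p / math.pi ** (q * p)
     * complex_gamma_p(p, q) / complex_gamma_p(p, gam + q))
```

Under these assumptions, `det_A` and `det_B` both evaluate to 4 and `c` agrees with $2^7/(3^2\pi^6)$ up to rounding.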


7.2a.1. Extension of the Matrix-Variate Gamma Density in the Complex Domain 


Consider the density of $\tilde X$ given in (7.2a.1) with $\tilde c$ given in (7.2a.4). The density of
$\tilde Y = A^{\frac{1}{2}}\tilde X B^{\frac{1}{2}}$ is given by

$$\tilde f_1(\tilde Y) = \frac{\tilde\Gamma_p(q)}{\pi^{qp}\,\tilde\Gamma_p(\gamma+q)}\,|\det(\tilde Y\tilde Y^*)|^{\gamma}\,e^{-\mathrm{tr}(\tilde Y\tilde Y^*)} \tag{7.2a.5}$$

for $\Re(\gamma+q) > p-1$, and $\tilde f_1 = 0$ elsewhere, and the density of $\tilde S = \tilde Y\tilde Y^*$ is

$$\tilde f_2(\tilde S) = \frac{1}{\tilde\Gamma_p(\gamma+q)}\,|\det(\tilde S)|^{\gamma+q-p}\,e^{-\mathrm{tr}(\tilde S)},\quad \Re(\gamma+q) > p-1, \tag{7.2a.6}$$

and $\tilde f_2 = 0$ elsewhere. Then, the density given in (7.2a.1), namely, $\tilde f(\tilde X)$ for $\gamma = 1$, will
be called the Maxwell-Boltzmann density for the complex rectangular matrix-variate case
since for $p = 1$ and $q = 1$ in the real scalar case, the density corresponds to the case
$\gamma = 1$, and (7.2a.1) for $\gamma = \frac{1}{2}$ will be called the complex rectangular matrix-variate
Rayleigh density.


7.2a.2. The multivariate gamma density in the complex matrix-variate case 


Consider the case $p = 1$ and $A = b > 0$ where $b$ is a real positive scalar as the $p \times p$
matrix $A$ is assumed to be Hermitian positive definite. Then, $\tilde X$ is $1 \times q$ and

$$A\tilde X B\tilde X^* = b\,\tilde X B\tilde X^* = b\,(\tilde x_1,\ldots,\tilde x_q)\,B\begin{pmatrix}\tilde x_1^*\\ \vdots\\ \tilde x_q^*\end{pmatrix}$$

is a positive definite Hermitian form, an asterisk denoting only the conjugate when the
elements are scalar quantities. Thus, when $p = 1$ and $A = b > 0$, the density $\tilde f(\tilde X)$
reduces to

$$\tilde f_3(\tilde X) = b^{\gamma+q}\,|\det(B)|\,\frac{\Gamma(q)}{\pi^{q}\,\Gamma(\gamma+q)}\,|\det(\tilde X B\tilde X^*)|^{\gamma}\,e^{-b(\tilde X B\tilde X^*)} \tag{7.2a.7}$$

for $\tilde X = (\tilde x_1,\ldots,\tilde x_q)$, $B = B^* > O$, $b > 0$, $\Re(\gamma+q) > 0$, and $\tilde f_3 = 0$ elsewhere.
Letting $\tilde Y^* = B^{\frac{1}{2}}\tilde X^*$, the density of $\tilde Y$ is the following:

$$\tilde f_4(\tilde Y) = \frac{b^{\gamma+q}\,\Gamma(q)}{\pi^{q}\,\Gamma(\gamma+q)}\,\big[|\tilde y_1|^2+\cdots+|\tilde y_q|^2\big]^{\gamma}\,e^{-b(|\tilde y_1|^2+\cdots+|\tilde y_q|^2)} \tag{7.2a.8}$$

for $b > 0$, $\Re(\gamma+q) > 0$, and $\tilde f_4 = 0$ elsewhere, where $|\tilde y_j|$ is the absolute value or
modulus of the complex quantity $\tilde y_j$. We will take (7.2a.8) as the complex multivariate
gamma density; when $\gamma = 1$, it will be referred to as the complex multivariate
Maxwell-Boltzmann density, and when $\gamma = \frac{1}{2}$, it will be called the complex multivariate
Rayleigh density. These densities are believed to be new.

Let us verify by integration that (7.2a.8) is indeed a density. First, consider the
transformation $s = \tilde Y\tilde Y^*$. In view of Theorem 4.2a.3, the integral over the Stiefel
manifold gives $d\tilde Y = \frac{\pi^{q}}{\Gamma(q)}\,s^{q-1}\,ds$, so that $\frac{\Gamma(q)}{\pi^{q}}$ is canceled. Then, the integral over $s$ yields
$b^{-(\gamma+q)}\,\Gamma(\gamma+q)$, $\Re(\gamma+q) > 0$, and hence it is verified that (7.2a.8) is a statistical
density.
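
As a numerical illustration of this verification, one can integrate the density of $s$ that results after the Stiefel-manifold step, namely $b^{\gamma+q}s^{\gamma+q-1}e^{-bs}/\Gamma(\gamma+q)$ on $s > 0$. The sketch below uses a hand-written composite Simpson rule; the parameter values, the truncation point 40 and the function names are arbitrary choices made for illustration only.

```python
import math

def density_of_s(s, b, gam, q):
    """Density of s = Y Y* implied by (7.2a.8): b^k s^(k-1) e^(-b s) / Gamma(k), k = gam + q."""
    k = gam + q
    return b ** k * s ** (k - 1) * math.exp(-b * s) / math.gamma(k)

def simpson(f, lo, hi, n=20000):
    """Composite Simpson rule with an even number n of subintervals."""
    h = (hi - lo) / n
    total = f(lo) + f(hi)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * f(lo + i * h)
    return total * h / 3

# Illustrative parameter values (assumptions, not from the text).
b, gam, q = 2.0, 1.0, 3

# The upper limit 40 truncates a tail of negligible mass for these values.
total = simpson(lambda s: density_of_s(s, b, gam, q), 0.0, 40.0)
```

For these values `total` should be equal to one up to quadrature error, in line with the analytic argument above.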


7.2a.3. Arbitrary moments, complex case 


Let us determine the $h$-th moment of $\tilde u = |\det(A\tilde X B\tilde X^*)|$ for an arbitrary $h$, that is,
the $h$-th moment of the absolute value of the determinant of the matrix $A\tilde X B\tilde X^*$ or its
symmetric form $A^{\frac{1}{2}}\tilde X B\tilde X^* A^{\frac{1}{2}}$, which is

$$E\big[|\det(A\tilde X B\tilde X^*)|^{h}\big] = \tilde c\int_{\tilde X} |\det(A\tilde X B\tilde X^*)|^{\gamma+h}\,e^{-\mathrm{tr}(A^{\frac{1}{2}}\tilde X B\tilde X^* A^{\frac{1}{2}})}\,d\tilde X. \tag{7.2a.9}$$

Observe that the only change, as compared to the total integral, is that $\gamma$ is replaced by
$\gamma + h$, so that the $h$-th moment is available from the normalizing constant $\tilde c$. Accordingly,

$$E[\tilde u^{h}] = \frac{\tilde\Gamma_p(\gamma+q+h)}{\tilde\Gamma_p(\gamma+q)},\quad \Re(\gamma+q+h) > p-1, \tag{7.2a.10}$$

$$= \prod_{j=1}^{p}\frac{\Gamma(\gamma+q+h-(j-1))}{\Gamma(\gamma+q-(j-1))} = E(u_1^{h})\,E(u_2^{h})\cdots E(u_p^{h}) \tag{7.2a.11}$$

where the $u_j$'s are independently distributed real scalar gamma random variables with
parameters $(\gamma+q-(j-1),\ 1)$, $j = 1,\ldots,p$. Thus, structurally $\tilde u = |\det(A\tilde X B\tilde X^*)|$ is
a product of independently distributed real scalar gamma random variables with
parameters $(\gamma+q-(j-1),\ 1)$, $j = 1,\ldots,p$. The corresponding result in the real case is
that $|AXBX'|$ is structurally a product of independently distributed real gamma random
variables with parameters $(\gamma+\frac{q}{2}-\frac{j-1}{2},\ 1)$, $j = 1,\ldots,p$, which can be seen from
(7.2.15).
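
The equivalence between the matrix-variate gamma ratio in (7.2a.10) and the product of scalar gamma moments in (7.2a.11) can be illustrated numerically. The sketch below is not taken from the text: the function names are hypothetical, the parameters are restricted to real values, and the product form of $\tilde\Gamma_p$ is assumed.

```python
import math

def log_complex_gamma_p(p, alpha):
    """log of the complex matrix-variate gamma for real alpha > p - 1 (assumed product form)."""
    return (p * (p - 1) / 2) * math.log(math.pi) + sum(
        math.lgamma(alpha - j) for j in range(p))

def moment_from_constant(p, q, gam, h):
    """E[u^h] as the ratio of complex matrix-variate gammas in (7.2a.10)."""
    return math.exp(log_complex_gamma_p(p, gam + q + h)
                    - log_complex_gamma_p(p, gam + q))

def moment_from_product(p, q, gam, h):
    """E[u^h] as a product of h-th moments of gamma(shape_j, 1) variables, as in (7.2a.11)."""
    value = 1.0
    for j in range(1, p + 1):
        shape = gam + q - (j - 1)
        value *= math.gamma(shape + h) / math.gamma(shape)
    return value

# Illustrative values: for h = 2 each factor equals shape * (shape + 1).
p, q, gam, h = 3, 4, 1.5, 2
m_constant = moment_from_constant(p, q, gam, h)
m_product = moment_from_product(p, q, gam, h)
```

Both routes give the same number because the powers of $\pi$ cancel in the ratio, which is exactly the step from (7.2a.10) to (7.2a.11).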


7.2a.4. A pathway extension in the complex case 


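A pathway extension is also possible in the complex case. The results and properties
are parallel to those obtained in the real case. Hence, we will only mention the pathway
extended density. Consider the following density:

$$\tilde f_5(\tilde X) = \tilde c_1\,|\det(A^{\frac{1}{2}}\tilde X B\tilde X^* A^{\frac{1}{2}})|^{\gamma}\,|\det(I - a(1-\alpha)A^{\frac{1}{2}}\tilde X B\tilde X^* A^{\frac{1}{2}})|^{\frac{\eta}{1-\alpha}} \tag{7.2a.12}$$

for $a > 0$, $\alpha < 1$, $I - a(1-\alpha)A^{\frac{1}{2}}\tilde X B\tilde X^* A^{\frac{1}{2}} > O$ (positive definite), $A > O$, $B > O$,
$\eta > 0$, $\Re(\gamma+q) > p-1$, and $\tilde f_5 = 0$ elsewhere. Observe that (7.2a.12) remains in
the generalized type-1 beta family of functions for $\alpha < 1$ (type-1 and type-2 beta densities
in the complex rectangular matrix-variate cases will be considered in the next sections). If
$\alpha > 1$, then on writing $1 - \alpha = -(\alpha - 1)$, $\alpha > 1$, the model in (7.2a.12) shifts to the
model

$$\tilde f_6(\tilde X) = \tilde c_2\,|\det(A^{\frac{1}{2}}\tilde X B\tilde X^* A^{\frac{1}{2}})|^{\gamma}\,|\det(I + a(\alpha-1)A^{\frac{1}{2}}\tilde X B\tilde X^* A^{\frac{1}{2}})|^{-\frac{\eta}{\alpha-1}} \tag{7.2a.13}$$

for $a > 0$, $\alpha > 1$, $A > O$, $B > O$, $A^{\frac{1}{2}}\tilde X B\tilde X^* A^{\frac{1}{2}} > O$, $\eta > 0$, $\Re(\frac{\eta}{\alpha-1}-\gamma-q) > p-1$,
$\Re(\gamma+q) > p-1$, and $\tilde f_6 = 0$ elsewhere, where $\tilde c_2$ is the normalizing constant,
different from $\tilde c_1$. When $\alpha \to 1$, both models (7.2a.12) and (7.2a.13) converge to the
model $\tilde f_7$ where

$$\tilde f_7(\tilde X) = \tilde c_3\,|\det(A^{\frac{1}{2}}\tilde X B\tilde X^* A^{\frac{1}{2}})|^{\gamma}\,e^{-a\eta\,\mathrm{tr}(A^{\frac{1}{2}}\tilde X B\tilde X^* A^{\frac{1}{2}})} \tag{7.2a.14}$$

for $a > 0$, $\eta > 0$, $A > O$, $B > O$, $\Re(\gamma+q) > p-1$, $A^{\frac{1}{2}}\tilde X B\tilde X^* A^{\frac{1}{2}} > O$, and $\tilde f_7 = 0$
elsewhere. The normalizing constants can be evaluated by following steps parallel to those
used in the real case. They respectively are:

$$\tilde c_1 = |\det(A)|^{q}\,|\det(B)|^{p}\,[a(1-\alpha)]^{p(\gamma+q)}\,\frac{\tilde\Gamma_p(q)}{\pi^{qp}}\,\frac{\tilde\Gamma_p(\gamma+q+\frac{\eta}{1-\alpha}+p)}{\tilde\Gamma_p(\gamma+q)\,\tilde\Gamma_p(\frac{\eta}{1-\alpha}+p)} \tag{7.2a.15}$$

for $\eta > 0$, $a > 0$, $\alpha < 1$, $A > O$, $B > O$, $\Re(\gamma+q) > p-1$;

$$\tilde c_2 = |\det(A)|^{q}\,|\det(B)|^{p}\,[a(\alpha-1)]^{p(\gamma+q)}\,\frac{\tilde\Gamma_p(q)}{\pi^{qp}}\,\frac{\tilde\Gamma_p(\frac{\eta}{\alpha-1})}{\tilde\Gamma_p(\gamma+q)\,\tilde\Gamma_p(\frac{\eta}{\alpha-1}-\gamma-q)} \tag{7.2a.16}$$

for $a > 0$, $\alpha > 1$, $\eta > 0$, $A > O$, $B > O$, $\Re(\gamma+q) > p-1$, $\Re(\frac{\eta}{\alpha-1}-\gamma-q) > p-1$;

$$\tilde c_3 = (a\eta)^{p(\gamma+q)}\,|\det(A)|^{q}\,|\det(B)|^{p}\,\frac{\tilde\Gamma_p(q)}{\pi^{qp}\,\tilde\Gamma_p(\gamma+q)} \tag{7.2a.17}$$

for $a > 0$, $\eta > 0$, $A > O$, $B > O$, $\Re(\gamma+q) > p-1$.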
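
The convergence of the pathway model as $\alpha \to 1$ can be illustrated for $p = 1$, where the complex matrix-variate gamma functions reduce to ordinary gamma functions. In that case the ratio of the constants in (7.2a.15) and (7.2a.17) is $[a(1-\alpha)]^{\gamma+q}\,\Gamma(\gamma+q+\frac{\eta}{1-\alpha}+1)/[\Gamma(\frac{\eta}{1-\alpha}+1)\,(a\eta)^{\gamma+q}]$, the factors involving $A$, $B$, $\Gamma(q)$, $\pi^{q}$ and $\Gamma(\gamma+q)$ being common to both. The sketch below evaluates the logarithm of this ratio; the function name and the parameter values are illustrative assumptions.

```python
import math

def log_ratio_c1_over_c3(alpha, a, eta, gam, q):
    """log(c1 / c3) for p = 1 and alpha < 1, after cancelling the common factors."""
    k = gam + q
    n = eta / (1.0 - alpha)
    log_c1_part = k * math.log(a * (1.0 - alpha)) + math.lgamma(k + n + 1) - math.lgamma(n + 1)
    log_c3_part = k * math.log(a * eta)
    return log_c1_part - log_c3_part

# Illustrative parameter values (assumptions, not from the text).
a, eta, gam, q = 2.0, 1.5, 1.0, 2
alphas = [0.9, 0.99, 0.999, 1.0 - 1e-6]
log_ratios = [log_ratio_c1_over_c3(al, a, eta, gam, q) for al in alphas]
```

The entries of `log_ratios` shrink toward zero as $\alpha$ approaches 1 from below, which is consistent with $\tilde c_1 \to \tilde c_3$.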


7.2a.5. The Maxwell-Boltzmann and Rayleigh cases in the complex domain


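The complex counterparts of the Maxwell-Boltzmann and Rayleigh cases may not be
available in the literature. Their densities can be derived from (7.2a.8). Letting $p = 1$ and
$q = 1$ in (7.2a.8), we have

$$\tilde f_8(\tilde y_1) = \frac{b^{\gamma+1}}{\pi\,\Gamma(\gamma+1)}\,[|\tilde y_1|^2]^{\gamma}\,e^{-b|\tilde y_1|^2},\quad \tilde y_1 = y_{11} + i\,y_{12},\ i = \sqrt{(-1)},$$

$$= \frac{b^{\gamma+1}}{\pi\,\Gamma(\gamma+1)}\,[y_{11}^2 + y_{12}^2]^{\gamma}\,e^{-b(y_{11}^2+y_{12}^2)} \tag{7.2a.18}$$

for $b > 0$, $\Re(\gamma+1) > 0$, $-\infty < y_{1j} < \infty$, $j = 1, 2$, and $\tilde f_8 = 0$ elsewhere. We
may take $\gamma = 1$ as the Maxwell-Boltzmann case and $\gamma = \frac{1}{2}$ as the Rayleigh case. Then, for
$\gamma = 1$, we have

$$\tilde f_9(\tilde y_1) = \frac{b^{2}}{\pi}\,|\tilde y_1|^2\,e^{-b|\tilde y_1|^2},\quad \tilde y_1 = y_{11} + i\,y_{12},$$

for $b > 0$, $-\infty < y_{1j} < \infty$, $j = 1, 2$, and $\tilde f_9 = 0$ elsewhere. Note that in the
real case $y_{12} = 0$ so that the functional part of $\tilde f_9$ becomes $y_{11}^2\,e^{-b y_{11}^2}$, $-\infty < y_{11} < \infty$.
However, the normalizing constants in the real and complex cases are evaluated in different
domains. Observe that, corresponding to (7.2a.18), the normalizing constant in the real
case is $b^{\gamma+\frac{1}{2}}\,\Gamma(\frac{1}{2})/[\pi^{\frac{1}{2}}\,\Gamma(\gamma+\frac{1}{2})] = b^{\gamma+\frac{1}{2}}/\Gamma(\gamma+\frac{1}{2})$. Thus, the normalizing constant has to be
evaluated separately. Consider the integral

$$\int_{-\infty}^{\infty} y_{11}^2\,e^{-b y_{11}^2}\,dy_{11} = 2\int_{0}^{\infty} y_{11}^2\,e^{-b y_{11}^2}\,dy_{11} = \int_{0}^{\infty} u^{\frac{3}{2}-1}\,e^{-bu}\,du = \frac{\sqrt{\pi}}{2\,b^{\frac{3}{2}}}.$$

Hence,

$$f_{10}(y_{11}) = \frac{2\,b^{\frac{3}{2}}}{\sqrt{\pi}}\,y_{11}^2\,e^{-b y_{11}^2},\quad -\infty < y_{11} < \infty,\ b > 0,$$

$$= \frac{4\,b^{\frac{3}{2}}}{\sqrt{\pi}}\,y_{11}^2\,e^{-b y_{11}^2},\quad 0 \le y_{11} < \infty,\ b > 0, \tag{7.2a.19}$$

and $f_{10} = 0$ elsewhere. This is the real Maxwell-Boltzmann case. For the Rayleigh case,
letting $\gamma = \frac{1}{2}$ in (7.2a.18) yields

$$\tilde f_{11}(\tilde y_1) = \frac{b^{\frac{3}{2}}}{\pi\,\Gamma(\frac{3}{2})}\,[|\tilde y_1|^2]^{\frac{1}{2}}\,e^{-b|\tilde y_1|^2},\quad b > 0,$$

$$= \frac{2\,b^{\frac{3}{2}}}{\pi^{\frac{3}{2}}}\,[y_{11}^2 + y_{12}^2]^{\frac{1}{2}}\,e^{-b(y_{11}^2+y_{12}^2)},\quad -\infty < y_{1j} < \infty,\ j = 1, 2,\ b > 0, \tag{7.2a.20}$$

and $\tilde f_{11} = 0$ elsewhere. Then, for $y_{12} = 0$, the functional part of $\tilde f_{11}$ is $|y_{11}|\,e^{-b y_{11}^2}$ with
$-\infty < y_{11} < \infty$. The integral over $y_{11}$ gives

$$\int_{-\infty}^{\infty} |y_{11}|\,e^{-b y_{11}^2}\,dy_{11} = 2\int_{0}^{\infty} y_{11}\,e^{-b y_{11}^2}\,dy_{11} = b^{-1}.$$

Thus, in the Rayleigh case,

$$f_{12}(y_{11}) = b\,|y_{11}|\,e^{-b y_{11}^2},\quad -\infty < y_{11} < \infty,\ b > 0,$$

$$= 2b\,y_{11}\,e^{-b y_{11}^2},\quad 0 \le y_{11} < \infty,\ b > 0, \tag{7.2a.21}$$

and $f_{12} = 0$ elsewhere. The normalizing constant in (7.2a.18) can be verified by making
use of the polar coordinate transformation: $y_{11} = r\cos\theta$, $y_{12} = r\sin\theta$, so that $dy_{11} \wedge
dy_{12} = r\,dr \wedge d\theta$, $0 < \theta \le 2\pi$, $0 \le r < \infty$. Then,

$$\int_{-\infty}^{\infty}\int_{-\infty}^{\infty} [y_{11}^2+y_{12}^2]^{\gamma}\,e^{-b(y_{11}^2+y_{12}^2)}\,dy_{11}\wedge dy_{12} = 2\pi\int_{0}^{\infty} (r^2)^{\gamma}\,e^{-b r^2}\,r\,dr = \pi\,b^{-(\gamma+1)}\,\Gamma(\gamma+1)$$

for $b > 0$, $\Re(\gamma+1) > 0$.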
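
The three closed forms in this subsection, namely the real Maxwell-Boltzmann density (7.2a.19) on the half line, the real Rayleigh density (7.2a.21) on the half line, and the polar-coordinate value $\pi b^{-(\gamma+1)}\Gamma(\gamma+1)$, can be checked by quadrature. The following sketch uses a hand-written Simpson rule with arbitrary illustrative values of $b$ and $\gamma$; the truncation point 12 is an assumption that makes the neglected tail negligible for these values.

```python
import math

def simpson(f, lo, hi, n=20000):
    """Composite Simpson rule with an even number n of subintervals."""
    h = (hi - lo) / n
    total = f(lo) + f(hi)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * f(lo + i * h)
    return total * h / 3

# Illustrative parameter values (assumptions, not from the text).
b, gam = 1.5, 0.8

# (7.2a.19) on 0 <= y < infinity: 4 b^(3/2) y^2 exp(-b y^2) / sqrt(pi).
mb_total = simpson(
    lambda y: 4 * b ** 1.5 / math.sqrt(math.pi) * y * y * math.exp(-b * y * y), 0.0, 12.0)

# (7.2a.21) on 0 <= y < infinity: 2 b y exp(-b y^2).
rayleigh_total = simpson(lambda y: 2 * b * y * math.exp(-b * y * y), 0.0, 12.0)

# Polar-coordinate integral: 2 pi * int_0^inf r^(2 gam + 1) exp(-b r^2) dr.
polar_lhs = 2 * math.pi * simpson(
    lambda r: r ** (2 * gam + 1) * math.exp(-b * r * r), 0.0, 12.0)
polar_rhs = math.pi * b ** (-(gam + 1)) * math.gamma(gam + 1)
```

Both `mb_total` and `rayleigh_total` should be close to one, and `polar_lhs` should match `polar_rhs`, which confirms the normalizing constant in (7.2a.18) for the chosen values.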


Exercises 7.2 


7.2.1. Supply a proof to (7.2.9) by using Theorem 4.2.3. 

7.2.2. Derive the exact density of the determinant in (7.2.15) for p = 2. 
7.2.3. Verify the results in (7.2.18) and (7.2.19). 

7.2.4. Derive the normalizing constants $\tilde c_1$ in (7.2a.12) and $\tilde c_2$ in (7.2a.13).

7.2.5. Derive $\tilde c_3$ in (7.2a.14) by integrating out over $\tilde X$.

7.2.6. Approximate $\tilde c_1$ and $\tilde c_2$ of Exercise 7.2.4 by making use of Stirling's approximation,
and then show that the result agrees with that in Exercise 7.2.5.

7.2.7. Derive (state and prove) for the complex case the lemmas corresponding to
Lemmas 7.2.1 and 7.2.2.


7.3. Real Rectangular Matrix-Variate Type-1 and Type-2 Beta Densities 

Let us begin with the real case. Let $A > O$ be $p \times p$ and $B > O$ be $q \times q$ where $A$
and $B$ are real constant matrices. Let $X = (x_{ij})$ be a $p \times q$, $q \ge p$, matrix of distinct real
scalar variables $x_{ij}$'s as its elements, $X$ being of full rank $p$. Then, $A^{\frac{1}{2}}XBX'A^{\frac{1}{2}} > O$ is
real positive definite where $A^{\frac{1}{2}}$ is the positive definite square root of the positive definite
matrix $A$. Let $|(\cdot)|$ represent the determinant of $(\cdot)$ when $(\cdot)$ is real or complex, and $|\det(\cdot)|$
be the absolute value of the determinant of $(\cdot)$ when $(\cdot)$ is in the complex domain. Consider
the following density:

$$g_1(X) = C_1\,|A^{\frac{1}{2}}XBX'A^{\frac{1}{2}}|^{\gamma}\,|I - A^{\frac{1}{2}}XBX'A^{\frac{1}{2}}|^{\beta-\frac{p+1}{2}} \tag{7.3.1}$$

for $A > O$, $B > O$, $I - A^{\frac{1}{2}}XBX'A^{\frac{1}{2}} > O$, $\Re(\beta) > \frac{p-1}{2}$, $\Re(\gamma+\frac{q}{2}) > \frac{p-1}{2}$, and $g_1 = 0$
elsewhere, where $C_1$ is the normalizing constant. Accordingly, $U = A^{\frac{1}{2}}XBX'A^{\frac{1}{2}} > O$
and $I - U > O$ or $U$ and $I - U$ are both positive definite. We now make the transformations
$Y = A^{\frac{1}{2}}XB^{\frac{1}{2}}$ and $S = YY'$. Then, proceeding as in the case of the rectangular matrix-variate
gamma density discussed in Sect. 7.2, and evaluating the final part involving $S$
with the help of a real positive definite matrix-variate type-1 beta integral, we obtain the
following normalizing constant:

$$C_1 = |A|^{\frac{q}{2}}\,|B|^{\frac{p}{2}}\,\frac{\Gamma_p(\frac{q}{2})}{\pi^{\frac{qp}{2}}}\,\frac{\Gamma_p(\gamma+\frac{q}{2}+\beta)}{\Gamma_p(\gamma+\frac{q}{2})\,\Gamma_p(\beta)} \tag{7.3.2}$$

for $A > O$, $B > O$, $\Re(\beta) > \frac{p-1}{2}$, $\Re(\gamma+\frac{q}{2}) > \frac{p-1}{2}$. Usually the parameters associated
with a statistical density are real, which is the case for $\gamma$ and $\beta$. Nonetheless, the conditions
will be stated for general complex parameters. When the density of $X$ is as given in $g_1$, the
density of $Y = A^{\frac{1}{2}}XB^{\frac{1}{2}}$ is given by

$$g_2(Y) = \frac{\Gamma_p(\frac{q}{2})}{\pi^{\frac{qp}{2}}}\,\frac{\Gamma_p(\gamma+\frac{q}{2}+\beta)}{\Gamma_p(\gamma+\frac{q}{2})\,\Gamma_p(\beta)}\,|YY'|^{\gamma}\,|I - YY'|^{\beta-\frac{p+1}{2}} \tag{7.3.3}$$

for $\Re(\beta) > \frac{p-1}{2}$, $\Re(\gamma+\frac{q}{2}) > \frac{p-1}{2}$, $YY' > O$, $I - YY' > O$, and $g_2 = 0$ elsewhere. When $X$ has the
density specified in (7.3.1), the density of $S = YY'$ is given by

$$g_3(S) = \frac{\Gamma_p(\gamma+\frac{q}{2}+\beta)}{\Gamma_p(\gamma+\frac{q}{2})\,\Gamma_p(\beta)}\,|S|^{\gamma+\frac{q}{2}-\frac{p+1}{2}}\,|I - S|^{\beta-\frac{p+1}{2}} \tag{7.3.4}$$

for $\Re(\beta) > \frac{p-1}{2}$, $\Re(\gamma+\frac{q}{2}) > \frac{p-1}{2}$, $S > O$, $I - S > O$, and $g_3 = 0$ elsewhere,
which is the usual real matrix-variate type-1 beta density. Observe that the density $g_1(X)$
is also available from the pathway form of the real matrix-variate gamma case introduced
in Sect. 7.2.


Example 7.3.1. Let $U_1 = A^{\frac{1}{2}}XBX'A^{\frac{1}{2}}$, $U_2 = XBX'$, $U_3 = B^{\frac{1}{2}}X'AXB^{\frac{1}{2}}$ and $U_4 =
X'AX$. If $X$ has the rectangular matrix-variate type-1 beta density given in (7.3.1), evaluate
the densities of $U_1$, $U_2$, $U_3$ and $U_4$ whenever possible.

Solution 7.3.1. The matrix $U_1$ is already present in the density of $X$, namely (7.3.1).
Now, we have to convert the density of $X$ into the density of $U_1$. Consider the
transformations $Y = A^{\frac{1}{2}}XB^{\frac{1}{2}}$, $S = YY' = U_1$, and the density of $S$ is given in (7.3.4). Thus, $U_1$
has a real matrix-variate type-1 beta density with the parameters $\gamma+\frac{q}{2}$ and $\beta$. Now, on
applying the same transformations as above with $A = I$, the density appearing in (7.3.4),
which is the density of $U_2$, becomes

$$g_3(U_2)\,dU_2 = \frac{\Gamma_p(\gamma+\frac{q}{2}+\beta)}{\Gamma_p(\gamma+\frac{q}{2})\,\Gamma_p(\beta)}\,|A|^{\gamma+\frac{q}{2}}\,|U_2|^{\gamma+\frac{q}{2}-\frac{p+1}{2}}\,|I - A^{\frac{1}{2}}U_2A^{\frac{1}{2}}|^{\beta-\frac{p+1}{2}}\,dU_2 \tag{i}$$

for $\Re(\beta) > \frac{p-1}{2}$, $\Re(\gamma+\frac{q}{2}) > \frac{p-1}{2}$, $A > O$, $U_2 > O$, $I - A^{\frac{1}{2}}U_2A^{\frac{1}{2}} > O$, and
zero elsewhere, so that $U_2$ has a scaled real matrix-variate type-1 beta distribution with
parameters $(\gamma+\frac{q}{2},\ \beta)$ and scaling matrix $A > O$. For $q > p$, both $X'AX$ and $B^{\frac{1}{2}}X'AXB^{\frac{1}{2}}$
are positive semi-definite matrices whose determinants are thus equal to zero. Accordingly,
the densities do not exist whenever $q > p$. When $q = p$, $U_3$ has a $q \times q$ real matrix-variate
type-1 beta distribution with parameters $(\gamma+\frac{p}{2},\ \beta)$ and $U_4$ is a scaled version of a type-1
beta matrix variable whose density is of the form given in (i) wherein $B$ is the scaling
matrix and $p$ and $q$ are interchanged. This completes the solution.


7.3.1. Arbitrary moments 


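The $h$-th moment of the determinant of $U = A^{\frac{1}{2}}XBX'A^{\frac{1}{2}}$, with $h$ being arbitrary, is
available from the normalizing constant given in (7.3.2) on observing that when the $h$-th
moment is taken, the only change is that $\gamma$ turns into $\gamma + h$. Thus,

$$E[|U|^{h}] = E[|YY'|^{h}] = E[|S|^{h}] = \frac{\Gamma_p(\gamma+\frac{q}{2}+h)}{\Gamma_p(\gamma+\frac{q}{2})}\,\frac{\Gamma_p(\gamma+\frac{q}{2}+\beta)}{\Gamma_p(\gamma+\frac{q}{2}+\beta+h)} \tag{7.3.5}$$

$$= \prod_{j=1}^{p}\frac{\Gamma(\gamma+\frac{q}{2}-\frac{j-1}{2}+h)}{\Gamma(\gamma+\frac{q}{2}-\frac{j-1}{2})}\,\frac{\Gamma(\gamma+\frac{q}{2}+\beta-\frac{j-1}{2})}{\Gamma(\gamma+\frac{q}{2}+\beta-\frac{j-1}{2}+h)} \tag{7.3.6}$$

$$= E[u_1^{h}]\,E[u_2^{h}]\cdots E[u_p^{h}] \tag{7.3.7}$$

where $u_1,\ldots,u_p$ are mutually independently distributed real scalar type-1 beta random
variables with the parameters $(\gamma+\frac{q}{2}-\frac{j-1}{2},\ \beta)$, $j = 1,\ldots,p$, provided $\Re(\beta) > \frac{p-1}{2}$
and $\Re(\gamma+\frac{q}{2}) > \frac{p-1}{2}$.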
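
The factorization of the determinant moment into scalar type-1 beta moments can be illustrated numerically for real parameter values. The sketch below assumes the usual product form of the real matrix-variate gamma, $\Gamma_p(\alpha) = \pi^{p(p-1)/4}\prod_{j=1}^{p}\Gamma(\alpha-\frac{j-1}{2})$; the function names and the parameter values are illustrative only.

```python
import math

def log_real_gamma_p(p, alpha):
    """log of the real matrix-variate gamma for real alpha > (p - 1) / 2 (assumed product form)."""
    return (p * (p - 1) / 4) * math.log(math.pi) + sum(
        math.lgamma(alpha - j / 2) for j in range(p))

def moment_matrix_form(p, q, gam, beta, h):
    """E[|U|^h] written as a ratio of real matrix-variate gamma functions."""
    a = gam + q / 2
    return math.exp(log_real_gamma_p(p, a + h) - log_real_gamma_p(p, a)
                    + log_real_gamma_p(p, a + beta) - log_real_gamma_p(p, a + beta + h))

def moment_product_form(p, q, gam, beta, h):
    """Product of h-th moments of independent type-1 beta(a_j, beta) variables."""
    value = 1.0
    for j in range(1, p + 1):
        a_j = gam + q / 2 - (j - 1) / 2
        value *= math.exp(math.lgamma(a_j + h) - math.lgamma(a_j)
                          + math.lgamma(a_j + beta) - math.lgamma(a_j + beta + h))
    return value

# Illustrative values; for h = 1 each factor is the beta mean a_j / (a_j + beta).
p, q, gam, beta, h = 2, 3, 1.0, 2.0, 1
m_matrix = moment_matrix_form(p, q, gam, beta, h)
m_product = moment_product_form(p, q, gam, beta, h)
```

With these values the two forms agree and equal $(2.5/4.5)(2/4)$, the product of the means of the two type-1 beta factors.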
7.3.2. Special case: p = 1

For the case $p = 1$, let the positive definite $1 \times 1$ matrix $A$ be the scalar $b > 0$ and $X$,
which is $1 \times q$, be equal to $(x_1,\ldots,x_q)$. Then,

$$AXBX' = b\,(x_1,\ldots,x_q)\,B\begin{pmatrix} x_1\\ \vdots\\ x_q\end{pmatrix}$$

is a real quadratic form, the matrix of the quadratic form being $B > O$. Letting $Y = XB^{\frac{1}{2}}$,
$dY = |B|^{\frac{1}{2}}\,dX$, and the density of $Y$, denoted by $g_4(Y)$, is then given by

$$g_4(Y) = b^{\gamma+\frac{q}{2}}\,\frac{\Gamma(\frac{q}{2})}{\pi^{\frac{q}{2}}}\,\frac{\Gamma(\gamma+\frac{q}{2}+\beta)}{\Gamma(\gamma+\frac{q}{2})\,\Gamma(\beta)}\,|YY'|^{\gamma}\,|I - bYY'|^{\beta-1},\quad YY' > O,\ I - bYY' > O,\ \Re(\gamma+\tfrac{q}{2}) > 0,\ \Re(\beta) > 0,$$

$$= b^{\gamma+\frac{q}{2}}\,\frac{\Gamma(\frac{q}{2})}{\pi^{\frac{q}{2}}}\,\frac{\Gamma(\gamma+\frac{q}{2}+\beta)}{\Gamma(\gamma+\frac{q}{2})\,\Gamma(\beta)}\,[y_1^2+\cdots+y_q^2]^{\gamma}\,[1 - b(y_1^2+\cdots+y_q^2)]^{\beta-1},\quad Y = (y_1,\ldots,y_q), \tag{7.3.8}$$

for $b > 0$, $\Re(\gamma+\frac{q}{2}) > 0$, $\Re(\beta) > 0$, $1 - b(y_1^2+\cdots+y_q^2) > 0$, and $g_4 = 0$ elsewhere.
The form of the density in (7.3.8) presents some interest as it appears in various areas of
research. In reliability studies, a popular model for the lifetime of components corresponds
to (7.3.8) wherein $\gamma = 0$ in both the scalar and multivariate cases. When independently
distributed isotropic random points are considered in connection with certain geometrical
probability problems, a popular model for the distribution of the random points is the
type-1 beta form or (7.3.8) for $\gamma = 0$. Earlier results obtained assuming that $\gamma = 0$ and the
new case where $\gamma \ne 0$ in geometrical probability problems are discussed in Chapter 4 of
Mathai (1999). We will take (7.3.8) as the standard form of the real rectangular
matrix-variate type-1 beta density for the case $p = 1$ in a $p \times q$, $q \ge p$, real matrix $X$ of rank
$p$. For verifying the normalizing constant in (7.3.8), one can apply Theorem 4.2.3. Letting
$S = YY'$, $dY = \frac{\pi^{\frac{q}{2}}}{\Gamma(\frac{q}{2})}\,|S|^{\frac{q}{2}-1}\,dS$, which once substituted to $dY$ in (7.3.8) yields a total
integral equal to one upon integrating out $S$ with the help of a real matrix-variate type-1
beta integral (in this case a real scalar type-1 beta integral); accordingly, the constant part
in (7.3.8) is indeed the normalizing constant. In this case, the density of $S = YY'$ is given
by

$$g_5(S) = b^{\gamma+\frac{q}{2}}\,\frac{\Gamma(\gamma+\frac{q}{2}+\beta)}{\Gamma(\gamma+\frac{q}{2})\,\Gamma(\beta)}\,|S|^{\gamma+\frac{q}{2}-1}\,|I - bS|^{\beta-1} \tag{7.3.9}$$

for $S > O$, $b > 0$, $\Re(\gamma+\frac{q}{2}) > 0$, $\Re(\beta) > 0$, and $g_5 = 0$ elsewhere. Observe that this $S$
is actually a real scalar variable.

As obtained from (7.3.8), the type-1 beta form of the density in the real scalar case,
that is, for $p = 1$ and $q = 1$, is

$$g_6(y_1) = b^{\gamma+\frac{1}{2}}\,\frac{\Gamma(\gamma+\frac{1}{2}+\beta)}{\Gamma(\gamma+\frac{1}{2})\,\Gamma(\beta)}\,[y_1^2]^{\gamma}\,[1 - b\,y_1^2]^{\beta-1} \tag{7.3.10}$$

for $b > 0$, $\beta > 0$, $\gamma+\frac{1}{2} > 0$, $-\frac{1}{\sqrt{b}} < y_1 < \frac{1}{\sqrt{b}}$, and $g_6 = 0$ elsewhere. When the
support is $0 < y_1 < \frac{1}{\sqrt{b}}$, the above density, which is symmetric, is multiplied by two.
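
7.3a. Rectangular Matrix-Variate Type-1 Beta Density, Complex Case

Consider the following function:

$$\tilde g_1(\tilde X) = \tilde C_1\,|\det(A^{\frac{1}{2}}\tilde X B\tilde X^* A^{\frac{1}{2}})|^{\gamma}\,|\det(I - A^{\frac{1}{2}}\tilde X B\tilde X^* A^{\frac{1}{2}})|^{\beta-p} \tag{7.3a.1}$$

for $A > O$, $B > O$, $\Re(\beta) > p-1$, $\Re(\gamma+q) > p-1$, $I - A^{\frac{1}{2}}\tilde X B\tilde X^* A^{\frac{1}{2}} > O$, and
$\tilde g_1 = 0$ elsewhere. The normalizing constant can be evaluated by proceeding as in the real
rectangular matrix-variate case. Let $\tilde Y = A^{\frac{1}{2}}\tilde X B^{\frac{1}{2}}$ so that $\tilde S = \tilde Y\tilde Y^*$, and then integrate
out $\tilde S$ by using a complex matrix-variate type-1 beta integral, which yields the following
normalizing constant:

$$\tilde C_1 = |\det(A)|^{q}\,|\det(B)|^{p}\,\frac{\tilde\Gamma_p(q)}{\pi^{qp}}\,\frac{\tilde\Gamma_p(\gamma+q+\beta)}{\tilde\Gamma_p(\gamma+q)\,\tilde\Gamma_p(\beta)} \tag{7.3a.2}$$

for $\Re(\gamma+q) > p-1$, $\Re(\beta) > p-1$, $A > O$, $B > O$.

7.3a.1. Different versions of the type-1 beta density, the complex case

The densities that follow can be obtained from that specified in (7.3a.1) and certain
related transformations. The density of $\tilde Y = A^{\frac{1}{2}}\tilde X B^{\frac{1}{2}}$ is given by

$$\tilde g_2(\tilde Y) = \frac{\tilde\Gamma_p(q)}{\pi^{qp}}\,\frac{\tilde\Gamma_p(\gamma+q+\beta)}{\tilde\Gamma_p(\gamma+q)\,\tilde\Gamma_p(\beta)}\,|\det(\tilde Y\tilde Y^*)|^{\gamma}\,|\det(I - \tilde Y\tilde Y^*)|^{\beta-p} \tag{7.3a.3}$$

for $\Re(\beta) > p-1$, $\Re(\gamma+q) > p-1$, $I - \tilde Y\tilde Y^* > O$, and $\tilde g_2 = 0$ elsewhere. The density
of $\tilde S = \tilde Y\tilde Y^*$ is the following:

$$\tilde g_3(\tilde S) = \frac{\tilde\Gamma_p(\gamma+q+\beta)}{\tilde\Gamma_p(\gamma+q)\,\tilde\Gamma_p(\beta)}\,|\det(\tilde S)|^{\gamma+q-p}\,|\det(I - \tilde S)|^{\beta-p} \tag{7.3a.4}$$

for $\Re(\beta) > p-1$, $\Re(\gamma+q) > p-1$, $\tilde S > O$, $I - \tilde S > O$, and $\tilde g_3 = 0$ elsewhere.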
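7.3a.2. Multivariate type-1 beta density, the complex case

When $p = 1$, $\tilde X$ is the $1 \times q$ vector $(\tilde x_1,\ldots,\tilde x_q)$ and the $1 \times 1$ matrix $A$ will be denoted
by $b > 0$. The resulting density will then have the same structure as that given in (7.3a.1)
with $p$ replaced by 1 and $A$ replaced by $b > 0$:

$$\tilde g_4(\tilde X) = \tilde C_2\,|\det(\tilde X B\tilde X^*)|^{\gamma}\,|\det(I - b\,\tilde X B\tilde X^*)|^{\beta-1} \tag{7.3a.5}$$

for $B > O$, $b > 0$, $\Re(\beta) > p-1$, $\Re(\gamma+q) > p-1$, $I - b\,\tilde X B\tilde X^* > O$, and $\tilde g_4 = 0$
elsewhere, where the normalizing constant $\tilde C_2$ is

$$\tilde C_2 = b^{\gamma+q}\,|\det(B)|\,\frac{\Gamma(q)}{\pi^{q}}\,\frac{\Gamma(\gamma+q+\beta)}{\Gamma(\gamma+q)\,\Gamma(\beta)} \tag{7.3a.6}$$

for $b > 0$, $B > O$, $\Re(\beta) > 0$, $\Re(\gamma+q) > 0$. Letting $\tilde Y = \tilde X B^{\frac{1}{2}}$ so that $d\tilde Y =
|\det(B)|\,d\tilde X$, the density of $\tilde Y$ reduces to

$$\tilde g_5(\tilde Y) = b^{\gamma+q}\,\frac{\Gamma(q)}{\pi^{q}}\,\frac{\Gamma(\gamma+q+\beta)}{\Gamma(\gamma+q)\,\Gamma(\beta)}\,|\det(\tilde Y\tilde Y^*)|^{\gamma}\,|\det(I - b\,\tilde Y\tilde Y^*)|^{\beta-1} \tag{7.3a.7}$$

$$= b^{\gamma+q}\,\frac{\Gamma(q)}{\pi^{q}}\,\frac{\Gamma(\gamma+q+\beta)}{\Gamma(\gamma+q)\,\Gamma(\beta)}\,\big[|\tilde y_1|^2+\cdots+|\tilde y_q|^2\big]^{\gamma}\,\big[1 - b(|\tilde y_1|^2+\cdots+|\tilde y_q|^2)\big]^{\beta-1} \tag{7.3a.8}$$

for $\Re(\beta) > 0$, $\Re(\gamma+q) > 0$, $1 - b(|\tilde y_1|^2+\cdots+|\tilde y_q|^2) > 0$, and $\tilde g_5 = 0$ elsewhere. The
form appearing in (7.3a.8) is applicable to several problems occurring in various areas, as
was the case for (7.3.8) in the real domain. However, geometrical probability problems do
not appear to have yet been formulated in the complex domain. Let $\tilde S = \tilde Y\tilde Y^*$, $\tilde S$ being in
this case a real scalar denoted by $s$ whose density is

$$g_6(s) = b^{\gamma+q}\,\frac{\Gamma(\gamma+q+\beta)}{\Gamma(\gamma+q)\,\Gamma(\beta)}\,s^{\gamma+q-1}\,(1 - bs)^{\beta-1},\quad b > 0, \tag{7.3a.9}$$

for $\Re(\beta) > 0$, $\Re(\gamma+q) > 0$, $s > 0$, $1 - bs > 0$, and $g_6 = 0$ elsewhere. Thus, $s$ is a real
scalar type-1 beta random variable with parameters $(\gamma+q,\ \beta)$ and scaling factor $b > 0$.
Note that in the real case, the distribution was also a scalar type-1 beta random variable,
but having a different first parameter, namely, $\gamma+\frac{q}{2}$, its second parameter and scaling
factor remaining $\beta$ and $b$.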
Example 7.3a.1. Express the density (7.3a.5) explicitly for $b = 5$, $p = 1$, $q = 2$, $\beta =
3$, $\gamma = 4$, $\tilde X = [\tilde x_1, \tilde x_2] = [x_1 + iy_1,\ x_2 + iy_2]$, where $x_j, y_j$, $j = 1, 2$, are real variables,
$i = \sqrt{(-1)}$, and

$$B = \begin{bmatrix} 3 & 1+i \\ 1-i & 1 \end{bmatrix}.$$

Solution 7.3a.1. Observe that since $B = B^*$, $B$ is Hermitian. Its leading minors being
$|(3)| = 3 > 0$ and $\begin{vmatrix} 3 & 1+i \\ 1-i & 1 \end{vmatrix} = (3)(1) - (1+i)(1-i) = 3 - 2 = 1 > 0$, $B$ is also
positive definite. Letting $Q = \tilde X B\tilde X^*$,

$$Q = 3(x_1+iy_1)(x_1-iy_1) + (1)(x_2+iy_2)(x_2-iy_2) + (1+i)(x_1+iy_1)(x_2-iy_2) + (1-i)(x_2+iy_2)(x_1-iy_1)$$

$$= 3(x_1^2+y_1^2) + (x_2^2+y_2^2) + (1+i)[x_1x_2+y_1y_2 - i(x_1y_2-x_2y_1)] + (1-i)[x_1x_2+y_1y_2 - i(x_2y_1-x_1y_2)]$$

$$= 3(x_1^2+y_1^2) + (x_2^2+y_2^2) + 2(x_1x_2+y_1y_2) + 2(x_1y_2-x_2y_1). \tag{i}$$

The normalizing constant being

$$b^{\gamma+q}\,|\det(B)|\,\frac{\Gamma(q)}{\pi^{q}}\,\frac{\Gamma(\gamma+q+\beta)}{\Gamma(\gamma+q)\,\Gamma(\beta)} = 5^{6}(1)\,\frac{\Gamma(2)}{\pi^{2}}\,\frac{\Gamma(9)}{\Gamma(6)\,\Gamma(3)} = \frac{5^{6}\,(1!)(8!)}{\pi^{2}\,(5!)(2!)} = \frac{5^{6}(168)}{\pi^{2}}, \tag{ii}$$

the explicit form of the density (7.3a.5) is thus the following:

$$\tilde g_4(\tilde X) = \frac{5^{6}(168)}{\pi^{2}}\,Q^{4}\,[1 - 5Q]^{2},\quad 1 - 5Q > 0,\ Q \ge 0,$$

and zero elsewhere, where $Q$ is given in (i). It is a multivariate generalized type-1
complex-variate beta density whose scaling factor is 5. Observe that even though $\tilde X$ is complex,
$\tilde g_4(\tilde X)$ is real-valued.
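
The expansion of $Q$ in (i) and the constant in (ii) can be checked with complex arithmetic. The sketch below is illustrative only: the helper names `hermitian_form`, `q_explicit`, `normalizing_constant` and `density`, as well as the evaluation point, are assumptions introduced here.

```python
import math

# Parameters of Example 7.3a.1: b = 5, p = 1, q = 2, beta = 3, gamma = 4.
b, q, beta, gam = 5, 2, 3, 4
B = [[3, 1 + 1j], [1 - 1j, 1]]

def hermitian_form(x, mat):
    """Return x * mat * x^* for a row vector x of complex numbers."""
    n = len(x)
    return sum(x[j] * mat[j][k] * x[k].conjugate() for j in range(n) for k in range(n))

def q_explicit(x1, y1, x2, y2):
    """Real expression (i) for Q in terms of the real and imaginary parts."""
    return (3 * (x1 ** 2 + y1 ** 2) + (x2 ** 2 + y2 ** 2)
            + 2 * (x1 * x2 + y1 * y2) + 2 * (x1 * y2 - x2 * y1))

def normalizing_constant():
    """b^(gamma+q) |det B| Gamma(q) Gamma(gamma+q+beta) / (pi^q Gamma(gamma+q) Gamma(beta))."""
    det_B = abs(B[0][0] * B[1][1] - B[0][1] * B[1][0])
    return (b ** (gam + q) * det_B * math.gamma(q) * math.gamma(gam + q + beta)
            / (math.pi ** q * math.gamma(gam + q) * math.gamma(beta)))

def density(x1, y1, x2, y2):
    """Density (7.3a.5) at X = [x1 + i y1, x2 + i y2]; zero outside the support."""
    Q = q_explicit(x1, y1, x2, y2)
    if Q <= 0 or 1 - b * Q <= 0:
        return 0.0
    return normalizing_constant() * Q ** gam * (1 - b * Q) ** (beta - 1)

# An arbitrary point inside the support, used only for illustration.
point = (0.1, 0.1, 0.05, -0.1)
x_vec = [complex(point[0], point[1]), complex(point[2], point[3])]
Q_complex = hermitian_form(x_vec, B)
Q_real = q_explicit(*point)
```

The imaginary part of `Q_complex` vanishes up to rounding and its real part matches `Q_real`, which illustrates why the density is real-valued although $\tilde X$ is complex.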


7.3a.3. Arbitrary moments in the complex case 


Consider again the density of the complex $p \times q$, $q \ge p$, matrix-variate random
variable $\tilde X$ of full rank $p$ having the density specified in (7.3a.1). Let $\tilde U = A^{\frac{1}{2}}\tilde X B\tilde X^* A^{\frac{1}{2}}$.
The $h$-th moment of the absolute value of the determinant of $\tilde U$, that is, $E[|\det(\tilde U)|^{h}]$, will
now be determined for arbitrary $h$. As before, note that when the expected value is taken,
the only change is that the parameter $\gamma$ is replaced by $\gamma + h$, so that the moment is available
from the normalizing constant present in (7.3a.2). Thus,

$$E[|\det(\tilde U)|^{h}] = \frac{\tilde\Gamma_p(\gamma+q+h)}{\tilde\Gamma_p(\gamma+q)}\,\frac{\tilde\Gamma_p(\gamma+q+\beta)}{\tilde\Gamma_p(\gamma+q+\beta+h)} \tag{7.3a.10}$$

$$= \prod_{j=1}^{p}\frac{\Gamma(\gamma+q+h-(j-1))}{\Gamma(\gamma+q-(j-1))}\,\frac{\Gamma(\gamma+q+\beta-(j-1))}{\Gamma(\gamma+q+\beta+h-(j-1))} \tag{7.3a.11}$$

$$= E[u_1^{h}]\,E[u_2^{h}]\cdots E[u_p^{h}] \tag{7.3a.12}$$

where $u_1,\ldots,u_p$ are independently distributed real scalar type-1 beta random variables
with the parameters $(\gamma+q-(j-1),\ \beta)$ for $j = 1,\ldots,p$. The results are the same as those
obtained in the real case except that the parameters are slightly different, the parameters
being $(\gamma+\frac{q}{2}-\frac{j-1}{2},\ \beta)$, $j = 1,\ldots,p$, in the real domain. Accordingly, the absolute value
of the determinant of $\tilde U$ in the complex case has the following structural representation:

$$|\det(\tilde U)| = |\det(A^{\frac{1}{2}}\tilde X B\tilde X^* A^{\frac{1}{2}})| = |\det(\tilde Y\tilde Y^*)| = |\det(\tilde S)| = u_1\cdots u_p \tag{7.3a.13}$$

where $u_1,\ldots,u_p$ are mutually independently distributed real scalar type-1 beta random
variables with the parameters $(\gamma+q-(j-1),\ \beta)$, $j = 1,\ldots,p$.

We now consider the scalar type-1 beta density in the complex case. Thus, letting
$p = 1$ and $q = 1$ in (7.3a.8), we have

$$\tilde g_7(\tilde y_1) = b^{\gamma+1}\,\frac{1}{\pi}\,\frac{\Gamma(\gamma+1+\beta)}{\Gamma(\gamma+1)\,\Gamma(\beta)}\,[|\tilde y_1|^2]^{\gamma}\,[1 - b\,|\tilde y_1|^2]^{\beta-1},\quad \tilde y_1 = y_{11} + i\,y_{12},$$

$$= b^{\gamma+1}\,\frac{1}{\pi}\,\frac{\Gamma(\gamma+1+\beta)}{\Gamma(\gamma+1)\,\Gamma(\beta)}\,[y_{11}^2+y_{12}^2]^{\gamma}\,[1 - b(y_{11}^2+y_{12}^2)]^{\beta-1} \tag{7.3a.14}$$

for $b > 0$, $\Re(\beta) > 0$, $\Re(\gamma) > -1$, $-\infty < y_{1j} < \infty$, $j = 1, 2$, $1 - b(y_{11}^2+y_{12}^2) > 0$,
and $\tilde g_7 = 0$ elsewhere. The normalizing constant in (7.3a.14) can be verified by making
the polar coordinate transformation $y_{11} = r\cos\theta$, $y_{12} = r\sin\theta$, as was done earlier.


Exercises 7.3 


7.3.1. Derive the normalizing constant $C_1$ in (7.3.2) and verify the normalizing constants
in (7.3.3) and (7.3.4).

7.3.2. From $E[|A^{\frac{1}{2}}XBX'A^{\frac{1}{2}}|^{h}]$ or otherwise, derive the $h$-th moment of $|XBX'|$. What
is then the structural representation corresponding to (7.3.7)?

7.3.3. From (7.3.7) or otherwise, derive the exact density of $|U|$ for the cases (1): $p =
2$; (2): $p = 3$.

7.3.4. Write down the conditions on the parameters $\gamma$ and $\beta$ in (7.3.6) so that the exact
density of $|U|$ can easily be evaluated for some $p \ge 4$.

7.3.5. Evaluate the normalizing constant in (7.3.8) by making use of the general polar
coordinate transformation.

7.3.6. Evaluate the normalizing constant in (7.3a.2).

7.3.7. Derive the exact density of $|\det(\tilde U)|$ in (7.3a.13) for (1): $p = 2$; (2): $p = 3$.


7.4. The Real Rectangular Matrix-Variate Type-2 Beta Density 


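Let us consider a $p \times q$, $q \ge p$, matrix $X$ of full rank $p$ and the following associated
density:

$$g_8(X) = C_3\,|AXBX'|^{\gamma}\,|I + AXBX'|^{-(\gamma+\frac{q}{2}+\beta)} \tag{7.4.1}$$

for $A > O$, $B > O$, $\Re(\beta) > \frac{p-1}{2}$, $\Re(\gamma+\frac{q}{2}) > \frac{p-1}{2}$, and $g_8 = 0$ elsewhere. The
normalizing constant can be seen to be the following:

$$C_3 = |A|^{\frac{q}{2}}\,|B|^{\frac{p}{2}}\,\frac{\Gamma_p(\frac{q}{2})}{\pi^{\frac{qp}{2}}}\,\frac{\Gamma_p(\gamma+\frac{q}{2}+\beta)}{\Gamma_p(\gamma+\frac{q}{2})\,\Gamma_p(\beta)} \tag{7.4.2}$$

for $A > O$, $B > O$, $\Re(\beta) > \frac{p-1}{2}$, $\Re(\gamma+\frac{q}{2}) > \frac{p-1}{2}$. Letting $Y = A^{\frac{1}{2}}XB^{\frac{1}{2}}$, its density,
denoted by $g_9(Y)$, is

$$g_9(Y) = \frac{\Gamma_p(\frac{q}{2})}{\pi^{\frac{qp}{2}}}\,\frac{\Gamma_p(\gamma+\frac{q}{2}+\beta)}{\Gamma_p(\gamma+\frac{q}{2})\,\Gamma_p(\beta)}\,|YY'|^{\gamma}\,|I + YY'|^{-(\gamma+\frac{q}{2}+\beta)} \tag{7.4.3}$$

for $\Re(\beta) > \frac{p-1}{2}$, $\Re(\gamma+\frac{q}{2}) > \frac{p-1}{2}$, and $g_9 = 0$ elsewhere. The density of $S = YY'$ then
reduces to

$$g_{10}(S) = \frac{\Gamma_p(\gamma+\frac{q}{2}+\beta)}{\Gamma_p(\gamma+\frac{q}{2})\,\Gamma_p(\beta)}\,|S|^{\gamma+\frac{q}{2}-\frac{p+1}{2}}\,|I + S|^{-(\gamma+\frac{q}{2}+\beta)} \tag{7.4.4}$$

for $\Re(\beta) > \frac{p-1}{2}$, $\Re(\gamma+\frac{q}{2}) > \frac{p-1}{2}$, and $g_{10} = 0$ elsewhere.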
7.4.1. The real type-2 beta density in the multivariate case

Consider the case $p = 1$ and $A = b > 0$ in (7.4.1). The resulting density has the
same structure, with $A$ replaced by $b$ and $X$ being the $1 \times q$ vector $(x_1,\ldots,x_q)$. Letting
$Y = XB^{\frac{1}{2}}$, the following density of $Y = (y_1,\ldots,y_q)$, denoted by $g_{11}(Y)$, is obtained:

$$g_{11}(Y) = b^{\gamma+\frac{q}{2}}\,\frac{\Gamma(\frac{q}{2})}{\pi^{\frac{q}{2}}}\,\frac{\Gamma(\gamma+\frac{q}{2}+\beta)}{\Gamma(\gamma+\frac{q}{2})\,\Gamma(\beta)}\,[y_1^2+\cdots+y_q^2]^{\gamma}\,[1 + b(y_1^2+\cdots+y_q^2)]^{-(\gamma+\frac{q}{2}+\beta)} \tag{7.4.5}$$

for $b > 0$, $\Re(\gamma+\frac{q}{2}) > 0$, $\Re(\beta) > 0$, and $g_{11} = 0$ elsewhere. The density appearing
in (7.4.5) will be referred to as the standard form of the real rectangular matrix-variate
type-2 beta density. In this case, $A$ is taken as $A = b > 0$ and $Y = XB^{\frac{1}{2}}$. What might be
the standard form of the real type-2 beta density in the real scalar case, that is, when it is
assumed that $p = 1$, $q = 1$, $A = b > 0$ and $B = 1$ in (7.4.1)? In this case, it is seen from
(7.4.5) that

$$g_{12}(y_1) = b^{\gamma+\frac{1}{2}}\,\frac{\Gamma(\gamma+\frac{1}{2}+\beta)}{\Gamma(\gamma+\frac{1}{2})\,\Gamma(\beta)}\,[y_1^2]^{\gamma}\,[1 + b\,y_1^2]^{-(\gamma+\frac{1}{2}+\beta)},\quad -\infty < y_1 < \infty, \tag{7.4.6}$$

for $\Re(\beta) > 0$, $\Re(\gamma+\frac{1}{2}) > 0$, $b > 0$, and $g_{12} = 0$ elsewhere.
7.4.2. Moments in the real rectangular matrix-variate type-2 beta density 


Letting $U = A^{\frac{1}{2}}XBX'A^{\frac{1}{2}}$, what would be the $h$-th moment of the determinant of $U$,
that is, $E[|U|^{h}]$ for arbitrary $h$? Upon determining $E[|U|^{h}]$, the parameter $\gamma$ is replaced
by $\gamma + h$ while the other parameters remain unchanged. The $h$-th moment, which is thus
available from the normalizing constant, is given by

$$E[|U|^{h}] = \frac{\Gamma_p(\gamma+\frac{q}{2}+h)}{\Gamma_p(\gamma+\frac{q}{2})}\,\frac{\Gamma_p(\beta-h)}{\Gamma_p(\beta)} \tag{7.4.7}$$

$$= \prod_{j=1}^{p}\frac{\Gamma(\gamma+\frac{q}{2}-\frac{j-1}{2}+h)}{\Gamma(\gamma+\frac{q}{2}-\frac{j-1}{2})}\,\frac{\Gamma(\beta-\frac{j-1}{2}-h)}{\Gamma(\beta-\frac{j-1}{2})} \tag{7.4.8}$$

$$= E[u_1^{h}]\,E[u_2^{h}]\cdots E[u_p^{h}] \tag{7.4.9}$$

where $u_1,\ldots,u_p$ are mutually independently distributed real scalar type-2 beta random
variables with the parameters $(\gamma+\frac{q}{2}-\frac{j-1}{2},\ \beta-\frac{j-1}{2})$, $j = 1,\ldots,p$. That is, (7.4.9) or
(7.4.10) gives a structural representation to the determinant of $U$ as

$$|U| = |A^{\frac{1}{2}}XBX'A^{\frac{1}{2}}| = |YY'| = |S| = u_1\cdots u_p \tag{7.4.10}$$

where the $u_j$'s are mutually independently distributed real scalar type-2 beta random
variables as specified above.


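Example 7.4.1. Evaluate the density of $u = |A^{\frac{1}{2}}XBX'A^{\frac{1}{2}}|$ for $p = 2$ and the general
parameters $\gamma$, $q$, $\beta$ where $X$ has a real rectangular matrix-variate type-2 beta density with
the parameter matrices $A > O$ and $B > O$ where $A$ is $p \times p$, $B$ is $q \times q$ and $X$ is a
$p \times q$, $q \ge p$, rank $p$ matrix.

Solution 7.4.1. The general $h$-th moment of $u$ can be determined from (7.4.8). Letting
$p = 2$, we have

$$E[u^{h}] = \frac{\Gamma(\gamma+\frac{q}{2}-\frac{1}{2}+h)\,\Gamma(\gamma+\frac{q}{2}+h)}{\Gamma(\gamma+\frac{q}{2}-\frac{1}{2})\,\Gamma(\gamma+\frac{q}{2})}\;\frac{\Gamma(\beta-\frac{1}{2}-h)\,\Gamma(\beta-h)}{\Gamma(\beta-\frac{1}{2})\,\Gamma(\beta)} \tag{i}$$

for $-\gamma-\frac{q}{2}+\frac{1}{2} < \Re(h) < \beta-\frac{1}{2}$, $\beta > \frac{1}{2}$, $\gamma+\frac{q}{2} > \frac{1}{2}$. Since four pairs of gamma
functions differ by $\frac{1}{2}$, we can combine them by applying the duplication formula for gamma
functions, namely,

$$\Gamma(z)\,\Gamma(z+1/2) = \pi^{\frac{1}{2}}\,2^{1-2z}\,\Gamma(2z). \tag{ii}$$

Take $z = \gamma+\frac{q}{2}-\frac{1}{2}+h$ and $z = \gamma+\frac{q}{2}-\frac{1}{2}$ in the first set of gamma ratios in (i) and
$z = \beta-\frac{1}{2}-h$ and $z = \beta-\frac{1}{2}$ in the second set of gamma ratios in (i). Then, we have the
following:

$$\frac{\Gamma(\gamma+\frac{q}{2}-\frac{1}{2}+h)\,\Gamma(\gamma+\frac{q}{2}+h)}{\Gamma(\gamma+\frac{q}{2}-\frac{1}{2})\,\Gamma(\gamma+\frac{q}{2})} = \frac{\pi^{\frac{1}{2}}\,2^{1-2\gamma-q+1-2h}\,\Gamma(2\gamma+q-1+2h)}{\pi^{\frac{1}{2}}\,2^{1-2\gamma-q+1}\,\Gamma(2\gamma+q-1)} = 2^{-2h}\,\frac{\Gamma(2\gamma+q-1+2h)}{\Gamma(2\gamma+q-1)}, \tag{iii}$$

$$\frac{\Gamma(\beta-\frac{1}{2}-h)\,\Gamma(\beta-h)}{\Gamma(\beta-\frac{1}{2})\,\Gamma(\beta)} = \frac{\pi^{\frac{1}{2}}\,2^{1-2\beta+1+2h}\,\Gamma(2\beta-1-2h)}{\pi^{\frac{1}{2}}\,2^{1-2\beta+1}\,\Gamma(2\beta-1)} = 2^{2h}\,\frac{\Gamma(2\beta-1-2h)}{\Gamma(2\beta-1)}, \tag{iv}$$

the product of (iii) and (iv) yielding the simplified representation of the $h$-th moment of $u$
that follows:

$$E[u^{h}] = \frac{\Gamma(2\gamma+q-1+2h)}{\Gamma(2\gamma+q-1)}\,\frac{\Gamma(2\beta-1-2h)}{\Gamma(2\beta-1)}.$$

Now, since $E[u^{h}] = E[(u^{\frac{1}{2}})^{2h}] = E[y^{t}]$ with $y = u^{\frac{1}{2}}$ and $t = 2h$, we have

$$E[y^{t}] = \frac{\Gamma(2\gamma+q-1+t)}{\Gamma(2\gamma+q-1)}\,\frac{\Gamma(2\beta-1-t)}{\Gamma(2\beta-1)}. \tag{v}$$

As $t$ is arbitrary in (v), the moment expression will uniquely determine the density of $y$.
Accordingly, $y$ has a real scalar type-2 beta distribution with the parameters $(2\gamma+q-1,\ 2\beta-1)$,
and so, its density, denoted by $f(y)$, is

$$f(y)\,dy = \frac{\Gamma(2\gamma+q+2\beta-2)}{\Gamma(2\gamma+q-1)\,\Gamma(2\beta-1)}\,y^{2\gamma+q-2}\,(1+y)^{-(2\gamma+q+2\beta-2)}\,dy$$

$$= \frac{1}{2}\,\frac{\Gamma(2\gamma+q+2\beta-2)}{\Gamma(2\gamma+q-1)\,\Gamma(2\beta-1)}\,u^{\gamma+\frac{q}{2}-\frac{3}{2}}\,(1+u^{\frac{1}{2}})^{-(2\gamma+q+2\beta-2)}\,du.$$

Thus, the density of $u$, denoted by $g(u)$, is the following:

$$g(u) = \frac{1}{2}\,\frac{\Gamma(2\gamma+q+2\beta-2)}{\Gamma(2\gamma+q-1)\,\Gamma(2\beta-1)}\,u^{\gamma+\frac{q}{2}-\frac{3}{2}}\,(1+u^{\frac{1}{2}})^{-(2\gamma+q+2\beta-2)}$$

for $0 \le u < \infty$, and zero elsewhere, where the original conditions on the parameters
remain the same. It can be readily verified that $g(u)$ is a density.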
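
The simplification obtained from the duplication formula in Example 7.4.1 can be confirmed numerically for particular real parameter values. This sketch compares the four-gamma-ratio form (i) with the simplified form; the function names and the chosen values of $\gamma$, $q$, $\beta$ and $h$ are illustrative assumptions that satisfy $-\gamma-\frac{q}{2}+\frac{1}{2} < h < \beta-\frac{1}{2}$.

```python
import math

def moment_from_i(gam, q, beta, h):
    """E[u^h] for p = 2 written as in (i), before applying the duplication formula."""
    a = gam + q / 2
    num = (math.gamma(a - 0.5 + h) * math.gamma(a + h)
           * math.gamma(beta - 0.5 - h) * math.gamma(beta - h))
    den = (math.gamma(a - 0.5) * math.gamma(a)
           * math.gamma(beta - 0.5) * math.gamma(beta))
    return num / den

def moment_simplified(gam, q, beta, h):
    """E[u^h] after the duplication formula: the 2^(-2h) and 2^(2h) factors cancel."""
    a2 = 2 * gam + q - 1
    b2 = 2 * beta - 1
    return (math.gamma(a2 + 2 * h) * math.gamma(b2 - 2 * h)
            / (math.gamma(a2) * math.gamma(b2)))

# Illustrative parameter values (assumptions, not from the text).
gam, q, beta, h = 1.0, 3, 4.0, 0.7
lhs = moment_from_i(gam, q, beta, h)
rhs = moment_simplified(gam, q, beta, h)
```

The two values coincide up to rounding, and both reduce to one at $h = 0$, as any moment expression must.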


7.4.3. A pathway extension in the real case 


Let us relabel $f_{10}(X)$ as specified in (7.2.16), as $g_{13}(X)$ in this section:

$$g_{13}(X) = C_4\,|AXBX'|^{\gamma}\,|I - a(1-\alpha)A^{\frac{1}{2}}XBX'A^{\frac{1}{2}}|^{\frac{\eta}{1-\alpha}} \tag{7.4.11}$$

for $A > O$, $B > O$, $\eta > 0$, $a > 0$, $\alpha < 1$, $I - a(1-\alpha)A^{\frac{1}{2}}XBX'A^{\frac{1}{2}} > O$, and $g_{13} = 0$
elsewhere, where $C_4$ is the normalizing constant. Observe that for $\alpha < 1$, $a(1-\alpha) > 0$
and hence the model in (7.4.11) is a generalization of the real rectangular matrix-variate
type-1 beta density considered in (7.3.1). When $\alpha < 1$, the normalizing constant $C_4$ is
of the form given in (7.2.18). For $\alpha > 1$, we may write $1 - \alpha = -(\alpha - 1)$, so that
$-a(1-\alpha) = a(\alpha-1) > 0$, $\alpha > 1$ in (7.4.11) and the exponent $\frac{\eta}{1-\alpha}$ changes to $-\frac{\eta}{\alpha-1}$; thus,
the model appearing in (7.4.11) becomes the following generalization of the rectangular
matrix-variate type-2 beta density given in (7.4.1):

$$g_{14}(X) = C_5\,|AXBX'|^{\gamma}\,|I + a(\alpha-1)A^{\frac{1}{2}}XBX'A^{\frac{1}{2}}|^{-\frac{\eta}{\alpha-1}} \tag{7.4.12}$$

for $A > O$, $B > O$, $\eta > 0$, $a > 0$, $\alpha > 1$, and $g_{14} = 0$ elsewhere. The normalizing
constant $C_5$ will then be different from that associated with the type-1 case. Actually, in
the type-2 case, the normalizing constant is available from (7.2.19). The model appearing
in (7.4.12) is a generalization of the real rectangular matrix-variate type-2 beta model
considered in (7.4.1). When $\alpha \to 1$, the model in (7.4.11) converges to a generalized form
of the real rectangular matrix-variate gamma model in (7.2.5), namely,

$$g_{15}(X) = C_6\,|AXBX'|^{\gamma}\,e^{-a\eta\,\mathrm{tr}(AXBX')} \tag{7.4.13}$$

where

$$C_6 = (a\eta)^{p(\gamma+\frac{q}{2})}\,\frac{|A|^{\frac{q}{2}}\,|B|^{\frac{p}{2}}\,\Gamma_p(\frac{q}{2})}{\pi^{\frac{qp}{2}}\,\Gamma_p(\gamma+\frac{q}{2})} \tag{7.4.14}$$

for $a > 0$, $\eta > 0$, $A > O$, $B > O$, $\Re(\gamma+\frac{q}{2}) > \frac{p-1}{2}$, and $g_{15} = 0$ elsewhere. More
properties of the model given in (7.4.11) have already been provided in Sect. 4.2. The real
rectangular matrix-variate pathway model was introduced in Mathai (2005).


7.4a. Complex Rectangular Matrix-Variate Type-2 Beta Density 


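Let us consider a full rank $p \times q$, $q \ge p$, matrix $\tilde X$ in the complex domain and the
following associated density:

$$\tilde g_8(\tilde X) = \tilde C_3\,|\det(A\tilde X B\tilde X^*)|^{\gamma}\,|\det(I + A\tilde X B\tilde X^*)|^{-(\beta+\gamma+q)} \tag{7.4a.1}$$

for $A > O$, $B > O$, $\Re(\beta) > p-1$, $\Re(\gamma+q) > p-1$, and $\tilde g_8 = 0$ elsewhere, where
$\tilde C_3$ is the normalizing constant. Let

$$\tilde Y = A^{\frac{1}{2}}\tilde X B^{\frac{1}{2}} \;\Rightarrow\; d\tilde Y = |\det(A)|^{q}\,|\det(B)|^{p}\,d\tilde X,$$

and make the transformation

$$\tilde S = \tilde Y\tilde Y^* \;\Rightarrow\; d\tilde Y = \frac{\pi^{qp}}{\tilde\Gamma_p(q)}\,|\det(\tilde S)|^{q-p}\,d\tilde S.$$

Then, the integral over $\tilde S$ can be evaluated by means of a complex matrix-variate type-2
beta integral. That is,

$$\int_{\tilde S} |\det(\tilde S)|^{\gamma+q-p}\,|\det(I + \tilde S)|^{-(\beta+\gamma+q)}\,d\tilde S = \frac{\tilde\Gamma_p(\gamma+q)\,\tilde\Gamma_p(\beta)}{\tilde\Gamma_p(\gamma+q+\beta)} \tag{7.4a.2}$$

for $\Re(\beta) > p-1$, $\Re(\gamma+q) > p-1$. The normalizing constant $\tilde C_3$ as well as the densities
of $\tilde Y$ and $\tilde S$ can be determined from the previous steps. The normalizing constant is

$$\tilde C_3 = |\det(A)|^{q}\,|\det(B)|^{p}\,\frac{\tilde\Gamma_p(q)}{\pi^{qp}}\,\frac{\tilde\Gamma_p(\gamma+q+\beta)}{\tilde\Gamma_p(\gamma+q)\,\tilde\Gamma_p(\beta)} \tag{7.4a.3}$$

for $\Re(\beta) > p-1$, $\Re(\gamma+q) > p-1$. The density of $\tilde Y$, denoted by $\tilde g_9(\tilde Y)$, is given by

$$\tilde g_9(\tilde Y) = \frac{\tilde\Gamma_p(q)}{\pi^{qp}}\,\frac{\tilde\Gamma_p(\gamma+q+\beta)}{\tilde\Gamma_p(\gamma+q)\,\tilde\Gamma_p(\beta)}\,|\det(\tilde Y\tilde Y^*)|^{\gamma}\,|\det(I + \tilde Y\tilde Y^*)|^{-(\gamma+q+\beta)} \tag{7.4a.4}$$

for $\Re(\beta) > p-1$, $\Re(\gamma+q) > p-1$, and $\tilde g_9 = 0$ elsewhere. The density of $\tilde S$, denoted
by $\tilde g_{10}(\tilde S)$, is the following:

$$\tilde g_{10}(\tilde S) = \frac{\tilde\Gamma_p(\gamma+q+\beta)}{\tilde\Gamma_p(\gamma+q)\,\tilde\Gamma_p(\beta)}\,|\det(\tilde S)|^{\gamma+q-p}\,|\det(I + \tilde S)|^{-(\beta+\gamma+q)} \tag{7.4a.5}$$

for $\Re(\beta) > p-1$, $\Re(\gamma+q) > p-1$, and $\tilde g_{10} = 0$ elsewhere.

7.4a.1. Multivariate complex type-2 beta density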


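As in Sect. 7.3a, let us consider the special case $p = 1$ in (7.4a.1). So, let the $1 \times 1$
matrix $A$ be denoted by $b > 0$ and the $1 \times q$ vector $\tilde X = (\tilde x_1,\ldots,\tilde x_q)$. Then,

$$A\tilde X B\tilde X^* = b\,\tilde X B\tilde X^* = b\,(\tilde x_1,\ldots,\tilde x_q)\,B\begin{pmatrix}\tilde x_1^*\\ \vdots\\ \tilde x_q^*\end{pmatrix} \equiv b\,\tilde U \tag{a}$$

where, in the case of a scalar, an asterisk only designates a complex conjugate. Note that
when $p = 1$, $\tilde U = \tilde X B\tilde X^*$ is a positive definite Hermitian form whose density, denoted by
$\tilde g_{11}(\tilde U)$, is obtained as:

$$\tilde g_{11}(\tilde U) = b^{\gamma+q}\,|\det(B)|\,\frac{\Gamma(q)}{\pi^{q}}\,\frac{\Gamma(\gamma+q+\beta)}{\Gamma(\gamma+q)\,\Gamma(\beta)}\,|\det(\tilde U)|^{\gamma}\,|\det(I + b\,\tilde U)|^{-(\gamma+q+\beta)} \tag{7.4a.6}$$

for $\Re(\beta) > 0$, $\Re(\gamma+q) > 0$ and $b > 0$, and $\tilde g_{11} = 0$ elsewhere. Now, letting $\tilde X B^{\frac{1}{2}} =
\tilde Y = (\tilde y_1,\ldots,\tilde y_q)$, the density of $\tilde Y$, denoted by $\tilde g_{12}(\tilde Y)$, is obtained as

$$\tilde g_{12}(\tilde Y) = b^{\gamma+q}\,\frac{\Gamma(q)}{\pi^{q}}\,\frac{\Gamma(\gamma+q+\beta)}{\Gamma(\gamma+q)\,\Gamma(\beta)}\,\big[|\tilde y_1|^2+\cdots+|\tilde y_q|^2\big]^{\gamma}\,\big[1 + b(|\tilde y_1|^2+\cdots+|\tilde y_q|^2)\big]^{-(\gamma+q+\beta)} \tag{7.4a.7}$$

for $\Re(\beta) > 0$, $\Re(\gamma+q) > 0$, $b > 0$, and $\tilde g_{12} = 0$ elsewhere. The constant in (7.4a.7)
can be verified to be a normalizing constant, either by making use of Theorem 4.2a.3 or a
$(2q)$-variate real polar coordinate transformation, which is left as an exercise to the reader.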

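Example 7.4a.1. Provide an explicit representation of the complex multivariate density
in (7.4a.7) for $p = 1$, $\gamma = 2$, $q = 3$, $b = 3$ and $\beta = 2$.

Solution 7.4a.1. The normalizing constant, denoted by $\tilde c$, is the following:

$$\tilde c = b^{\gamma+q}\,\frac{\Gamma(q)}{\pi^{q}}\,\frac{\Gamma(\gamma+q+\beta)}{\Gamma(\gamma+q)\,\Gamma(\beta)} = 3^{5}\,\frac{\Gamma(3)}{\pi^{3}}\,\frac{\Gamma(7)}{\Gamma(5)\,\Gamma(2)} = \frac{3^{5}\,(2!)(6!)}{\pi^{3}\,(4!)(1!)} = \frac{(60)\,3^{5}}{\pi^{3}}. \tag{i}$$

Letting $\tilde y_1 = y_{11} + i\,y_{12}$, $\tilde y_2 = y_{21} + i\,y_{22}$, $\tilde y_3 = y_{31} + i\,y_{32}$, with $y_{1j}, y_{2j}, y_{3j}$, $j = 1, 2$,
being real and $i = \sqrt{(-1)}$,

$$Q = |\tilde y_1|^2 + |\tilde y_2|^2 + |\tilde y_3|^2 = (y_{11}^2+y_{12}^2) + (y_{21}^2+y_{22}^2) + (y_{31}^2+y_{32}^2). \tag{ii}$$

Thus, the required density, denoted by $\tilde g_{12}(\tilde Y)$ as in (7.4a.7), is given by

$$\tilde g_{12}(\tilde Y) = \tilde c\,Q^{2}\,(1+3Q)^{-7},\quad -\infty < y_{ij} < \infty,\ i = 1, 2, 3,\ j = 1, 2,$$

where $\tilde c$ is specified in (i) and $Q$, in (ii). This completes the solution.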


The density in (7.4a.6) is called a complex multivariate type-2 beta density in the gen- 
eral form and (7.4a.7) is referred to as a complex multivariate type-2 beta density in its 
standard form. Observe that these constitute only one form of the multivariate case of a 
type-2 beta density. When extending a univariate function to a multivariate one, there is 
no such thing as a unique multivariate analogue. There exist a multitude of multivariate 
functions corresponding to specified marginal functions, or marginal densities in statisti- 
cal problems. In the latter case for instance, there are countless possible copulas associated 
with some specified marginal distributions. Copulas actually encapsulate the various de- 
pendence relationships existing between random variables. We have already seen that one 
set of generalizations to the multivariate case for univariate type-1 and type-2 beta densities 
are the type-1 and type-2 Dirichlet densities and their extensions. The densities appearing 
in (7.4a.6) and (7.4a.7) are yet another version of a multivariate type-2 beta density in the 
complex case. 

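What will be the resulting distribution when $q = 1$ in (7.4a.7)? The standard form of
this density then becomes the following, denoted by $\tilde{g}_{13}(\tilde{y}_1)$:

$$\tilde{g}_{13}(\tilde{y}_1) = b^{\gamma+1}\,\frac{1}{\pi}\,\frac{\Gamma(\gamma+1+\beta)}{\Gamma(\gamma+1)\,\Gamma(\beta)}\,\big[|\tilde{y}_1|^2\big]^{\gamma}\,\big[1+b|\tilde{y}_1|^2\big]^{-(\gamma+1+\beta)} \tag{7.4a.8}$$

for $\Re(\beta) > 0$, $\Re(\gamma+1) > 0$, $b > 0$, and $\tilde{g}_{13} = 0$ elsewhere. We now verify that this
is indeed a density function. Let $\tilde{y}_1 = y_{11} + iy_{12}$, $y_{11}$ and $y_{12}$ being real scalar quantities
and $i = \sqrt{(-1)}$. When $\tilde{y}_1$ is in the complex plane, $-\infty < y_{1j} < \infty$, $j = 1, 2$. Let
us make a polar coordinate transformation. Letting $y_{11} = r\cos\theta$ and $y_{12} = r\sin\theta$,
$dy_{11} \wedge dy_{12} = r\,dr \wedge d\theta$, $0 < r < \infty$, $0 < \theta < 2\pi$. The integral over the functional part
of (7.4a.8) yields

$$\begin{aligned}
\int_{\tilde{y}_1}|\tilde{y}_1|^{2\gamma}\big[1+b|\tilde{y}_1|^2\big]^{-(\gamma+1+\beta)}\,d\tilde{y}_1
&= \int_{-\infty}^{\infty}\int_{-\infty}^{\infty}\big[y_{11}^2+y_{12}^2\big]^{\gamma}\big[1+b(y_{11}^2+y_{12}^2)\big]^{-(\gamma+1+\beta)}\,dy_{11}\wedge dy_{12}\\
&= \int_{\theta=0}^{2\pi}\int_{r=0}^{\infty}(r^2)^{\gamma}\big[1+br^2\big]^{-(\gamma+1+\beta)}\,r\,d\theta\wedge dr\\
&= (2\pi)\,\frac{1}{2}\int_{t=0}^{\infty}t^{\gamma}(1+bt)^{-(\gamma+1+\beta)}\,dt,
\end{aligned}$$

which is equal to

$$\pi\,b^{-(\gamma+1)}\,\frac{\Gamma(\gamma+1)\,\Gamma(\beta)}{\Gamma(\gamma+1+\beta)}.$$

This establishes that the function specified by (7.4a.8) is a density.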
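
The verification above can also be checked numerically. The following is a minimal sketch, not part of the original text, that assumes real parameter values (here $\gamma = 2$, $\beta = 2$, $b = 3$, chosen only for illustration). It approximates the radial integral $\int_0^\infty t^{\gamma}(1+bt)^{-(\gamma+1+\beta)}dt$ by a midpoint rule after the substitution $t = u/(1-u)$ and compares it with the closed form. The function names are hypothetical.

```python
import math

def radial_integral(gamma_, beta, b, n=20_000):
    """Midpoint-rule estimate of the integral of t^gamma (1 + b t)^-(gamma+1+beta)
    over (0, inf), computed on (0, 1) after the substitution t = u / (1 - u)."""
    h = 1.0 / n
    total = 0.0
    for k in range(n):
        u = (k + 0.5) * h
        t = u / (1.0 - u)
        # dt = du / (1 - u)^2 is the Jacobian of the substitution
        total += t**gamma_ * (1.0 + b * t) ** (-(gamma_ + 1.0 + beta)) / (1.0 - u) ** 2
    return total * h

def radial_closed_form(gamma_, beta, b):
    """b^-(gamma+1) Gamma(gamma+1) Gamma(beta) / Gamma(gamma+1+beta)."""
    return (b ** (-(gamma_ + 1.0)) * math.gamma(gamma_ + 1.0) * math.gamma(beta)
            / math.gamma(gamma_ + 1.0 + beta))

gamma_, beta, b = 2.0, 2.0, 3.0            # illustrative values only
numeric = radial_integral(gamma_, beta, b)
exact = radial_closed_form(gamma_, beta, b)

# Constant of (7.4a.8) times pi times the radial integral should equal 1.
norm_const = (b ** (gamma_ + 1.0) / math.pi * math.gamma(gamma_ + 1.0 + beta)
              / (math.gamma(gamma_ + 1.0) * math.gamma(beta)))
total_mass = norm_const * math.pi * numeric
print(numeric, exact, total_mass)
```

For these values the closed form reduces to $1/324$, so the total mass is $(324/\pi)\cdot\pi\cdot(1/324) = 1$.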


7.4a.2. Arbitrary moments in the complex type-2 beta density 


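Let us consider the $h$-th moment of $|\det(\tilde{U})| = |\det(A^{\frac{1}{2}}\tilde{X}B\tilde{X}^*A^{\frac{1}{2}})|$ in (7.4a.1). Since
the only change upon integration is that $\gamma$ is replaced by $\gamma + h$, the $h$-th moment is available
from the normalizing constant in (7.4a.2):

$$E\big[|\det(\tilde{U})|^{h}\big] = \frac{\tilde{\Gamma}_p(\gamma+q+h)}{\tilde{\Gamma}_p(\gamma+q)}\,\frac{\tilde{\Gamma}_p(\beta-h)}{\tilde{\Gamma}_p(\beta)} \tag{7.4a.9}$$

$$= \prod_{j=1}^{p}\frac{\Gamma(\gamma+q-(j-1)+h)}{\Gamma(\gamma+q-(j-1))}\,\frac{\Gamma(\beta-(j-1)-h)}{\Gamma(\beta-(j-1))} \tag{7.4a.10}$$

$$= E[u_1^{h}]\cdots E[u_p^{h}] \tag{7.4a.11}$$

where $u_1, \ldots, u_p$ are mutually independently distributed real scalar type-2 beta random variables
with the parameters $(\gamma+q-(j-1),\ \beta-(j-1))$, $j = 1, \ldots, p$. Thus, $|\det(\tilde{U})|$
has the structural representation

$$|\det(\tilde{U})| = |\det(A^{\frac{1}{2}}\tilde{X}B\tilde{X}^*A^{\frac{1}{2}})| = |\det(\tilde{Y}\tilde{Y}^*)| = |\det(\tilde{S})| = u_1\cdots u_p \tag{7.4a.12}$$

where $u_1, \ldots, u_p$ are as previously defined. The density for a complex scalar type-2
beta random variable is provided in (7.4a.8).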
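
As an illustration of how (7.4a.9) factorizes into (7.4a.10), the sketch below evaluates both sides for real parameter values. It assumes the usual expansion of the complex matrix-variate gamma, $\tilde{\Gamma}_p(\alpha) = \pi^{p(p-1)/2}\,\Gamma(\alpha)\Gamma(\alpha-1)\cdots\Gamma(\alpha-p+1)$. The function names and the numerical values are hypothetical choices made for this example.

```python
import math

def log_complex_matrix_gamma(p, alpha):
    """log of the complex matrix-variate gamma for real alpha > p - 1:
    p(p-1)/2 * log(pi) + sum_{j=1..p} log Gamma(alpha - (j-1))."""
    return (p * (p - 1) / 2.0) * math.log(math.pi) + sum(
        math.lgamma(alpha - (j - 1)) for j in range(1, p + 1))

def det_moment_matrix_form(p, q, gamma_, beta, h):
    """h-th moment of |det(U)| written with matrix-variate gammas, as in (7.4a.9)."""
    a = gamma_ + q
    return math.exp(log_complex_matrix_gamma(p, a + h) + log_complex_matrix_gamma(p, beta - h)
                    - log_complex_matrix_gamma(p, a) - log_complex_matrix_gamma(p, beta))

def scalar_type2_beta_moment(a, b, h):
    """E[u^h] for a real scalar type-2 beta variable with parameters (a, b), b > h."""
    return math.exp(math.lgamma(a + h) + math.lgamma(b - h) - math.lgamma(a) - math.lgamma(b))

def det_moment_product_form(p, q, gamma_, beta, h):
    """Product of scalar type-2 beta moments, as in (7.4a.10)-(7.4a.11)."""
    prod = 1.0
    for j in range(1, p + 1):
        prod *= scalar_type2_beta_moment(gamma_ + q - (j - 1), beta - (j - 1), h)
    return prod

# Illustrative values: p = 2, q = 3, gamma = 2, beta = 4, h = 1.
m_matrix = det_moment_matrix_form(2, 3, 2.0, 4.0, 1.0)
m_product = det_moment_product_form(2, 3, 2.0, 4.0, 1.0)
print(m_matrix, m_product)
```

With these values the two scalar factors are $5/3$ and $2$, so both forms give $10/3$.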


7.4a.3. A pathway version of the complex rectangular matrix-variate type-1 beta 
density 


Consider the model specified in (7.2a.12), that is,

$$\tilde{g}_{14}(\tilde{X}) = \tilde{C}_4\,|\det(A^{\frac{1}{2}}\tilde{X}B\tilde{X}^*A^{\frac{1}{2}})|^{\gamma}\,|\det(I - a(1-\alpha)A^{\frac{1}{2}}\tilde{X}B\tilde{X}^*A^{\frac{1}{2}})|^{\frac{\eta}{1-\alpha}} \tag{7.4a.13}$$

for $a > 0$, $\alpha < 1$, $A > O$, $B > O$, $\eta > 0$, $I - a(1-\alpha)A^{\frac{1}{2}}\tilde{X}B\tilde{X}^*A^{\frac{1}{2}} > O$, and $\tilde{g}_{14} = 0$
elsewhere, where $\tilde{C}_4$ is the normalizing constant given in (7.2a.15). When $\alpha < 1$, the
model appearing in (7.4a.13) is a generalization of the complex rectangular matrix-variate
type-1 beta model considered in (7.3a.1). When $\alpha > 1$, we write $1 - \alpha = -(\alpha - 1)$, $\alpha > 1$,
and re-express $\tilde{g}_{14}$ as

$$\tilde{g}_{15}(\tilde{X}) = \tilde{C}_5\,|\det(A^{\frac{1}{2}}\tilde{X}B\tilde{X}^*A^{\frac{1}{2}})|^{\gamma}\,|\det(I + a(\alpha-1)A^{\frac{1}{2}}\tilde{X}B\tilde{X}^*A^{\frac{1}{2}})|^{-\frac{\eta}{\alpha-1}} \tag{7.4a.14}$$

for $a > 0$, $\alpha > 1$, $\eta > 0$, $A > O$, $B > O$, and $\tilde{g}_{15} = 0$ elsewhere, where the
normalizing constant $\tilde{C}_5$ is the same as the one in (7.2a.16). Observe that the model in
(7.4a.14) is a generalization of the complex rectangular matrix-variate type-2 beta model
in (7.4a.1). When $\alpha \to 1$, the models in (7.4a.13) and (7.4a.14) both converge to the
following model:

$$\tilde{g}_{16}(\tilde{X}) = \tilde{C}_6\,|\det(A^{\frac{1}{2}}\tilde{X}B\tilde{X}^*A^{\frac{1}{2}})|^{\gamma}\,e^{-a\,\eta\,\mathrm{tr}(A^{\frac{1}{2}}\tilde{X}B\tilde{X}^*A^{\frac{1}{2}})} \tag{7.4a.15}$$

for $a > 0$, $\eta > 0$, $A > O$, $B > O$, and $\tilde{g}_{16} = 0$ elsewhere, where the normalizing
constant $\tilde{C}_6$ is the same as that in (7.2a.17). The model specified in (7.4a.15) is a generalization
of the complex rectangular matrix-variate gamma model considered in (7.2a.1).
Thus, the model in (7.4a.13) contains all three models (7.4a.13), (7.4a.14), and (7.4a.15),
which are generalizations of the models given in (7.3a.1), (7.4a.1), and (7.2a.1), respectively.
The pathway model in the complex domain, namely (7.4a.13), was introduced in
Mathai and Provost (2006). Additional properties of the pathway model have already been
discussed in Sect. 7.2a.
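
The limiting behaviour as $\alpha \to 1$ can be illustrated in the scalar case, where the matrix $A^{\frac{1}{2}}\tilde{X}B\tilde{X}^*A^{\frac{1}{2}}$ is replaced by a positive real number $x$. The sketch below is only an illustration under that simplifying assumption. The function name and the numerical values are hypothetical. It evaluates the pathway factor $[1 - a(1-\alpha)x]^{\eta/(1-\alpha)}$ on both sides of $\alpha = 1$ and compares it with the exponential limit $e^{-a\eta x}$.

```python
import math

def pathway_factor(x, a, eta, alpha):
    """Scalar pathway factor [1 - a(1 - alpha) x]^(eta / (1 - alpha)).
    For alpha < 1 this is the type-1 beta form, for alpha > 1 the type-2 beta form,
    and alpha == 1 is treated as the limiting gamma (exponential) form."""
    if alpha == 1.0:
        return math.exp(-a * eta * x)
    base = 1.0 - a * (1.0 - alpha) * x
    if base <= 0.0:
        return 0.0  # outside the support when alpha < 1
    return base ** (eta / (1.0 - alpha))

a, eta, x = 2.0, 1.5, 0.3                      # illustrative values only
limit_value = pathway_factor(x, a, eta, 1.0)   # exp(-0.9)
below = pathway_factor(x, a, eta, 1.0 - 1e-6)  # alpha -> 1 from below
above = pathway_factor(x, a, eta, 1.0 + 1e-6)  # alpha -> 1 from above
print(below, limit_value, above)
```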


Exercises 7.4 
7.4.1. Following the instructions or otherwise, derive the normalizing constant C3 in 
(7.4a.3). 
7.4.2. By integrating over Y, show that (7.4a.4) is a density. 


7.4.3. Evaluate the normalizing constant in (7.4a.7) by using (1): Theorem 4.2a.3; (2): a 
(2n)-variate real polar coordinate transformation. 


7.4.4. Given the standard real matrix-variate type-2 beta model in (7.4.5), evaluate the
marginal joint density of $y_1, \ldots, y_k$, $k < p$.


7.4.5. Evaluate the density in (7.4a.4) explicitly for $p = 1$ and $q = 2$.


7.4.6. Given the standard complex matrix-variate type-2 beta model in (7.4a.7), evaluate
the joint marginal density of $\tilde{y}_1, \ldots, \tilde{y}_r$, $r < p$.


7.4.7. Derive the density of $|\det(\tilde{U})|$ in (7.4a.12) for the cases (1): $p = 1$; (2): $p = 2$.


7.4.8. Derive the density of $|U|$ in (7.4.10) for (1): $p = 2$; (2): $p = 3$.


7.5, 7.5a. Ratios Involving Rectangular Matrix-Variate Random Variables


Since scalar variables such as type-1 beta, type-2 beta, F, Student-t and Cauchy vari- 
ables are all associated with ratios of independently distributed random variables, we will 
explore ratios involving rectangular matrix-variate random variables. Such ratios will yield 
the rectangular matrix-variate versions of the aforementioned ratios of scalar variables. Let 
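the $p \times n_1$, $p \le n_1$, full rank matrix $X_1$ and the $p \times n_2$, $p \le n_2$, full rank matrix $X_2$
be independently distributed real matrix-variate random variables having the rectangular
matrix-variate gamma densities specified in (7.2.5), that is,

$$f_j(X_j) = \frac{|A_j|^{\frac{n_j}{2}}\,|B_j|^{\frac{p}{2}}\,\Gamma_p(\frac{n_j}{2})}{\pi^{\frac{n_jp}{2}}\,\Gamma_p(\gamma_j+\frac{n_j}{2})}\,|A_jX_jB_jX_j'|^{\gamma_j}\,e^{-\mathrm{tr}(A_jX_jB_jX_j')},\quad j = 1, 2, \tag{7.5.1}$$

for $A_j > O$, $B_j > O$, $\Re(\gamma_j + \frac{n_j}{2}) > \frac{p-1}{2}$ and $n_j \ge p$, where $A_j$ is $p \times p$ and $B_j$ is
$n_j \times n_j$. Then, owing to the statistical independence of the variables, the joint density of
$X_1$ and $X_2$ is $f(X_1, X_2) = f_1(X_1)f_2(X_2)$. Consider the ratios

$$U_1 = \Big(\sum_{j=1}^{2}A_j^{\frac{1}{2}}X_jB_jX_j'A_j^{\frac{1}{2}}\Big)^{-\frac{1}{2}}\big(A_1^{\frac{1}{2}}X_1B_1X_1'A_1^{\frac{1}{2}}\big)\Big(\sum_{j=1}^{2}A_j^{\frac{1}{2}}X_jB_jX_j'A_j^{\frac{1}{2}}\Big)^{-\frac{1}{2}} \tag{i}$$

and

$$U_2 = \big(A_2^{\frac{1}{2}}X_2B_2X_2'A_2^{\frac{1}{2}}\big)^{-\frac{1}{2}}\big(A_1^{\frac{1}{2}}X_1B_1X_1'A_1^{\frac{1}{2}}\big)\big(A_2^{\frac{1}{2}}X_2B_2X_2'A_2^{\frac{1}{2}}\big)^{-\frac{1}{2}}. \tag{ii}$$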

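Let us derive the densities of $U_1$ and $U_2$. Letting $V_j = A_j^{\frac{1}{2}}X_jB_j^{\frac{1}{2}}$, we have
$dX_j = |A_j|^{-\frac{n_j}{2}}|B_j|^{-\frac{p}{2}}dV_j$. Denoting the joint density of $V_1$ and $V_2$ by $g(V_1, V_2)$, it follows that
$f(X_1, X_2)\,dX_1 \wedge dX_2 = g(V_1, V_2)\,dV_1 \wedge dV_2$ and so,

$$g(V_1, V_2)\,dV_1\wedge dV_2 = \Big\{\prod_{j=1}^{2}\frac{\Gamma_p(\frac{n_j}{2})}{\pi^{\frac{n_jp}{2}}\,\Gamma_p(\gamma_j+\frac{n_j}{2})}\Big\}\,|V_1V_1'|^{\gamma_1}\,|V_2V_2'|^{\gamma_2}\,e^{-\mathrm{tr}(V_1V_1'+V_2V_2')}\,dV_1\wedge dV_2.$$

Letting $W_j = V_jV_j'$, $dV_j = \frac{\pi^{\frac{n_jp}{2}}}{\Gamma_p(\frac{n_j}{2})}|W_j|^{\frac{n_j}{2}-\frac{p+1}{2}}dW_j$, and the joint density of $W_1$ and $W_2$,
denoted by $h(W_1, W_2)$, is the following:

$$h(W_1, W_2) = \Big\{\prod_{j=1}^{2}\frac{1}{\Gamma_p(\gamma_j+\frac{n_j}{2})}\Big\}\,|W_1|^{\gamma_1+\frac{n_1}{2}-\frac{p+1}{2}}\,|W_2|^{\gamma_2+\frac{n_2}{2}-\frac{p+1}{2}}\,e^{-\mathrm{tr}(W_1+W_2)}. \tag{7.5.2}$$

Note that $U_1 = (W_1 + W_2)^{-\frac{1}{2}}W_1(W_1 + W_2)^{-\frac{1}{2}}$ and $U_2 = W_2^{-\frac{1}{2}}W_1W_2^{-\frac{1}{2}}$. Then, given
the relationship between independently distributed matrix-variate gamma variables and
a type-1 matrix-variate beta variable and a type-2 matrix-variate beta variable, $U_1$ and
$U_2$ are distributed as real matrix-variate type-1 beta and type-2 beta random variables,
respectively, both with the parameters $(\gamma_1 + \frac{n_1}{2},\ \gamma_2 + \frac{n_2}{2})$, that is,

$$U_1 \sim \text{type-1 beta}\big(\gamma_1 + \tfrac{n_1}{2},\ \gamma_2 + \tfrac{n_2}{2}\big)\ \text{ and }\ U_2 \sim \text{type-2 beta}\big(\gamma_1 + \tfrac{n_1}{2},\ \gamma_2 + \tfrac{n_2}{2}\big).$$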


Thus, we have the following result: 


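Theorem 7.5.1. Let $X_1$ of dimension $p \times n_1$, $p \le n_1$, and $X_2$ of dimension $p \times n_2$, $p \le n_2$,
be rank $p$ matrices that are independently distributed rectangular real matrix-variate
gamma random variables whose densities are specified in (7.5.1). Then, as defined in
(i) and (ii), $U_1$ and $U_2$ are respectively real matrix-variate type-1 beta and type-2 beta
distributed with the same parameters $(\gamma_1 + \frac{n_1}{2},\ \gamma_2 + \frac{n_2}{2})$. Thus they have the following
densities, denoted by $g_j(U_j)$, $j = 1, 2$:

$$g_1(U_1)\,dU_1 = c\,|U_1|^{\gamma_1+\frac{n_1}{2}-\frac{p+1}{2}}\,|I-U_1|^{\gamma_2+\frac{n_2}{2}-\frac{p+1}{2}}\,dU_1,\quad O < U_1 < I, \tag{7.5.3}$$

and zero elsewhere, and

$$g_2(U_2)\,dU_2 = c\,|U_2|^{\gamma_1+\frac{n_1}{2}-\frac{p+1}{2}}\,|I+U_2|^{-(\gamma_1+\gamma_2+\frac{n_1+n_2}{2})}\,dU_2,\quad U_2 > O, \tag{7.5.4}$$

where

$$c = \frac{\Gamma_p(\gamma_1+\gamma_2+\frac{n_1+n_2}{2})}{\Gamma_p(\gamma_1+\frac{n_1}{2})\,\Gamma_p(\gamma_2+\frac{n_2}{2})},\quad \Re\big(\gamma_j+\tfrac{n_j}{2}\big) > \tfrac{p-1}{2},\ j = 1, 2.$$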

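Analogous derivations will yield the densities of $\tilde{U}_1$ and $\tilde{U}_2$, the corresponding matrix-variate
random variables in the complex domain:

Theorem 7.5a.1. Let $\tilde{X}_1$ of dimension $p \times n_1$, $p \le n_1$, and $\tilde{X}_2$ of dimension $p \times n_2$, $p \le n_2$,
be full rank rectangular matrix-variate complex gamma random variables that are
independently distributed whose densities are

$$\tilde{f}_j(\tilde{X}_j)\,d\tilde{X}_j = \frac{|\det(A_j)|^{n_j}\,|\det(B_j)|^{p}\,\tilde{\Gamma}_p(n_j)}{\pi^{n_jp}\,\tilde{\Gamma}_p(\gamma_j+n_j)}\,|\det(A_j\tilde{X}_jB_j\tilde{X}_j^*)|^{\gamma_j}\,e^{-\mathrm{tr}(A_j\tilde{X}_jB_j\tilde{X}_j^*)}\,d\tilde{X}_j, \tag{7.5a.1}$$

where $A_j = A_j^* > O$ and $B_j = B_j^* > O$ with $A_j$ being $p \times p$ and $B_j$, $n_j \times n_j$,
$p \le n_j$, $j = 1, 2$. Letting

$$\tilde{U}_1 = \Big(\sum_{j=1}^{2}A_j^{\frac{1}{2}}\tilde{X}_jB_j\tilde{X}_j^*A_j^{\frac{1}{2}}\Big)^{-\frac{1}{2}}\big(A_1^{\frac{1}{2}}\tilde{X}_1B_1\tilde{X}_1^*A_1^{\frac{1}{2}}\big)\Big(\sum_{j=1}^{2}A_j^{\frac{1}{2}}\tilde{X}_jB_j\tilde{X}_j^*A_j^{\frac{1}{2}}\Big)^{-\frac{1}{2}},$$

$$\tilde{U}_2 = \big(A_2^{\frac{1}{2}}\tilde{X}_2B_2\tilde{X}_2^*A_2^{\frac{1}{2}}\big)^{-\frac{1}{2}}\big(A_1^{\frac{1}{2}}\tilde{X}_1B_1\tilde{X}_1^*A_1^{\frac{1}{2}}\big)\big(A_2^{\frac{1}{2}}\tilde{X}_2B_2\tilde{X}_2^*A_2^{\frac{1}{2}}\big)^{-\frac{1}{2}}, \tag{7.5a.2}$$

the densities of $\tilde{U}_1$ and $\tilde{U}_2$, denoted by $\tilde{g}_j(\tilde{U}_j)$, $j = 1, 2$, are respectively given by

$$\tilde{g}_1(\tilde{U}_1)\,d\tilde{U}_1 = \tilde{c}\,|\det(\tilde{U}_1)|^{\gamma_1+n_1-p}\,|\det(I-\tilde{U}_1)|^{\gamma_2+n_2-p}\,d\tilde{U}_1,\quad O < \tilde{U}_1 < I, \tag{7.5a.3}$$

and

$$\tilde{g}_2(\tilde{U}_2)\,d\tilde{U}_2 = \tilde{c}\,|\det(\tilde{U}_2)|^{\gamma_1+n_1-p}\,|\det(I+\tilde{U}_2)|^{-(\gamma_1+\gamma_2+n_1+n_2)}\,d\tilde{U}_2,\quad \tilde{U}_2 > O, \tag{7.5a.4}$$

where

$$\tilde{c} = \frac{\tilde{\Gamma}_p(\gamma_1+\gamma_2+n_1+n_2)}{\tilde{\Gamma}_p(\gamma_1+n_1)\,\tilde{\Gamma}_p(\gamma_2+n_2)},\quad \Re(\gamma_j+n_j) > p-1,\ j = 1, 2.$$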

The densities specified in (7.5.4) and (7.5a.4) happen to be quite useful in real-life
applications. Connections of the type-2 beta distribution to the F-distribution, the Student-t
distribution and the distribution of the sample correlation coefficient when the population
is Gaussian, have already been pointed out in the course of our previous discussions
with respect to the scalar, vector variable and matrix-variate cases. Some further
relationships are next pointed out. Let $\{Y_1, \ldots, Y_n\}$ constitute a simple random
sample where $Y_j \sim N_p(\mu, \Sigma)$, $\Sigma > O$, $j = 1, \ldots, n$, and let the sample matrix be
denoted by $\mathbf{Y} = [Y_1, Y_2, \ldots, Y_n]$; letting $\bar{Y} = \frac{1}{n}[Y_1 + \cdots + Y_n]$ and the matrix of
sample means be $\bar{\mathbf{Y}} = [\bar{Y}, \ldots, \bar{Y}]$, the sample sum of products (corrected) matrix is
$S = (\mathbf{Y} - \bar{\mathbf{Y}})(\mathbf{Y} - \bar{\mathbf{Y}})'$, which is unaffected by $\mu$. We have determined that $S$ follows
a real Wishart distribution having $m = n - 1$ degrees of freedom, and that when $\mu$ is
known to be a null vector, $\mathbf{Y}\mathbf{Y}'$ is a real Wishart matrix with $n$ degrees of freedom. Now,
consider $A^{\frac{1}{2}}Y_j \sim N_p(A^{\frac{1}{2}}\mu,\ A^{\frac{1}{2}}\Sigma A^{\frac{1}{2}})$; when $\mu = O$, the sample sum of products matrix
is $A^{\frac{1}{2}}\mathbf{Y}\mathbf{Y}'A^{\frac{1}{2}}$, which can be expressed in the form of the product of matrices appearing in
(7.5.1) with $B = I$. Hence, we can regard $A^{\frac{1}{2}}XBX'A^{\frac{1}{2}}$ (or equivalently $AXBX'$ in the
determinant in (7.5.1)) as a weighted sample sum of products matrix with sample sizes $n_1$
and $n_2$. Then, the type-2 beta density in (7.5.4) with $U_2$ replaced by $\frac{n_1}{n_2}U_2$ corresponds to
a generalized real rectangular matrix-variate F-density having $n_1$ and $n_2$ degrees of freedom
where $U_2$ is as defined in (ii). Moreover, for $\gamma_1 = 0 = \gamma_2$, this density will correspond
to a rectangular matrix-variate Student-t density. The material included in Sect. 7.5, 7.5a
may not be available in the literature.


7.5.1. Multivariate F, Student-t and Cauchy densities


The densities appearing in (7.5.4) and (7.5a.4) for the $p \times p$ positive definite matrices
$U_2$ and $\tilde{U}_2$ have $dU_2$ and $d\tilde{U}_2$ as differential elements. A positive definite matrix such as
$U_2$ can be expressed as $U_2 = TT'$ where $T$ of dimension $p \times n_1$, $p \le n_1$, has rank
$p$, and we can write $dU_2$ in terms of $dT$. We can also consider the format $U_2 = TCT'$
where $C > O$ is an $n_1 \times n_1$ positive definite constant matrix. In other words, we can arrive
at the format in (7.4.1) from (7.5.4), and correspondingly obtain (7.4a.1) from (7.5a.4).
Let us re-examine the expressions given in (7.4.1) and (7.4a.1), which could be referred
to as rectangular matrix-variate F and Student-t densities in the real and complex cases
for specific values of the parameters $\beta$ and $\gamma$. Now, let $p = 1$ and $A = a > 0$ in (7.4.1)
wherein a location parameter vector $\mu$ is inserted. The resulting density is

$$\frac{a^{\gamma+\frac{q}{2}}\,|B|^{\frac{1}{2}}\,\Gamma(\frac{q}{2})\,\Gamma(\gamma+\frac{q}{2}+\beta)}{\pi^{\frac{q}{2}}\,\Gamma(\gamma+\frac{q}{2})\,\Gamma(\beta)}\,\big[(X-\mu)B(X-\mu)'\big]^{\gamma}\,\big[1+a(X-\mu)B(X-\mu)'\big]^{-(\gamma+\frac{q}{2}+\beta)}\,dX \tag{7.5.5}$$

where $X$ and $\mu$ are $1 \times q$ row vectors, the corresponding density in the complex domain,
denoted by $\tilde{h}(\tilde{X})$, being the following:

$$\tilde{h}(\tilde{X})\,d\tilde{X} = \frac{|a|^{\gamma+q}\,|\det(B)|\,\Gamma(q)\,\Gamma(\gamma+q+\beta)}{\pi^{q}\,\Gamma(\gamma+q)\,\Gamma(\beta)}\,\big[(\tilde{X}-\tilde{\mu})B(\tilde{X}-\tilde{\mu})^*\big]^{\gamma}\,\big[1+a(\tilde{X}-\tilde{\mu})B(\tilde{X}-\tilde{\mu})^*\big]^{-(\gamma+q+\beta)}\,d\tilde{X}. \tag{7.5a.5}$$

For specific values of the parameters, the densities appearing in (7.5.5) and (7.5a.5) can
be respectively called the multivariate F and Student-t densities in the real and complex
domains. With a view to modeling certain types of signal processes, Kondo et al. (2020)
made use of a special form of the complex multivariate Student-t wherein $\gamma = 0$, $a = \frac{2}{\nu}$
and $\beta = \frac{\nu}{2}$, which is given next.


7.5a.1. A complex multivariate Student-t having $\nu$ degrees of freedom


$$\tilde{h}_1(\tilde{X})\,d\tilde{X} = \frac{2^{q}\,\Gamma(\frac{\nu}{2}+q)}{(\pi\nu)^{q}\,\Gamma(\frac{\nu}{2})\,|\det(\Sigma)|}\,\Big[1+\frac{2}{\nu}(\tilde{X}-\tilde{\mu})\Sigma^{-1}(\tilde{X}-\tilde{\mu})^*\Big]^{-(\frac{\nu}{2}+q)}\,d\tilde{X}. \tag{7.5a.6}$$


A complex multivariate Cauchy density that, as well, is mentioned in Kondo et al. (2020),
can be obtained by letting $\nu = 1$ in (7.5a.6). We conclude this section with its representation,
denoted by $\tilde{h}_2(\tilde{X})$:


7.5a.2. A complex multivariate Cauchy density


$$\tilde{h}_2(\tilde{X})\,d\tilde{X} = \frac{2^{q}\,\Gamma(\frac{1}{2}+q)}{\pi^{q}\,\Gamma(\frac{1}{2})\,|\det(\Sigma)|}\,\big[1+2(\tilde{X}-\tilde{\mu})\Sigma^{-1}(\tilde{X}-\tilde{\mu})^*\big]^{-(\frac{1}{2}+q)}\,d\tilde{X}. \tag{7.5a.7}$$
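
The constants in (7.5a.6) and (7.5a.7) are easy to evaluate numerically. The sketch below, which is only an illustration with hypothetical function names, computes both and performs two sanity checks. For $q = 1$ and $\Sigma = \sigma > 0$, a polar coordinate argument as in Sect. 7.4a.1 shows that the functional part of (7.5a.6) integrates to $\pi\sigma$ over the complex plane, so the constant must equal $1/(\pi\sigma)$ for every $\nu$. Setting $\nu = 1$ in the Student-t constant must reproduce the Cauchy constant.

```python
import math

def complex_t_const(q, nu, det_sigma):
    """Constant of (7.5a.6): 2^q Gamma(nu/2 + q) / ((pi nu)^q Gamma(nu/2) |det Sigma|)."""
    log_c = (q * math.log(2.0) + math.lgamma(nu / 2.0 + q)
             - q * math.log(math.pi * nu) - math.lgamma(nu / 2.0)
             - math.log(det_sigma))
    return math.exp(log_c)

def complex_cauchy_const(q, det_sigma):
    """Constant of (7.5a.7): 2^q Gamma(1/2 + q) / (pi^q Gamma(1/2) |det Sigma|)."""
    return (2.0 ** q * math.gamma(0.5 + q)
            / (math.pi ** q * math.gamma(0.5) * det_sigma))

sigma = 2.0                                   # illustrative value of det(Sigma) for q = 1
const_q1 = [complex_t_const(1, nu, sigma) for nu in (1.0, 3.0, 7.5)]
t_at_nu1 = complex_t_const(3, 1.0, 4.0)       # q = 3, nu = 1, det(Sigma) = 4
cauchy = complex_cauchy_const(3, 4.0)
print(const_q1, 1.0 / (math.pi * sigma), t_at_nu1, cauchy)
```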


Exercises 7.5 


7.5.1. Derive the complex densities in (7.5a.3) and (7.5a.4). 


7.5.2. Derive the normalizing constant in (7.5a.6) by integrating out the functional por- 
tion of this density. 


7.5.3. Derive the normalizing constant in (7.5a.7) by integrating out the functional por- 
tion of this density. 


7.5.4. Derive the density in (7.5a.6) from complex q-variate Gaussian densities. 
7.5.5. Derive the density in (7.5a.7) from complex q-variate Gaussian densities. 


7.6. Rectangular Matrix-Variate Dirichlet Density, Real Case 


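For the real matrix-variate type-1 and type-2 Dirichlet models involving sets of real
positive definite matrices, the reader is referred to Sects. 5.8.6 and 5.8.7. The corresponding
rectangular matrix-variate cases will be considered in this section. Let $A_j > O$, $j = 1, \ldots, k$,
be $p \times p$ real positive definite constant matrices, and $B_j$, $j = 1, \ldots, k$, be
$q_j \times q_j$ real positive definite constant matrices. Let $X_j$, $j = 1, \ldots, k$, be $p \times q_j$, $q_j \ge p$,
rank $p$ real matrices whose elements are distinct real scalar variables. Then, consider the
real-valued scalar function of $X_1, \ldots, X_k$,

$$f_1(X_1, \ldots, X_k) = C_k\,|A_1X_1B_1X_1'|^{\gamma_1}\cdots|A_kX_kB_kX_k'|^{\gamma_k}\,\big|I - A_1^{\frac{1}{2}}X_1B_1X_1'A_1^{\frac{1}{2}} - \cdots - A_k^{\frac{1}{2}}X_kB_kX_k'A_k^{\frac{1}{2}}\big|^{\gamma_{k+1}-\frac{p+1}{2}} \tag{7.6.1}$$

for $A_j > O$, $B_j > O$, $A_j^{\frac{1}{2}}X_jB_jX_j'A_j^{\frac{1}{2}} > O$, $j = 1, \ldots, k$, $I - \sum_{j=1}^{k}A_j^{\frac{1}{2}}X_jB_jX_j'A_j^{\frac{1}{2}} > O$,
$\Re(\gamma_j + \frac{q_j}{2}) > \frac{p-1}{2}$, $j = 1, \ldots, k$, and $f_1 = 0$ elsewhere, where $C_k$ is the normalizing
constant. This normalizing constant can be evaluated as follows: Letting

$$Y_j = A_j^{\frac{1}{2}}X_jB_j^{\frac{1}{2}} \ \Rightarrow\ dY_j = |A_j|^{\frac{q_j}{2}}\,|B_j|^{\frac{p}{2}}\,dX_j,\quad j = 1, \ldots, k, \tag{i}$$

the joint density of $Y_1, \ldots, Y_k$, denoted by $f_2(Y_1, \ldots, Y_k)$, is given by

$$f_2(Y_1, \ldots, Y_k) = C_k\Big\{\prod_{j=1}^{k}|A_j|^{-\frac{q_j}{2}}\,|B_j|^{-\frac{p}{2}}\Big\}\,|Y_1Y_1'|^{\gamma_1}\cdots|Y_kY_k'|^{\gamma_k}\,\Big|I - \sum_{j=1}^{k}Y_jY_j'\Big|^{\gamma_{k+1}-\frac{p+1}{2}}. \tag{7.6.2}$$

Now, let

$$S_j = Y_jY_j' \ \Rightarrow\ dY_j = \frac{\pi^{\frac{q_jp}{2}}}{\Gamma_p(\frac{q_j}{2})}\,|S_j|^{\frac{q_j}{2}-\frac{p+1}{2}}\,dS_j,\quad j = 1, \ldots, k. \tag{ii}$$

Then, the joint density of $S_1, \ldots, S_k$, which follows, is a real matrix-variate type-1 Dirichlet
density:

$$C_k\Big\{\prod_{j=1}^{k}|A_j|^{-\frac{q_j}{2}}\,|B_j|^{-\frac{p}{2}}\,\frac{\pi^{\frac{q_jp}{2}}}{\Gamma_p(\frac{q_j}{2})}\Big\}\Big\{\prod_{j=1}^{k}|S_j|^{\gamma_j+\frac{q_j}{2}-\frac{p+1}{2}}\Big\}\,\Big|I - \sum_{j=1}^{k}S_j\Big|^{\gamma_{k+1}-\frac{p+1}{2}} \tag{7.6.3}$$

for $S_j > O$, $\Re(\gamma_j + \frac{q_j}{2}) > \frac{p-1}{2}$, $j = 1, \ldots, k$. Next, on integrating out $S_1, \ldots, S_k$ by
making use of a type-1 real matrix-variate Dirichlet integral that was defined in Sect. 5.8.6,
we have

$$\frac{\big\{\prod_{j=1}^{k}\Gamma_p(\gamma_j+\frac{q_j}{2})\big\}\,\Gamma_p(\gamma_{k+1})}{\Gamma_p\big(\sum_{j=1}^{k}(\gamma_j+\frac{q_j}{2})+\gamma_{k+1}\big)},\quad \Re\big(\gamma_j+\tfrac{q_j}{2}\big) > \tfrac{p-1}{2},\ j = 1, \ldots, k, \tag{iii}$$

and $\Re(\gamma_{k+1}) > \frac{p-1}{2}$. Then, as obtained from (i), (ii) and (iii), the normalizing constant is

$$C_k = \Big\{\prod_{j=1}^{k}|A_j|^{\frac{q_j}{2}}\,|B_j|^{\frac{p}{2}}\,\frac{\Gamma_p(\frac{q_j}{2})}{\pi^{\frac{q_jp}{2}}}\Big\}\,\frac{\Gamma_p\big(\sum_{j=1}^{k}(\gamma_j+\frac{q_j}{2})+\gamma_{k+1}\big)}{\big\{\prod_{j=1}^{k}\Gamma_p(\gamma_j+\frac{q_j}{2})\big\}\,\Gamma_p(\gamma_{k+1})} \tag{7.6.4}$$

for $A_j > O$, $B_j > O$, $q_j \ge p$, $\Re(\gamma_j + \frac{q_j}{2}) > \frac{p-1}{2}$, $j = 1, \ldots, k$, and $\Re(\gamma_{k+1}) > \frac{p-1}{2}$.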
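
The constant in (7.6.4) can be evaluated on the log scale. The sketch below is an illustration only. It assumes real parameters, takes the determinants $|A_j|$ and $|B_j|$ as plain numbers, and uses the standard expansion $\Gamma_p(\alpha) = \pi^{p(p-1)/4}\prod_{j=1}^{p}\Gamma(\alpha - \frac{j-1}{2})$. The function names are hypothetical. As a sanity check it considers $k = 1$, $p = q_1 = 1$, $A_1 = B_1 = 1$, $\gamma_1 = 1$, $\gamma_2 = 2$, in which case (7.6.1) reduces to $C_1\,x^{2}(1-x^{2})$ on $-1 < x < 1$ and should integrate to one.

```python
import math

def log_real_matrix_gamma(p, alpha):
    """log of the real matrix-variate gamma for real alpha > (p-1)/2:
    p(p-1)/4 * log(pi) + sum_{j=1..p} log Gamma(alpha - (j-1)/2)."""
    return (p * (p - 1) / 4.0) * math.log(math.pi) + sum(
        math.lgamma(alpha - (j - 1) / 2.0) for j in range(1, p + 1))

def dirichlet_norm_const(p, qs, gammas, gamma_last, det_As, det_Bs):
    """Normalizing constant C_k of (7.6.4); det_As and det_Bs hold |A_j| and |B_j|."""
    log_c = 0.0
    alpha_sum = 0.0
    for q, g, d_a, d_b in zip(qs, gammas, det_As, det_Bs):
        alpha = g + q / 2.0
        alpha_sum += alpha
        log_c += (q / 2.0) * math.log(d_a) + (p / 2.0) * math.log(d_b)
        log_c += log_real_matrix_gamma(p, q / 2.0) - (q * p / 2.0) * math.log(math.pi)
        log_c -= log_real_matrix_gamma(p, alpha)
    log_c += log_real_matrix_gamma(p, alpha_sum + gamma_last)
    log_c -= log_real_matrix_gamma(p, gamma_last)
    return math.exp(log_c)

# Scalar sanity check: the density is c * x^2 * (1 - x^2) on (-1, 1).
c = dirichlet_norm_const(1, [1], [1.0], 2.0, [1.0], [1.0])
n = 20_000
h = 2.0 / n
mass = 0.0
for i in range(n):
    x = -1.0 + (i + 0.5) * h        # midpoint of the i-th subinterval
    mass += c * x * x * (1.0 - x * x)
mass *= h
print(c, mass)
```

In this scalar case the constant equals $\Gamma(3.5)/(\Gamma(1.5)\Gamma(2)) = 3.75$, and the integral of $x^2(1-x^2)$ over $(-1, 1)$ is $4/15$.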


7.6.1. Certain properties, real rectangular matrix-variate type-1 Dirichlet density 


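Letting $U = \sum_{j=1}^{k}A_j^{\frac{1}{2}}X_jB_jX_j'A_j^{\frac{1}{2}}$, what might be the distributions of $U$ and $I - U$? In
the real scalar case, one could have easily evaluated the moments $E[(1-u)^{h}]$ for arbitrary
$h$, which would have automatically determined the distribution of $1 - u$, and therefrom
that of $u$. In the matrix-variate case as well, one can readily determine the $h$-th moment of
the determinant of $I - U$, $E[|I - U|^{h}]$, and the unique resulting distribution. However,
the distribution of a determinant being unique does not imply that the distribution of the
corresponding matrix is unique. Thus, we have to resort to other approaches for obtaining
the distributions of $U$ and $I - U$. Consider the following transformation:

$$V_j = A_j^{\frac{1}{2}}X_jB_jX_j'A_j^{\frac{1}{2}},\ j = 1, \ldots, k-1,\quad V_k = \sum_{j=1}^{k}A_j^{\frac{1}{2}}X_jB_jX_j'A_j^{\frac{1}{2}} = U.$$

Then $A_k^{\frac{1}{2}}X_kB_kX_k'A_k^{\frac{1}{2}} = U - V_1 - \cdots - V_{k-1}$ and $I - \sum_{j=1}^{k}A_j^{\frac{1}{2}}X_jB_jX_j'A_j^{\frac{1}{2}} = I - U$. Noting that, in light of (i) and (ii) of Sect. 7.6,

$$dX_1\wedge\ldots\wedge dX_k = \Big\{\prod_{j=1}^{k}|A_j|^{-\frac{q_j}{2}}\,|B_j|^{-\frac{p}{2}}\,\frac{\pi^{\frac{q_jp}{2}}}{\Gamma_p(\frac{q_j}{2})}\,|S_j|^{\frac{q_j}{2}-\frac{p+1}{2}}\Big\}\,dV_1\wedge\ldots\wedge dV_{k-1}\wedge dU, \tag{7.6.5}$$

with $S_j = V_j$, $j = 1, \ldots, k-1$, and $S_k = U - V_1 - \cdots - V_{k-1}$, the joint density of $V_1, \ldots, V_{k-1}, U$, denoted by $f_3(V_1, \ldots, V_{k-1}, U)$, is seen to be

$$f_3(V_1, \ldots, V_{k-1}, U) = \frac{\Gamma_p\big(\sum_{j=1}^{k}(\gamma_j+\frac{q_j}{2})+\gamma_{k+1}\big)}{\big\{\prod_{j=1}^{k}\Gamma_p(\gamma_j+\frac{q_j}{2})\big\}\,\Gamma_p(\gamma_{k+1})}\Big\{\prod_{j=1}^{k-1}|V_j|^{\gamma_j+\frac{q_j}{2}-\frac{p+1}{2}}\Big\}\,|U - V_1 - \cdots - V_{k-1}|^{\gamma_k+\frac{q_k}{2}-\frac{p+1}{2}}\,|I-U|^{\gamma_{k+1}-\frac{p+1}{2}} \tag{7.6.6}$$

where

$$|U - V_1 - \cdots - V_{k-1}| = |U|\,\big|I - U^{-\frac{1}{2}}V_1U^{-\frac{1}{2}} - \cdots - U^{-\frac{1}{2}}V_{k-1}U^{-\frac{1}{2}}\big|.$$

Letting $W_j = U^{-\frac{1}{2}}V_jU^{-\frac{1}{2}}$, $j = 1, \ldots, k-1$, for fixed $U$ we have

$$dV_1\wedge\ldots\wedge dV_{k-1} = |U|^{(k-1)\frac{p+1}{2}}\,dW_1\wedge\ldots\wedge dW_{k-1}.$$

Now, the joint density of $W_1, \ldots, W_{k-1}$ and $U$, denoted by $f_4(W_1, \ldots, W_{k-1}, U)$, is the
following:

$$f_4(W_1, \ldots, W_{k-1}, U) = C\,|U|^{\sum_{j=1}^{k}(\gamma_j+\frac{q_j}{2})-\frac{p+1}{2}}\,|I-U|^{\gamma_{k+1}-\frac{p+1}{2}}\Big\{\prod_{j=1}^{k-1}|W_j|^{\gamma_j+\frac{q_j}{2}-\frac{p+1}{2}}\Big\}\,|I - W_1 - \cdots - W_{k-1}|^{\gamma_k+\frac{q_k}{2}-\frac{p+1}{2}} \tag{7.6.7}$$

where $C$ is the normalizing constant. We then integrate out $W_1, \ldots, W_{k-1}$ by using a
$(k-1)$-variate type-1 Dirichlet integral, this yielding the result:

$$\frac{\prod_{j=1}^{k}\Gamma_p(\gamma_j+\frac{q_j}{2})}{\Gamma_p\big(\sum_{j=1}^{k}(\gamma_j+\frac{q_j}{2})\big)}\quad\text{for } \Re\big(\gamma_j+\tfrac{q_j}{2}\big) > \tfrac{p-1}{2},\ j = 1, \ldots, k.$$

Accordingly, the marginal density of $U$ is the following:

$$f_5(U) = \frac{\Gamma_p\big(\sum_{j=1}^{k}(\gamma_j+\frac{q_j}{2})+\gamma_{k+1}\big)}{\Gamma_p\big(\sum_{j=1}^{k}(\gamma_j+\frac{q_j}{2})\big)\,\Gamma_p(\gamma_{k+1})}\,|U|^{\sum_{j=1}^{k}(\gamma_j+\frac{q_j}{2})-\frac{p+1}{2}}\,|I-U|^{\gamma_{k+1}-\frac{p+1}{2}} \tag{7.6.8}$$

for $O < U < I$, $\Re(\gamma_j + \frac{q_j}{2}) > \frac{p-1}{2}$, $j = 1, \ldots, k$, $\Re(\gamma_{k+1}) > \frac{p-1}{2}$, and $f_5 = 0$
elsewhere. Thus, $U$ is a real matrix-variate type-1 beta with the parameters $\big(\sum_{j=1}^{k}(\gamma_j + \frac{q_j}{2}),\ \gamma_{k+1}\big)$
and therefore $I - U$ is a real matrix-variate type-1 beta with the parameters
$\big(\gamma_{k+1},\ \sum_{j=1}^{k}(\gamma_j + \frac{q_j}{2})\big)$. These results are now stated as a theorem.


Theorem 7.6.1. Consider the density given in (7.6.1). Let $U = \sum_{j=1}^{k}A_j^{\frac{1}{2}}X_jB_jX_j'A_j^{\frac{1}{2}}$.
Then, $U$ has a real matrix-variate type-1 beta distribution whose parameters are
$\big(\sum_{j=1}^{k}(\gamma_j + \frac{q_j}{2}),\ \gamma_{k+1}\big)$ and $I - U$ is distributed as a real matrix-variate type-1 beta
with the parameters $\big(\gamma_{k+1},\ \sum_{j=1}^{k}(\gamma_j + \frac{q_j}{2})\big)$.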
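
For $p = 1$, the matrices $S_j = A_j^{\frac{1}{2}}X_jB_jX_j'A_j^{\frac{1}{2}}$ are scalars $s_1, \ldots, s_k$ having a type-1 Dirichlet density with parameters $(\alpha_1, \ldots, \alpha_k;\ \alpha_{k+1})$, where $\alpha_j = \gamma_j + \frac{q_j}{2}$ and $\alpha_{k+1} = \gamma_{k+1}$, and Theorem 7.6.1 says that $u = s_1 + \cdots + s_k$ is type-1 beta with parameters $(\sum_j\alpha_j,\ \alpha_{k+1})$. The Monte Carlo sketch below illustrates this scalar case only. It relies on the standard construction of Dirichlet variables as normalized independent gamma variables, and the parameter values, seed and sample size are arbitrary assumptions.

```python
import random

def simulate_dirichlet_sum(alphas, alpha_last, n=100_000, seed=2024):
    """Sample first and second moments of u = s_1 + ... + s_k, where (s_1, ..., s_k)
    is type-1 Dirichlet, generated from independent unit-scale gamma variables."""
    rng = random.Random(seed)
    sum_u = 0.0
    sum_u2 = 0.0
    for _ in range(n):
        g_total = sum(rng.gammavariate(a, 1.0) for a in alphas)
        g_last = rng.gammavariate(alpha_last, 1.0)
        u = g_total / (g_total + g_last)
        sum_u += u
        sum_u2 += u * u
    return sum_u / n, sum_u2 / n

alphas = [1.5, 2.0, 2.5]      # illustrative values of gamma_j + q_j / 2
alpha_last = 3.0              # illustrative value of gamma_{k+1}
m1, m2 = simulate_dirichlet_sum(alphas, alpha_last)

a = sum(alphas)
b = alpha_last
beta_m1 = a / (a + b)                                  # E[u] for a type-1 beta(a, b)
beta_m2 = a * (a + 1.0) / ((a + b) * (a + b + 1.0))    # E[u^2] for a type-1 beta(a, b)
print(m1, beta_m1, m2, beta_m2)
```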

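The $h$-th moment of the determinant of the matrix $I - U$ can be evaluated either from
Theorem 7.6.1 or from Eq. (7.6.1). This $h$-th moment of the determinant, which can be
worked out from the normalizing constant appearing in (7.6.8), is

$$E\big[|I-U|^{h}\big] = \frac{\Gamma_p(\gamma_{k+1}+h)}{\Gamma_p(\gamma_{k+1})}\,\frac{\Gamma_p\big(\sum_{j=1}^{k}(\gamma_j+\frac{q_j}{2})+\gamma_{k+1}\big)}{\Gamma_p\big(\sum_{j=1}^{k}(\gamma_j+\frac{q_j}{2})+\gamma_{k+1}+h\big)} \tag{7.6.9}$$

for $\Re(\gamma_j + \frac{q_j}{2}) > \frac{p-1}{2}$, $j = 1, \ldots, k$, $\Re(\gamma_{k+1} + h) > \frac{p-1}{2}$. Observe that a representation of the $h$-th moment
of the determinant of $U$ cannot be derived from (7.6.9). However, $E[|U|^{h}]$ can be readily
evaluated from Theorem 7.6.1:

$$E\big[|U|^{h}\big] = \frac{\Gamma_p\big(\sum_{j=1}^{k}(\gamma_j+\frac{q_j}{2})+h\big)}{\Gamma_p\big(\sum_{j=1}^{k}(\gamma_j+\frac{q_j}{2})\big)}\,\frac{\Gamma_p\big(\sum_{j=1}^{k}(\gamma_j+\frac{q_j}{2})+\gamma_{k+1}\big)}{\Gamma_p\big(\sum_{j=1}^{k}(\gamma_j+\frac{q_j}{2})+\gamma_{k+1}+h\big)} \tag{7.6.10}$$

for $\Re(\gamma_j + \frac{q_j}{2}) > \frac{p-1}{2}$, $j = 1, \ldots, k$, $\Re(\gamma_{k+1}) > \frac{p-1}{2}$. Upon expanding the $\Gamma_p(\cdot)$'s in
terms of $\Gamma(\cdot)$'s, the following structural representations are obtained:

$$|I - U| = u_1\cdots u_p, \tag{7.6.11}$$

$$|U| = v_1\cdots v_p, \tag{7.6.12}$$

where $u_1, \ldots, u_p$ are independently distributed real scalar type-1 beta random variables
with the parameters $\big(\gamma_{k+1} - \frac{j-1}{2},\ \sum_{i=1}^{k}(\gamma_i + \frac{q_i}{2})\big)$, $j = 1, \ldots, p$, and $v_1, \ldots, v_p$ are
independently distributed real scalar type-1 beta random variables with the parameters
$\big(\sum_{i=1}^{k}(\gamma_i + \frac{q_i}{2}) - \frac{j-1}{2},\ \gamma_{k+1}\big)$, $j = 1, \ldots, p$.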

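7.6.2. A multivariate version of the real matrix-variate type-1 Dirichlet density


For $p = 1$, consider the joint density of $Y_1, \ldots, Y_k$ in $f_2(Y_1, \ldots, Y_k)$, which shall be
denoted by $f_6(Y_1, \ldots, Y_k)$. Then,

$$f_6(Y_1, \ldots, Y_k) = \Big\{\prod_{j=1}^{k}\frac{\Gamma(\frac{q_j}{2})}{\pi^{\frac{q_j}{2}}}\Big\}\,\frac{\Gamma\big(\sum_{j=1}^{k}(\gamma_j+\frac{q_j}{2})+\gamma_{k+1}\big)}{\big\{\prod_{j=1}^{k}\Gamma(\gamma_j+\frac{q_j}{2})\big\}\,\Gamma(\gamma_{k+1})}\Big\{\prod_{j=1}^{k}[Y_jY_j']^{\gamma_j}\Big\}\,\big[1 - Y_1Y_1' - \cdots - Y_kY_k'\big]^{\gamma_{k+1}-1}, \tag{7.6.13}$$

the conditions on the parameters remaining as previously stated. Note that $Y_j$ is of the
form $Y_j = (y_{j1}, \ldots, y_{jq_j})$, so that $Y_jY_j' = y_{j1}^2 + \cdots + y_{jq_j}^2$. Thus, in light of its structure,
the density appearing in (7.6.13) has interesting properties. For instance, it can be observed
that all the subsets of $Y_1, \ldots, Y_k$ also have densities belonging to the same family.
Accordingly, the marginal density of $Y_1$ is the following:

$$f_7(Y_1) = \frac{\Gamma(\frac{q_1}{2})}{\pi^{\frac{q_1}{2}}}\,\frac{\Gamma\big(\sum_{j=1}^{k}(\gamma_j+\frac{q_j}{2})+\gamma_{k+1}\big)}{\Gamma(\gamma_1+\frac{q_1}{2})\,\Gamma\big(\sum_{j=2}^{k}(\gamma_j+\frac{q_j}{2})+\gamma_{k+1}\big)}\,\big[y_{11}^2+\cdots+y_{1q_1}^2\big]^{\gamma_1}\,\big[1 - y_{11}^2 - \cdots - y_{1q_1}^2\big]^{\sum_{j=2}^{k}(\gamma_j+\frac{q_j}{2})+\gamma_{k+1}-1} \tag{7.6.14}$$

for $-\infty < y_{1r} < \infty$, $r = 1, \ldots, q_1$, $0 < y_{11}^2 + \cdots + y_{1q_1}^2 < 1$, $\Re(\gamma_1 + \frac{q_1}{2}) > 0$, $\Re\big(\sum_{j=2}^{k}(\gamma_j+\frac{q_j}{2})+\gamma_{k+1}\big) > 0$,
and $f_7 = 0$ elsewhere. As has already been mentioned, the structure in (7.6.14) is related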
to geometrical probability problems involving type-1 beta distributed isotropic random 
points. Thus, (7.6.13) suggests the possibility of generalizing such geometrical probabil- 
ity problems in connection with a type-1 Dirichlet density as the underlying density for 
the random points. This does not appear to have yet been discussed in the literature on 
geometrical probability. 

The complex case of the type-1 rectangular matrix-variate Dirichlet density, the real 
and complex cases of the rectangular matrix-variate type-2 Dirichlet density and their 
generalized forms can be similarly handled; hence, they will not be further discussed. 
Certain of these cases are brought up in this section's exercises. 


Note 7.6.1. One could also consider a pathway version of the model appearing in
Eq. (7.6.1). Let us replace the second line in (7.6.1) by

$$\big|I - a(1-\alpha)\big(A_1^{\frac{1}{2}}X_1B_1X_1'A_1^{\frac{1}{2}} + \cdots + A_k^{\frac{1}{2}}X_kB_kX_k'A_k^{\frac{1}{2}}\big)\big|^{\frac{\eta}{1-\alpha}}$$

where $a > 0$, $\alpha < 1$, $\eta > 0$ are real scalars, and denote the resulting
model by $f_8$, whose corresponding equation number will be referred to as (7.6.15). Observe
that (7.6.15) belongs to a generalized type-1 Dirichlet family of models and that the
new normalizing constant will be denoted $C_{k1}$. For $\alpha > 1$, write $-a(1-\alpha) = a(\alpha-1) > 0$,
and then $\frac{\eta}{1-\alpha} = -\frac{\eta}{\alpha-1}$, the resulting model being denoted by $f_9$ with (7.6.16) as the
associated equation number. Note that (7.6.16) is actually a generalized type-2 Dirichlet
model whose normalizing constant, denoted $C_{k2}$, will be different. Taking the limits as
$\alpha \to 1_-$ in (7.6.15) and $\alpha \to 1_+$ in (7.6.16), both the models $f_8$ in (7.6.15) and $f_9$ in
(7.6.16) will converge to a model $f_{10}$ whose associated equation number will be (7.6.17),
wherein the second line corresponding to the second line in (7.6.1) will be

$$e^{-a\,\eta\,\mathrm{tr}\big(A_1^{\frac{1}{2}}X_1B_1X_1'A_1^{\frac{1}{2}} + \cdots + A_k^{\frac{1}{2}}X_kB_kX_k'A_k^{\frac{1}{2}}\big)},$$

this limiting model having its own normalizing constant denoted by $C_{k3}$. As well, it can
be established that, under the above limiting process, both $C_{k1}$ and $C_{k2}$ will converge to
$C_{k3}$. Now, observe that the matrices $X_1, \ldots, X_k$ in model $f_{10}$ are mutually independently
distributed real rectangular matrix-variate gamma random variables. This turns out to be
an unforeseen result as, in this case, the pathway parameter $\alpha$ is also seen to control the
dependence to independence transitional stages. Results analogous to those obtained in
Sects. 7.6, 7.6.1, and 7.6.2 could similarly be derived within the complex domain.


Exercises 7.6 


7.6.1. Construct, in the complex domain, the rectangular matrix-variate type-1 Dirich- 
let density corresponding to the density specified in (7.6.1) and determine the associated 
normalizing constant. 


7.6.2. Establish, in the complex domain, a theorem corresponding to Theorem 7.6.1. 


7.6.3. Establish, for the complex case, the structural representations corresponding to 
(7.6.11) and (7.6.12). 


7.6.4. Construct a real rectangular matrix-variate type-2 Dirichlet density corresponding 
to the density in (7.6.1). 


7.6.5. Construct a complex rectangular matrix-variate type-2 Dirichlet density corre- 
sponding to the density in (7.6.1). 


7.6.6. When the $p \times q_j$ matrices $X_j$'s jointly have a real type-2 Dirichlet density with
the parameter matrices $A_j > O$, $B_j > O$ as in (7.6.1) where $A_j$ is $p \times p$, $B_j$ is $q_j \times q_j$
and $X_j$ is $p \times q_j$, $q_j \ge p$, $j = 1, \ldots, k$, of full rank $p$, establish that
$U = \big[I + \sum_{j=1}^{k}(A_j^{\frac{1}{2}}X_jB_jX_j'A_j^{\frac{1}{2}})\big]^{-1}$ has a real matrix-variate type-1 beta distribution and specify its
parameters. What about the density of $\sum_{j=1}^{k}(A_j^{\frac{1}{2}}X_jB_jX_j'A_j^{\frac{1}{2}})$ in this case?


7.6.7. Answer the questions in Exercise 7.6.6 for the corresponding type-2 Dirichlet
density in the complex domain, replacing $X_j$ by $\tilde{X}_j$ and $X_j'$ by $\tilde{X}_j^*$.


7.6.8. For the real type-2 Dirichlet density in Exercise 7.6.4, determine $E[|U|^{h}]$ for $U$ as
specified in Exercise 7.6.6.

7.6.9. Extend all the results obtained in Sect. 7.6 to the complex domain. 


7.6.10. Derive, in the complex domain, results that are analogous to those obtained for
the real case in Note 7.6.1, while keeping $a$, $\eta$ and $\alpha$ real.


7.7. Generalizations of the Real Rectangular Dirichlet Models 


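The first author and his collaborators have considered several types of generalizations
to the type-1 and type-2 Dirichlet models for real positive definite matrices and Hermitian
positive definite matrices. We will propose certain extensions of those results to rectangular
matrix-variate cases, both in the real and complex domains. Again, let $X_j$ be a
$p \times q_j$, $q_j \ge p$, matrix of full rank $p$ having distinct real scalar variables as its elements,
for $j = 1, \ldots, k$. Let the constant real positive definite matrices $A_j > O$ and $B_j > O$,
where $A_j$ is $p \times p$ and $B_j$ is $q_j \times q_j$, $j = 1, \ldots, k$, be as defined in Sect. 7.6. Consider
the real model

$$\begin{aligned}
f_{11}(X_1, \ldots, X_k) = D_k\,&|A_1^{\frac{1}{2}}X_1B_1X_1'A_1^{\frac{1}{2}}|^{\gamma_1}\,|I - A_1^{\frac{1}{2}}X_1B_1X_1'A_1^{\frac{1}{2}}|^{\beta_1}\\
&\times|A_2^{\frac{1}{2}}X_2B_2X_2'A_2^{\frac{1}{2}}|^{\gamma_2}\,\Big|I - \sum_{j=1}^{2}A_j^{\frac{1}{2}}X_jB_jX_j'A_j^{\frac{1}{2}}\Big|^{\beta_2}\cdots\\
&\times|A_k^{\frac{1}{2}}X_kB_kX_k'A_k^{\frac{1}{2}}|^{\gamma_k}\,\Big|I - \sum_{j=1}^{k}A_j^{\frac{1}{2}}X_jB_jX_j'A_j^{\frac{1}{2}}\Big|^{\gamma_{k+1}+\beta_k-\frac{p+1}{2}}
\end{aligned} \tag{7.7.1}$$

for $\Re(\gamma_j + \frac{q_j}{2}) > \frac{p-1}{2}$, $j = 1, \ldots, k$, $\Re(\gamma_{k+1}) > \frac{p-1}{2}$ and other conditions to be specified
later, where $D_k$ is the normalizing constant. For evaluating the normalizing constant,
consider the following transformations:

$$Z_j = A_j^{\frac{1}{2}}X_jB_j^{\frac{1}{2}} \ \Rightarrow\ dX_j = |A_j|^{-\frac{q_j}{2}}\,|B_j|^{-\frac{p}{2}}\,dZ_j,\quad j = 1, \ldots, k, \tag{i}$$

so that the model $f_{11}$ changes to $f_{12}$ where

$$\begin{aligned}
f_{12}(Z_1, \ldots, Z_k) = D_k\Big\{\prod_{j=1}^{k}|A_j|^{-\frac{q_j}{2}}\,|B_j|^{-\frac{p}{2}}\Big\}\,&|Z_1Z_1'|^{\gamma_1}\,|I - Z_1Z_1'|^{\beta_1}\,|Z_2Z_2'|^{\gamma_2}\,|I - Z_1Z_1' - Z_2Z_2'|^{\beta_2}\cdots\\
&\times|Z_kZ_k'|^{\gamma_k}\,\Big|I - \sum_{j=1}^{k}Z_jZ_j'\Big|^{\gamma_{k+1}+\beta_k-\frac{p+1}{2}}.
\end{aligned} \tag{7.7.2}$$

Now, letting

$$Z_jZ_j' = S_j \ \Rightarrow\ dZ_j = \frac{\pi^{\frac{q_jp}{2}}}{\Gamma_p(\frac{q_j}{2})}\,|S_j|^{\frac{q_j}{2}-\frac{p+1}{2}}\,dS_j,\quad j = 1, \ldots, k, \tag{ii}$$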



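the model becomes

$$\begin{aligned}
f_{13}(S_1, \ldots, S_k) = D_k\Big\{\prod_{j=1}^{k}|A_j|^{-\frac{q_j}{2}}\,|B_j|^{-\frac{p}{2}}\,\frac{\pi^{\frac{q_jp}{2}}}{\Gamma_p(\frac{q_j}{2})}\Big\}\,&|S_1|^{\gamma_1+\frac{q_1}{2}-\frac{p+1}{2}}\,|I - S_1|^{\beta_1}\,|S_2|^{\gamma_2+\frac{q_2}{2}-\frac{p+1}{2}}\,|I - S_1 - S_2|^{\beta_2}\cdots\\
&\times|S_k|^{\gamma_k+\frac{q_k}{2}-\frac{p+1}{2}}\,\Big|I - \sum_{j=1}^{k}S_j\Big|^{\gamma_{k+1}+\beta_k-\frac{p+1}{2}}.
\end{aligned} \tag{7.7.3}$$

Now, consider the transformation (5.8.20), namely,

$$\begin{aligned}
S_1 &= Y_1\\
S_2 &= (I - Y_1)^{\frac{1}{2}}\,Y_2\,(I - Y_1)^{\frac{1}{2}}\\
S_j &= (I - Y_1)^{\frac{1}{2}}\cdots(I - Y_{j-1})^{\frac{1}{2}}\,Y_j\,(I - Y_{j-1})^{\frac{1}{2}}\cdots(I - Y_1)^{\frac{1}{2}}
\end{aligned} \tag{7.7.4}$$

for $j = 2, \ldots, k$. Then $Y_1, \ldots, Y_k$ will be independently distributed real matrix-variate
type-1 beta random variables with the parameters $(\alpha_j = \gamma_j + \frac{q_j}{2},\ \delta_j)$, $j = 1, \ldots, k$,
where

$$\delta_j = \sum_{i=j+1}^{k+1}\Big(\gamma_i+\frac{q_i}{2}\Big) + \beta_j + \cdots + \beta_k,\quad j = 1, \ldots, k,\ \text{ and } q_{k+1} = 0. \tag{iii}$$

The normalizing constant $D_k$ is thus the following:

$$D_k = \Big\{\prod_{j=1}^{k}|A_j|^{\frac{q_j}{2}}\,|B_j|^{\frac{p}{2}}\,\frac{\Gamma_p(\frac{q_j}{2})}{\pi^{\frac{q_jp}{2}}}\Big\}\Big\{\prod_{j=1}^{k}\frac{\Gamma_p(\alpha_j+\delta_j)}{\Gamma_p(\alpha_j)\,\Gamma_p(\delta_j)}\Big\} \tag{7.7.5}$$

for $\alpha_j > \frac{p-1}{2}$, $\delta_j > \frac{p-1}{2}$, $j = 1, \ldots, k$, where the $\alpha_j$'s and $\delta_j$'s are as previously given.


Properties parallel to those pointed out in Sects. 7.1-7.6 can also be studied for the model
specified in (7.7.1). The marginal distributions of subsets of the matrices $X_1, X_2, \ldots, X_k$,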
taken in the order, will belong to the same family of densities. There exist other general- 
izations of the type-1 and type-2 Dirichlet models. For all such generalizations, one can 
extend the results to the rectangular matrix-variate cases in both the real and complex 
domains.


Exercises 7.7


7.7.1. Develop the transformation corresponding to (7.7.4) for the real type-2 Dirichlet
case. Specify the Jacobians of the transformation (7.7.4) and the corresponding transformation
for the type-2 case.


7.7.2. Verify the result for $\delta_j$ in (iii) following (7.7.4) and develop the expression corresponding
to $\delta_j$ for the type-2 Dirichlet case.


7.7.3. Derive the joint marginal density of $X_1, \ldots, X_r$, $r < k$, by integrating out the
matrices starting with $X_k$ in (7.7.1).


7.7.4. Develop, in the complex domain, the model corresponding to (7.7.1) and derive its
associated normalizing constant.


7.7.5. If possible, derive the density of $U = \sum_{j=1}^{k}A_j^{\frac{1}{2}}X_jB_jX_j'A_j^{\frac{1}{2}}$ where the $X_j$'s, $j = 1, \ldots, k$,
jointly have the density given in (7.7.1).


Chapter 8
The Distributions of Eigenvalues and Eigenvectors


8.1. Introduction 


We will utilize the same notations as in the previous chapters. Lower-case letters
$x, y, \ldots$ will denote real scalar variables, whether mathematical or random. Capital letters
$X, Y, \ldots$ will be used to denote real matrix-variate mathematical or random variables,
whether square or rectangular matrices are involved. A tilde will be placed on top of letters
such as $\tilde{x}, \tilde{y}, \tilde{X}, \tilde{Y}$ to denote variables in the complex domain. Constant matrices will
for instance be denoted by A, B, C. A tilde will not be used on constant matrices unless 
the point is to be stressed that the matrix is in the complex domain. Other notations will 
remain unchanged. 


Our objective in this chapter is to examine the distributions of the eigenvalues and 
eigenvectors associated with a matrix-variate random variable. Letting W be such a p x p 
matrix-variate random variable, its determinant is the product of its eigenvalues and its 
trace, the sum thereof. Accordingly, the distributions of the determinant and the trace of 
W are available from the distributions of simple functions of its eigenvalues. Actually, 
several statistical quantities are associated with eigenvalues or eigenvectors. In order to 
delve into such problems, we will require certain additional properties of the matrix-variate 
gamma and beta distributions previously introduced in Chap. 5. As a preamble to the study 
of the distributions of eigenvalues and eigenvectors, these will be looked into in the next 
subsections for both the real and complex cases. 


8.1.1. Matrix-variate gamma and beta densities, real case 


Let $W_1$ and $W_2$ be statistically independently distributed $p \times p$ real matrix-variate
gamma random variables whose respective parameters are $(\alpha_1, B)$ and $(\alpha_2, B)$ with
$\Re(\alpha_j) > \frac{p-1}{2}$, $j = 1, 2$, their common scale parameter matrix $B$ being a real positive
definite constant matrix. Then, the joint density of $W_1$ and $W_2$, denoted by $f(W_1, W_2)$, is
the following:

$$f(W_1, W_2) = \begin{cases}\dfrac{|B|^{\alpha_1+\alpha_2}}{\Gamma_p(\alpha_1)\,\Gamma_p(\alpha_2)}\,|W_1|^{\alpha_1-\frac{p+1}{2}}\,|W_2|^{\alpha_2-\frac{p+1}{2}}\,e^{-\mathrm{tr}(B(W_1+W_2))}, & B > O,\ W_j > O,\ \Re(\alpha_j) > \frac{p-1}{2},\ j = 1, 2\\[2mm] 0 & \text{elsewhere.}\end{cases} \tag{8.1.1}$$

Consider the transformations

$$U_1 = (W_1 + W_2)^{-\frac{1}{2}}\,W_1\,(W_1 + W_2)^{-\frac{1}{2}}\ \text{ and }\ U_2 = W_2^{-\frac{1}{2}}\,W_1\,W_2^{-\frac{1}{2}}, \tag{8.1.2}$$

which are matrix-variate counterparts of the changes of variables $u_1 = \frac{w_1}{w_1+w_2}$ and $u_2 = \frac{w_1}{w_2}$
in the real scalar case, that is, for $p = 1$. Note that the square roots in (8.1.2) are symmetric
positive definite matrices. Then, we have the following result:


Theorem 8.1.1. When the real matrices $U_1$ and $U_2$ are as defined in (8.1.2), then $U_1$ is
distributed as a real matrix-variate type-1 beta variable with the parameters $(\alpha_1, \alpha_2)$ and
$U_2$, as a real matrix-variate type-2 beta variable with the parameters $(\alpha_1, \alpha_2)$. Further,
$U_1$ and $U_3 = W_1 + W_2$ are independently distributed, with $U_3$ having a real matrix-variate
gamma distribution with the parameters $(\alpha_1 + \alpha_2, B)$.


Proof: Given the joint density of $W_1$ and $W_2$ specified in (8.1.1), consider the transformation $(W_1, W_2) \to (U_3 = W_1 + W_2,\ U = W_1)$. On observing that its Jacobian is equal to one, the joint density of $U_3$ and $U$, denoted by $f_1(U_3, U)$, is obtained as

$$f_1(U_3,U)\,dU_3\wedge dU=c\,|U|^{\alpha_1-\frac{p+1}{2}}\,|U_3-U|^{\alpha_2-\frac{p+1}{2}}\,e^{-\mathrm{tr}(BU_3)}\,dU_3\wedge dU\tag{i}$$

where

$$c=\frac{|B|^{\alpha_1+\alpha_2}}{\Gamma_p(\alpha_1)\Gamma_p(\alpha_2)}.\tag{ii}$$

Noting that

$$|U_3-U|=|U_3|\,\big|I-U_3^{-\frac12}UU_3^{-\frac12}\big|,$$

we now let $U_1=U_3^{-\frac12}UU_3^{-\frac12}$ for fixed $U_3$, so that $dU_1=|U_3|^{-\frac{p+1}{2}}dU$. Accordingly, the joint density of $U_3$ and $U_1$, denoted by $f_2(U_3, U_1)$, is the following, observing that $U_1$ is as defined in (8.1.2) with $W_1 = U$ and $U_3 = W_1 + W_2$:

$$f_2(U_3,U_1)=c\,|U_3|^{\alpha_1+\alpha_2-\frac{p+1}{2}}\,e^{-\mathrm{tr}(BU_3)}\,|U_1|^{\alpha_1-\frac{p+1}{2}}\,|I-U_1|^{\alpha_2-\frac{p+1}{2}}\tag{iii}$$

for $\Re(\alpha_1)>\frac{p-1}{2}$, $\Re(\alpha_2)>\frac{p-1}{2}$, $U_3>O$, $O<U_1<I$, and zero elsewhere. On multiplying and dividing (iii) by $\Gamma_p(\alpha_1+\alpha_2)$, it is seen that $U_1$ and $U_3 = W_1 + W_2$ are independently distributed as their joint density factorizes into the product of two densities $g(U_1)$ and $g_1(U_3)$, that is, $f_2(U_1, U_3) = g(U_1)\,g_1(U_3)$ where

$$g(U_1)=\frac{\Gamma_p(\alpha_1+\alpha_2)}{\Gamma_p(\alpha_1)\Gamma_p(\alpha_2)}\,|U_1|^{\alpha_1-\frac{p+1}{2}}\,|I-U_1|^{\alpha_2-\frac{p+1}{2}},\quad O<U_1<I,\tag{8.1.3}$$

for $\Re(\alpha_1)>\frac{p-1}{2}$, $\Re(\alpha_2)>\frac{p-1}{2}$, and zero elsewhere, is a real matrix-variate type-1 beta density with the parameters $(\alpha_1, \alpha_2)$, and

$$g_1(U_3)=\frac{|B|^{\alpha_1+\alpha_2}}{\Gamma_p(\alpha_1+\alpha_2)}\,|U_3|^{\alpha_1+\alpha_2-\frac{p+1}{2}}\,e^{-\mathrm{tr}(BU_3)},\quad U_3>O,\tag{8.1.4}$$

for $B > O$, $\Re(\alpha_1+\alpha_2)>\frac{p-1}{2}$, and zero elsewhere, which is a real matrix-variate gamma density with the parameters $(\alpha_1 + \alpha_2, B)$. Thus, given two independently distributed $p\times p$ real positive definite matrices $W_1$ and $W_2$, where $W_1 \sim \text{gamma}\,(\alpha_1, B)$, $B > O$, $\Re(\alpha_1)>\frac{p-1}{2}$, and $W_2 \sim \text{gamma}\,(\alpha_2, B)$, $B > O$, $\Re(\alpha_2)>\frac{p-1}{2}$, one has $U_3 = W_1 + W_2 \sim \text{gamma}\,(\alpha_1 + \alpha_2, B)$, $B > O$.


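In order to determine the distribution of $U_2$, we first note that the exponent in (8.1.1) is $\mathrm{tr}(B(W_1 + W_2)) = \mathrm{tr}[B^{\frac12}W_1B^{\frac12} + B^{\frac12}W_2B^{\frac12}]$. Letting $V_j = B^{\frac12}W_jB^{\frac12}$, we have $dV_j = |B|^{\frac{p+1}{2}}dW_j$, $j = 1, 2$, which eliminates $B$, the resulting joint density of $V_1$ and $V_2$, denoted by $f_3(V_1, V_2)$, being

$$f_3(V_1,V_2)=\frac{1}{\Gamma_p(\alpha_1)\Gamma_p(\alpha_2)}\,|V_1|^{\alpha_1-\frac{p+1}{2}}\,|V_2|^{\alpha_2-\frac{p+1}{2}}\,e^{-\mathrm{tr}(V_1+V_2)},\quad V_j>O,\tag{8.1.5}$$

for $\Re(\alpha_j)>\frac{p-1}{2}$, $j = 1, 2$, and zero elsewhere. Now, noting that $\mathrm{tr}[V_1+V_2]=\mathrm{tr}[V_2^{\frac12}(I+V_2^{-\frac12}V_1V_2^{-\frac12})V_2^{\frac12}]$ and letting $V=V_2^{-\frac12}V_1V_2^{-\frac12}=U_2$ of (8.1.2) so that $dV=|V_2|^{-\frac{p+1}{2}}dV_1$ for fixed $V_2$, the joint density of $V$ and $V_2 = V_3$, denoted by $f_4(V, V_3)$, is obtained as

$$f_4(V,V_3)=\frac{1}{\Gamma_p(\alpha_1)\Gamma_p(\alpha_2)}\,|V|^{\alpha_1-\frac{p+1}{2}}\,|V_3|^{\alpha_1+\alpha_2-\frac{p+1}{2}}\,e^{-\mathrm{tr}[(I+V)^{\frac12}V_3(I+V)^{\frac12}]}\tag{8.1.6}$$

where $\mathrm{tr}[V_3^{\frac12}(I+V)V_3^{\frac12}]$ was replaced by $\mathrm{tr}[(I+V)^{\frac12}V_3(I+V)^{\frac12}]$. It then suffices to integrate out $V_3$ from the joint density specified in (8.1.6) by making use of a real matrix-variate gamma integral, to obtain the density of $V = U_2$ that follows:

$$g_2(V)=\frac{\Gamma_p(\alpha_1+\alpha_2)}{\Gamma_p(\alpha_1)\Gamma_p(\alpha_2)}\,|V|^{\alpha_1-\frac{p+1}{2}}\,|I+V|^{-(\alpha_1+\alpha_2)},\quad V>O,\tag{8.1.7}$$

for $\Re(\alpha_j)>\frac{p-1}{2}$, $j = 1, 2$, and zero elsewhere, which is a real matrix-variate type-2 beta density whose parameters are $(\alpha_1, \alpha_2)$. This completes the proof.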
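The normalizing constant shared by the type-1 and type-2 beta densities (8.1.3) and (8.1.7) can be evaluated numerically. The following is a minimal sketch, assuming the standard definition of the real matrix-variate gamma function, $\Gamma_p(\alpha)=\pi^{p(p-1)/4}\prod_{j=0}^{p-1}\Gamma(\alpha-j/2)$; the function names `real_matrix_gamma` and `beta_constant` are illustrative choices rather than notation from the text.

```python
import math

def real_matrix_gamma(p: int, alpha: float) -> float:
    """Real matrix-variate gamma function (assumed standard definition):
    pi^{p(p-1)/4} * Gamma(alpha) * Gamma(alpha - 1/2) * ... * Gamma(alpha - (p-1)/2)."""
    if alpha <= (p - 1) / 2:
        raise ValueError("need alpha > (p - 1)/2")
    value = math.pi ** (p * (p - 1) / 4)
    for j in range(p):
        value *= math.gamma(alpha - j / 2)
    return value

def beta_constant(p: int, a1: float, a2: float) -> float:
    """Gamma_p(a1 + a2) / (Gamma_p(a1) * Gamma_p(a2)), the constant in (8.1.3) and (8.1.7)."""
    return real_matrix_gamma(p, a1 + a2) / (
        real_matrix_gamma(p, a1) * real_matrix_gamma(p, a2)
    )

# For p = 1 the constant reduces to the ordinary scalar beta constant.
scalar_constant = math.gamma(5.5) / (math.gamma(2.0) * math.gamma(3.5))
print(beta_constant(1, 2.0, 3.5), scalar_constant)

# For p = 2 and a1 = a2 = 3/2 the constant equals 6/pi.
print(beta_constant(2, 1.5, 1.5), 6 / math.pi)
```

For $p=1$ the matrix-variate constant coincides with $\Gamma(\alpha_1+\alpha_2)/[\Gamma(\alpha_1)\Gamma(\alpha_2)]$, which is a quick sanity check on the definition being used.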


8.1a. Matrix-variate Gamma and Beta Densities, Complex Case 


Parallel results can be obtained in the complex domain. If $\tilde W_1$ and $\tilde W_2$ are statistically independently distributed $p\times p$ Hermitian positive definite matrices having complex matrix-variate gamma densities with the parameters $(\alpha_1, \tilde B)$ and $(\alpha_2, \tilde B)$, $\tilde B = \tilde B^{*} > O$, where an asterisk designates a conjugate transpose, then their joint density, denoted by $\tilde f(\tilde W_1, \tilde W_2)$, is given by

$$\tilde f(\tilde W_1,\tilde W_2)=\frac{|\det(\tilde B)|^{\alpha_1+\alpha_2}}{\tilde\Gamma_p(\alpha_1)\tilde\Gamma_p(\alpha_2)}\,|\det(\tilde W_1)|^{\alpha_1-p}\,|\det(\tilde W_2)|^{\alpha_2-p}\,e^{-\mathrm{tr}(\tilde B(\tilde W_1+\tilde W_2))}\tag{8.1a.1}$$

for $\tilde B > O$, $\tilde W_1 > O$, $\tilde W_2 > O$, $\Re(\alpha_j) > p - 1$, $j = 1, 2$, and zero elsewhere, with $|\det(\tilde W_j)|$ denoting the absolute value or modulus of the determinant of $\tilde W_j$. Since the derivations are similar to those provided in the previous subsection for the real case, the next results will be stated without proof. Note that, in the complex domain, the square roots involved in the transformations are Hermitian positive definite matrices.


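Theorem 8.1a.1. Let the $p\times p$ Hermitian positive definite matrices $\tilde W_1$ and $\tilde W_2$ be independently distributed as complex matrix-variate gamma variables with the parameters $(\alpha_1, \tilde B)$ and $(\alpha_2, \tilde B)$, $\tilde B = \tilde B^{*} > O$, respectively. Letting $\tilde U_3 = \tilde W_1 + \tilde W_2$,

$$\tilde U_1=(\tilde W_1+\tilde W_2)^{-\frac12}\,\tilde W_1\,(\tilde W_1+\tilde W_2)^{-\frac12}=\tilde U_3^{-\frac12}\,\tilde W_1\,\tilde U_3^{-\frac12}\quad\text{and}\quad \tilde U_2=\tilde W_2^{-\frac12}\,\tilde W_1\,\tilde W_2^{-\frac12},$$

then (1): $\tilde U_3$ is distributed as a complex matrix-variate gamma with the parameters $(\alpha_1 + \alpha_2, \tilde B)$, $\tilde B = \tilde B^{*} > O$, $\Re(\alpha_1 + \alpha_2) > p - 1$; (2): $\tilde U_1$ and $\tilde U_3$ are independently distributed; (3): $\tilde U_1$ is distributed as a complex matrix-variate type-1 beta random variable with the parameters $(\alpha_1, \alpha_2)$; (4): $\tilde U_2$ is distributed as a complex matrix-variate type-2 beta random variable with the parameters $(\alpha_1, \alpha_2)$.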


8.1.2. Real Wishart matrices 


Since Wishart matrices are distributed as matrix-variate gamma variables whose parameters are $\alpha_j = \frac{m_j}{2}$, $m_j \ge p$, and $B = \frac12\Sigma^{-1}$, $\Sigma > O$, we have the following corollaries in the real and complex cases:


Corollary 8.1.1. Let the $p\times p$ real positive definite matrices $W_1$ and $W_2$ be independently Wishart distributed, $W_j \sim W_p(m_j, \Sigma)$, with $m_j \ge p$, $j = 1, 2$, degrees of freedom, common parameter matrix $\Sigma > O$, and respective densities given by

$$\phi_j(W_j)=\frac{1}{2^{\frac{pm_j}{2}}\,\Gamma_p(\frac{m_j}{2})\,|\Sigma|^{\frac{m_j}{2}}}\,|W_j|^{\frac{m_j}{2}-\frac{p+1}{2}}\,e^{-\frac12\mathrm{tr}(\Sigma^{-1}W_j)},\quad W_j>O,\ j=1,2,\tag{8.1.8}$$

and zero elsewhere. Then (1): $U_3 = W_1 + W_2$ is Wishart distributed with $m_1 + m_2$ degrees of freedom and parameter matrix $\Sigma > O$, that is, $U_3 \sim W_p(m_1 + m_2, \Sigma)$, $\Sigma > O$; (2): $U_1 = (W_1 + W_2)^{-\frac12}W_1(W_1 + W_2)^{-\frac12} = U_3^{-\frac12}W_1U_3^{-\frac12}$ is a real matrix-variate type-1 beta random variable with the parameters $(\frac{m_1}{2}, \frac{m_2}{2})$, that is, $U_1 \sim \text{type-1 beta}\,(\frac{m_1}{2}, \frac{m_2}{2})$; (3): $U_1$ and $U_3$ are independently distributed; (4): $U_2 = W_2^{-\frac12}W_1W_2^{-\frac12}$ is a real matrix-variate type-2 beta random variable with the parameters $(\frac{m_1}{2}, \frac{m_2}{2})$, that is, $U_2 \sim \text{type-2 beta}\,(\frac{m_1}{2}, \frac{m_2}{2})$.


The corresponding results for the complex case are parallel with identical numbers of degrees of freedom, $m_1$ and $m_2$, and parameter matrix $\tilde\Sigma = \tilde\Sigma^{*} > O$ (Hermitian positive definite). Properties associated with type-1 and type-2 beta variables hold as well in the complex domain. Consider for instance the following results which are also valid in the complex case. If $U$ is a type-1 beta variable with the parameters $(\alpha_1, \alpha_2)$, then $I - U$ is a type-1 beta variable with the parameters $(\alpha_2, \alpha_1)$ and $(I - U)^{-\frac12}U(I - U)^{-\frac12}$ is a type-2 beta variable with the parameters $(\alpha_1, \alpha_2)$.


8.2. Some Eigenvalues and Eigenvectors, Real Case 


Observe that when $X$, a $p\times p$ real positive definite matrix, has a real matrix-variate gamma density with the parameters $(\alpha, B)$, $B > O$, $\Re(\alpha) > \frac{p-1}{2}$, then $Z = B^{\frac12}XB^{\frac12}$ has a real matrix-variate gamma density with the parameters $(\alpha, I)$ where $I$ is the identity matrix. The corresponding result for a Wishart matrix is the following: Let $W$ be a real Wishart matrix having $m$ degrees of freedom and $\Sigma > O$ as its parameter matrix, that is, $W \sim W_p(m, \Sigma)$, $\Sigma > O$, $m \ge p$; then $Z = \Sigma^{-\frac12}W\Sigma^{-\frac12} \sim W_p(m, I)$, $m \ge p$, that is, $Z$ is a Wishart matrix having $m$ degrees of freedom and $I$ as its parameter matrix. If we are considering the roots of the determinantal equation $|\hat W_1 - \lambda\hat W_2| = 0$ where $\hat W_1 \sim W_p(m_1, \Sigma)$ and $\hat W_2 \sim W_p(m_2, \Sigma)$, $\Sigma > O$, $m_j \ge p$, $j = 1, 2$, and if $\hat W_1$ and $\hat W_2$ are independently distributed, so will $W_1 = \Sigma^{-\frac12}\hat W_1\Sigma^{-\frac12}$ and $W_2 = \Sigma^{-\frac12}\hat W_2\Sigma^{-\frac12}$ be. Then

$$|\Sigma^{-\frac12}\hat W_1\Sigma^{-\frac12}-\lambda\,\Sigma^{-\frac12}\hat W_2\Sigma^{-\frac12}|=0\ \Rightarrow\ |\Sigma|^{-\frac12}\,|\hat W_1-\lambda\hat W_2|\,|\Sigma|^{-\frac12}=0\ \Rightarrow\ |\hat W_1-\lambda\hat W_2|=0.\tag{8.2.1}$$

Thus, the roots of $|W_1 - \lambda W_2| = 0$ and $|\hat W_1 - \lambda\hat W_2| = 0$ are identical. Hence, without any loss of generality, one needs only consider the roots associated with $W_j \sim W_p(m_j, I)$, $m_j \ge p$, $j = 1, 2$, when independently distributed Wishart matrices sharing a common matrix parameter are involved. Observe that

$$|W_1-\lambda W_2|=0\ \Rightarrow\ |W_2^{-\frac12}W_1W_2^{-\frac12}-\lambda I|=0,\tag{8.2.2}$$

which means that $\lambda$ is an eigenvalue of $W_2^{-\frac12}W_1W_2^{-\frac12}$ when $W_j \sim W_p(m_j, I)$, $j = 1, 2$. If $Y_j$ is an eigenvector corresponding to the eigenvalue $\lambda_j$, it must satisfy the equation

$$\big(W_2^{-\frac12}W_1W_2^{-\frac12}\big)Y_j=\lambda_jY_j.\tag{8.2.3}$$

Let the eigenvalues $\lambda_j$'s be distinct so that $\lambda_1 > \lambda_2 > \cdots > \lambda_p$. Actually, it can be shown that $\Pr\{\lambda_i = \lambda_j\} = 0$ for all $i \ne j$. When the eigenvalues of $W_2^{-\frac12}W_1W_2^{-\frac12}$ are distinct, then the eigenvectors are orthogonal since $W_2^{-\frac12}W_1W_2^{-\frac12}$ is symmetric. Thus, in this case, there exists a set of $p$ linearly independent mutually orthogonal eigenvectors.
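The invariance expressed in (8.2.1) can be illustrated numerically in the $2\times 2$ case. The sketch below is only an illustration under stated assumptions: the matrices `w1_hat`, `w2_hat` and the symmetric positive definite matrix `s` (playing the role of $\Sigma^{-1/2}$) are made-up values, and the roots are obtained from the quadratic that results from expanding the $2\times 2$ determinant.

```python
import math

def det2(m):
    """Determinant of a 2x2 matrix given as nested lists."""
    return m[0][0] * m[1][1] - m[0][1] * m[1][0]

def matmul2(a, b):
    """Product of two 2x2 matrices."""
    return [[sum(a[i][k] * b[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

def determinantal_roots(w1, w2):
    """Roots of det(w1 - lam * w2) = 0 for 2x2 matrices, largest first.
    Expanding the determinant gives a*lam^2 + b*lam + c = 0."""
    a = det2(w2)
    b = -(w1[0][0] * w2[1][1] + w1[1][1] * w2[0][0]
          - w1[0][1] * w2[1][0] - w1[1][0] * w2[0][1])
    c = det2(w1)
    disc = math.sqrt(b * b - 4 * a * c)
    return [(-b + disc) / (2 * a), (-b - disc) / (2 * a)]

# Hypothetical positive definite matrices and a hypothetical Sigma^{-1/2}.
w1_hat = [[4.0, 1.0], [1.0, 3.0]]
w2_hat = [[2.0, 0.5], [0.5, 1.0]]
s = [[1.5, 0.2], [0.2, 0.8]]

w1 = matmul2(matmul2(s, w1_hat), s)
w2 = matmul2(matmul2(s, w2_hat), s)

roots_original = determinantal_roots(w1_hat, w2_hat)
roots_standardized = determinantal_roots(w1, w2)
print(roots_original, roots_standardized)
```

Both calls return the same pair of roots up to rounding, since the factor $|\Sigma|^{-1}$ in (8.2.1) is nonzero and does not affect where the determinant vanishes.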


Let $Y_1, \ldots, Y_p$ be a set of normalized mutually orthogonal eigenvectors and let $Y = (Y_1, \ldots, Y_p)$ be the $p\times p$ matrix consisting of the normalized eigenvectors. Our aim is to determine the joint density of $Y$ and $\lambda_1, \ldots, \lambda_p$, and thereby the marginal densities of $Y$ and $\lambda_1, \ldots, \lambda_p$. To this end, we will need the Jacobian provided in the next theorem. For its derivation and connection to other Jacobians, the reader is referred to Mathai (1997).


Theorem 8.2.1. Let $Z$ be a $p\times p$ real symmetric matrix comprised of distinct real scalar variables as its elements, except for symmetry, and let its distinct nonzero eigenvalues be $\lambda_1 > \lambda_2 > \cdots > \lambda_p$, which are real owing to its symmetry. Let $D = \mathrm{diag}(\lambda_1, \ldots, \lambda_p)$, $dD = d\lambda_1\wedge\ldots\wedge d\lambda_p$, and $P$ be a unique orthonormal matrix such that $PP' = I$, $P'P = I$, and $Z = PDP'$. Then, after integrating out the differential element of $P$ over the full orthogonal group $O_p$, we have

$$dZ=\frac{\pi^{\frac{p^2}{2}}}{\Gamma_p(\frac p2)}\Big[\prod_{i<j}(\lambda_i-\lambda_j)\Big]\,dD.\tag{8.2.4}$$


Corollary 8.2.1. Let $g(Z)$ be a symmetric function of the $p\times p$ real symmetric matrix $Z$, symmetric function in the sense that $g(AB) = g(BA)$ whenever $AB$ and $BA$ are defined, even if $AB \ne BA$. Let the eigenvalues of $Z$ be distinct such that $\lambda_1 > \lambda_2 > \cdots > \lambda_p$, $D = \mathrm{diag}(\lambda_1, \ldots, \lambda_p)$ and $dD = d\lambda_1\wedge\ldots\wedge d\lambda_p$. Then,

$$\int_Z g(Z)\,dZ=\int_D g(D)\,\frac{\pi^{\frac{p^2}{2}}}{\Gamma_p(\frac p2)}\Big[\prod_{i<j}(\lambda_i-\lambda_j)\Big]\,dD.$$

Example 8.2.1. Consider a $p\times p$ real matrix $X$ having a matrix-variate gamma density with shape parameter $\alpha = \frac{p+1}{2}$ and scale parameter matrix $I$, whose density is

$$f(X)=\frac{1}{\Gamma_p(\frac{p+1}{2})}\,e^{-\mathrm{tr}(X)},\quad X>O.$$

A variable having this density is also said to follow the real $p\times p$ matrix-variate exponential distribution. Let $p = 2$ and denote the eigenvalues of $X$ by $\infty > \lambda_1 > \lambda_2 > 0$. It follows from Corollary 8.2.1 that the joint density of $\lambda_1$ and $\lambda_2$, denoted by $f_1(D)$, with $D = \mathrm{diag}(\lambda_1, \lambda_2)$, is given by

$$f_1(D)\,dD=\frac{1}{\Gamma_2(\frac32)}\,\frac{\pi^{\frac{2^2}{2}}}{\Gamma_2(\frac22)}\,(\lambda_1-\lambda_2)\,e^{-(\lambda_1+\lambda_2)}\,dD.$$

Verify that $f_1(D)$ is a density.


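Solution 8.2.1. Since $f_1(D)$ is nonnegative, it suffices to show that the total integral is equal to 1. Excluding the constant part, the integral to be evaluated is the following:

$$\begin{aligned}
\int_{\lambda_1=0}^{\infty}\int_{\lambda_2=0}^{\lambda_1}(\lambda_1-\lambda_2)\,e^{-(\lambda_1+\lambda_2)}\,d\lambda_1\wedge d\lambda_2
&=\int_{\lambda_1=0}^{\infty}\lambda_1e^{-\lambda_1}\Big[\int_{\lambda_2=0}^{\lambda_1}e^{-\lambda_2}d\lambda_2\Big]d\lambda_1-\int_{\lambda_1=0}^{\infty}e^{-\lambda_1}\Big[\int_{\lambda_2=0}^{\lambda_1}\lambda_2e^{-\lambda_2}d\lambda_2\Big]d\lambda_1\\
&=\int_0^{\infty}\lambda_1e^{-\lambda_1}d\lambda_1-\int_0^{\infty}\lambda_1e^{-2\lambda_1}d\lambda_1\\
&\quad+\int_0^{\infty}\lambda_1e^{-2\lambda_1}d\lambda_1-\int_0^{\infty}e^{-\lambda_1}d\lambda_1+\int_0^{\infty}e^{-2\lambda_1}d\lambda_1\\
&=\int_0^{\infty}\lambda_1e^{-\lambda_1}d\lambda_1-\int_0^{\infty}e^{-\lambda_1}d\lambda_1+\int_0^{\infty}e^{-2\lambda_1}d\lambda_1\\
&=1-1+\frac12=\frac12.
\end{aligned}\tag{i}$$

Let us now compute the constant part:

$$\frac{1}{\Gamma_2(\frac32)}\,\frac{\pi^{2}}{\Gamma_2(1)}=\frac{1}{\sqrt{\pi}\,\Gamma(\frac32)\Gamma(1)}\,\frac{\pi^{2}}{\sqrt{\pi}\,\Gamma(1)\Gamma(\frac12)}=\frac{1}{\sqrt{\pi}\,(\frac12\sqrt{\pi})}\,\frac{\pi^{2}}{\sqrt{\pi}\sqrt{\pi}}=2.\tag{ii}$$

Since the product of (i) and (ii) gives 1, $f_1(D)$ is indeed a density.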
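As a numerical cross-check of Solution 8.2.1, the total integral of $f_1(D)$ can be approximated by quadrature. The sketch below is an illustration only: it assumes the standard formula $\Gamma_p(\alpha)=\pi^{p(p-1)/4}\prod_{j=0}^{p-1}\Gamma(\alpha-j/2)$, truncates the infinite range at a hypothetical upper limit of 30, and uses a simple midpoint rule over the region $\lambda_1>\lambda_2>0$ (the helper `integrate_ordered` is not from the text).

```python
import math

def real_matrix_gamma(p, alpha):
    """Real matrix-variate gamma function (assumed standard definition)."""
    value = math.pi ** (p * (p - 1) / 4)
    for j in range(p):
        value *= math.gamma(alpha - j / 2)
    return value

def integrate_ordered(f, upper, n):
    """Approximate the integral of f(x, y) over upper > x > y > 0.
    Full grid cells below the diagonal use the midpoint rule; the half of each
    diagonal cell lying in the region uses a one-point rule at its centroid."""
    h = upper / n
    total = 0.0
    for i in range(n):
        x = (i + 0.5) * h
        for j in range(i):
            total += f(x, (j + 0.5) * h)
        total += 0.5 * f(i * h + 2 * h / 3, i * h + h / 3)
    return total * h * h

p = 2
# Constant part: (1 / Gamma_p((p+1)/2)) * pi^{p^2/2} / Gamma_p(p/2).
constant = math.pi ** (p * p / 2) / (
    real_matrix_gamma(p, (p + 1) / 2) * real_matrix_gamma(p, p / 2)
)
# Functional part: (lam1 - lam2) * exp(-(lam1 + lam2)).
functional = integrate_ordered(
    lambda x, y: (x - y) * math.exp(-(x + y)), 30.0, 600
)
total_mass = constant * functional
print(constant, functional, total_mass)
```

The constant evaluates to 2 and the quadrature value is close to 1/2, so that their product is close to 1, in agreement with the analytical verification.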


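Example 8.2.2. Consider a $p\times p$ matrix having a real matrix-variate type-1 beta density with the parameters $\alpha = \frac{p+1}{2}$, $\beta = \frac{p+1}{2}$, whose density, denoted by $f(X)$, is

$$f(X)=\frac{\Gamma_p(p+1)}{\Gamma_p(\frac{p+1}{2})\,\Gamma_p(\frac{p+1}{2})},\quad O<X<I,$$

and zero elsewhere. This density $f(X)$ is also referred to as a real $p\times p$ matrix-variate uniform density. Let $p = 2$ and the eigenvalues of $X$ be $1 > \lambda_1 > \lambda_2 > 0$. Then, the density of $D = \mathrm{diag}(\lambda_1, \lambda_2)$, denoted by $f_1(D)$, is

$$f_1(D)\,dD=\frac{\Gamma_2(3)}{[\Gamma_2(\frac32)]^2}\,\frac{\pi^{\frac{2^2}{2}}}{\Gamma_2(\frac22)}\,(\lambda_1-\lambda_2)\,dD,\quad 1>\lambda_1>\lambda_2>0,$$

and zero elsewhere. Verify that $f_1(D)$ is a density.

Solution 8.2.2. The constant part simplifies to the following:

$$\frac{\Gamma_2(3)}{[\Gamma_2(\frac32)]^2}\,\frac{\pi^{2}}{\Gamma_2(1)}=\frac{\sqrt{\pi}\,\Gamma(3)\Gamma(\frac52)}{[\sqrt{\pi}\,\Gamma(\frac32)\Gamma(1)]^2}\,\frac{\pi^{2}}{\sqrt{\pi}\,\Gamma(1)\Gamma(\frac12)}=\frac{\sqrt{\pi}\,(2)(\frac32)(\frac12)\sqrt{\pi}}{[\sqrt{\pi}\,(\frac12)\sqrt{\pi}]^2}\,\frac{\pi^{2}}{\sqrt{\pi}\sqrt{\pi}}=6.\tag{i}$$

Let us now consider the functional part of the integrand:

$$\int_{\lambda_1=0}^{1}\int_{\lambda_2=0}^{\lambda_1}(\lambda_1-\lambda_2)\,d\lambda_1\wedge d\lambda_2=\int_0^1\lambda_1^2\,d\lambda_1-\int_0^1\frac{\lambda_1^2}{2}\,d\lambda_1=\frac12\int_0^1\lambda_1^2\,d\lambda_1=\frac16.\tag{ii}$$

As the product of (i) and (ii) equals 1, it is verified that $f_1(D)$ is a density.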


Example 8.2.3. Consider a $p\times p$ matrix having a matrix-variate gamma density with shape parameter $\alpha$ and scale parameter matrix $I$. Let $D = \mathrm{diag}(\lambda_1, \ldots, \lambda_p)$ where $\infty > \lambda_1 > \lambda_2 > \cdots > \lambda_p > 0$ are the eigenvalues of that matrix. The joint density of the $\lambda_j$'s or the density of $D$ is then available as

$$f_1(D)\,dD=\frac{1}{\Gamma_p(\alpha)}\,\frac{\pi^{\frac{p^2}{2}}}{\Gamma_p(\frac p2)}\,[\lambda_1\cdots\lambda_p]^{\alpha-\frac{p+1}{2}}\,e^{-(\lambda_1+\cdots+\lambda_p)}\Big[\prod_{i<j}(\lambda_i-\lambda_j)\Big]\,dD.$$

Even when $\alpha$ is specified, the integral representation of the density $f_1(D)$ will generally only be expressible in terms of incomplete gamma functions or confluent hypergeometric series, with simple functional forms being obtainable only for certain values of $\alpha$ and $p$. Verify that $f_1(D)$ is a density for $\alpha = \frac72$ and $p = 2$.


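Solution 8.2.3. For those values of $p$ and $\alpha$, we have

$$[\lambda_1\lambda_2]^{\frac72-\frac32}(\lambda_1-\lambda_2)\,e^{-(\lambda_1+\lambda_2)}=[\lambda_1^3\lambda_2^2-\lambda_1^2\lambda_2^3]\,e^{-(\lambda_1+\lambda_2)}$$

whose integral over $\infty > \lambda_1 > \lambda_2 > 0$ is the sum of (i) and (ii):

$$\begin{aligned}
\int_{\lambda_1=0}^{\infty}\int_{\lambda_2=0}^{\lambda_1}\lambda_1^3\lambda_2^2\,e^{-(\lambda_1+\lambda_2)}\,d\lambda_1\wedge d\lambda_2
&=\int_{\lambda_1=0}^{\infty}\lambda_1^3e^{-\lambda_1}\Big[\int_{\lambda_2=0}^{\lambda_1}\lambda_2^2e^{-\lambda_2}d\lambda_2\Big]d\lambda_1\\
&=\int_{\lambda_1=0}^{\infty}\lambda_1^3\big[(-\lambda_1^2-2\lambda_1-2)e^{-2\lambda_1}+2e^{-\lambda_1}\big]d\lambda_1,
\end{aligned}\tag{i}$$

$$\begin{aligned}
-\int_{\lambda_1=0}^{\infty}\int_{\lambda_2=0}^{\lambda_1}\lambda_1^2\lambda_2^3\,e^{-(\lambda_1+\lambda_2)}\,d\lambda_1\wedge d\lambda_2
&=-\int_{\lambda_1=0}^{\infty}\lambda_1^2e^{-\lambda_1}\Big[\int_{\lambda_2=0}^{\lambda_1}\lambda_2^3e^{-\lambda_2}d\lambda_2\Big]d\lambda_1\\
&=\int_{0}^{\infty}\big[(\lambda_1^5+3\lambda_1^4+6\lambda_1^3+6\lambda_1^2)e^{-2\lambda_1}-6\lambda_1^2e^{-\lambda_1}\big]d\lambda_1,
\end{aligned}\tag{ii}$$

that is,

$$\begin{aligned}
\int_0^{\infty}\big[(\lambda_1^4+4\lambda_1^3+6\lambda_1^2)e^{-2\lambda_1}+(2\lambda_1^3-6\lambda_1^2)e^{-\lambda_1}\big]d\lambda_1
&=2^{-5}\Gamma(5)+4\,(2^{-4})\Gamma(4)+6\,(2^{-3})\Gamma(3)+2\Gamma(4)-6\Gamma(3)\\
&=\frac{4!}{2^5}+\frac{4(3!)}{2^4}+\frac{6(2!)}{2^3}+2(3!)-6(2!)=\frac{15}{4}.
\end{aligned}\tag{iii}$$

Now, consider the constant part:

$$\frac{1}{\Gamma_p(\alpha)}\,\frac{\pi^{\frac{p^2}{2}}}{\Gamma_p(\frac p2)}=\frac{1}{\Gamma_2(\frac72)}\,\frac{\pi^{2}}{\Gamma_2(1)}=\frac{1}{\sqrt{\pi}\,\Gamma(\frac72)\Gamma(3)}\,\frac{\pi^{2}}{\sqrt{\pi}\,\Gamma(1)\Gamma(\frac12)}=\frac{4}{15}.\tag{iv}$$

The product of (iii) and (iv) giving 1, this verifies that $f_1(D)$ is a density when $p = 2$ and $\alpha = \frac72$.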
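The two parts of Solution 8.2.3 can be cross-checked with a few lines of code. The sketch below assumes $p=2$ and $\alpha=\frac72$ (the value for which the functional part is $[\lambda_1^3\lambda_2^2-\lambda_1^2\lambda_2^3]e^{-(\lambda_1+\lambda_2)}$) and relies on the elementary gamma integral $\int_0^\infty \lambda^k e^{-c\lambda}\,d\lambda = k!/c^{k+1}$; the helper names are illustrative.

```python
from fractions import Fraction
import math

def gamma_integral(k: int, c: int) -> Fraction:
    """Exact value of the integral of lam^k * exp(-c * lam) over (0, inf)."""
    return Fraction(math.factorial(k), c ** (k + 1))

# After integrating out lam2, the integrand in lam1 is
# (lam^4 + 4 lam^3 + 6 lam^2) e^{-2 lam} + (2 lam^3 - 6 lam^2) e^{-lam},
# stored here as (coefficient, power k, rate c).
terms = [(1, 4, 2), (4, 3, 2), (6, 2, 2), (2, 3, 1), (-6, 2, 1)]
functional_part = sum(coef * gamma_integral(k, c) for coef, k, c in terms)

def real_gamma_2(alpha: float) -> float:
    """Gamma_2(alpha) = sqrt(pi) * Gamma(alpha) * Gamma(alpha - 1/2), for p = 2."""
    return math.sqrt(math.pi) * math.gamma(alpha) * math.gamma(alpha - 0.5)

# Constant part: (1 / Gamma_2(7/2)) * pi^2 / Gamma_2(1).
constant_part = math.pi ** 2 / (real_gamma_2(3.5) * real_gamma_2(1.0))

print(functional_part, constant_part, float(functional_part) * constant_part)
```

The functional part is obtained exactly as a rational number, while the constant part involves $\sqrt{\pi}$ and is therefore evaluated in floating point.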
8.2a. The Distributions of Eigenvalues in the Complex Case

The complex counterpart of Theorem 8.2.1 is stated next.


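Theorem 8.2a.1. Let $\tilde Z$ be a $p\times p$ Hermitian matrix with distinct real nonzero eigenvalues $\lambda_1 > \lambda_2 > \cdots > \lambda_p$. Let $\tilde Q$ be a $p\times p$ unique unitary matrix, $\tilde Q\tilde Q^{*} = I$, $\tilde Q^{*}\tilde Q = I$, such that $\tilde Z = \tilde QD\tilde Q^{*}$, where $D = \mathrm{diag}(\lambda_1, \ldots, \lambda_p)$ and an asterisk designates the conjugate transpose. Then, on integrating out the differential element of $\tilde Q$ over the full orthogonal group $\tilde O_p$, we have

$$d\tilde Z=\frac{\pi^{p(p-1)}}{\tilde\Gamma_p(p)}\Big[\prod_{i<j}(\lambda_i-\lambda_j)^2\Big]\,dD.\tag{8.2a.1}$$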


Note 8.2a.1. When the unitary matrix $\tilde Q$ has diagonal elements that are real, then the integral of the differential element over the full orthogonal group $\tilde O_p$ will be the following:

$$\int_{\tilde O_p}\tilde h(\tilde Q)=\frac{\pi^{p(p-1)}}{\tilde\Gamma_p(p)}\tag{8.2a.2}$$

where $\tilde h(\tilde Q) = \wedge[(d\tilde Q)\tilde Q^{*}]$; the reader may refer to Theorem 4.4 and Corollary 4.3.1 of Mathai (1997) for details. If all the elements comprising $\tilde Q$ are complex, then the numerator in (8.2a.2) will be $\pi^{p^2}$ instead of $\pi^{p(p-1)}$. When unitary transformations are made on Hermitian matrices such as $\tilde Z$ in Theorem 8.2a.1, the diagonal elements in the unitary matrix $\tilde Q$ are real and hence the numerator in (8.2a.2) remains $\pi^{p(p-1)}$ in this case.


Note 8.2a.2. A corollary parallel to Corollary 8.2.1 also holds in the complex domain. 


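Example 8.2a.1. Consider a complex $p\times p$ matrix $\tilde X$ having a matrix-variate type-1 beta density with the parameters $\alpha = p$ and $\beta = p$, so that its density, denoted by $\tilde f(\tilde X)$, is the following:

$$\tilde f(\tilde X)=\frac{\tilde\Gamma_p(2p)}{\tilde\Gamma_p(p)\,\tilde\Gamma_p(p)},\quad O<\tilde X<I,$$

which is also referred to as the $p\times p$ complex matrix-variate uniform density. Let $D = \mathrm{diag}(\lambda_1, \ldots, \lambda_p)$ where $1 > \lambda_1 > \lambda_2 > \cdots > \lambda_p > 0$ are the eigenvalues of $\tilde X$. Then, the density of $D$, denoted by $f_1(D)$, is given by

$$f_1(D)\,dD=\frac{\tilde\Gamma_p(2p)}{[\tilde\Gamma_p(p)]^2}\,\frac{\pi^{p(p-1)}}{\tilde\Gamma_p(p)}\Big[\prod_{i<j}(\lambda_i-\lambda_j)^2\Big]\,dD.\tag{i}$$

Verify that (i) is a density for $p = 2$.

Solution 8.2a.1. For Hermitian matrices the eigenvalues are real. Consider the integral over $(\lambda_1 - \lambda_2)^2$:

$$\int_{\lambda_1=0}^{1}\Big[\int_{\lambda_2=0}^{\lambda_1}(\lambda_1-\lambda_2)^2\,d\lambda_2\Big]d\lambda_1=\int_0^1\frac{\lambda_1^3}{3}\,d\lambda_1=\frac{1}{12}.\tag{ii}$$

Let us now evaluate the constant part:

$$\frac{\tilde\Gamma_p(2p)}{[\tilde\Gamma_p(p)]^2}\,\frac{\pi^{p(p-1)}}{\tilde\Gamma_p(p)}=\frac{\tilde\Gamma_2(4)}{[\tilde\Gamma_2(2)]^2}\,\frac{\pi^{2}}{\tilde\Gamma_2(2)}=\frac{\pi\,\Gamma(4)\Gamma(3)}{[\pi\,\Gamma(2)\Gamma(1)]^2}\,\frac{\pi^{2}}{\pi\,\Gamma(2)\Gamma(1)}=12.\tag{iii}$$

The product of (ii) and (iii) equalling 1, the solution is complete.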
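The constant in Example 8.2a.1 can be reproduced numerically. The following sketch assumes the standard definition of the complex matrix-variate gamma function, $\tilde\Gamma_p(\alpha)=\pi^{p(p-1)/2}\,\Gamma(\alpha)\Gamma(\alpha-1)\cdots\Gamma(\alpha-p+1)$ for $\Re(\alpha)>p-1$; the function name `complex_matrix_gamma` is an illustrative choice.

```python
import math
from fractions import Fraction

def complex_matrix_gamma(p: int, alpha: float) -> float:
    """Complex matrix-variate gamma function (assumed standard definition):
    pi^{p(p-1)/2} * Gamma(alpha) * Gamma(alpha - 1) * ... * Gamma(alpha - p + 1)."""
    if alpha <= p - 1:
        raise ValueError("need alpha > p - 1")
    value = math.pi ** (p * (p - 1) / 2)
    for j in range(p):
        value *= math.gamma(alpha - j)
    return value

p = 2
# Constant part of the eigenvalue density of the complex uniform matrix.
constant = (
    complex_matrix_gamma(p, 2 * p) / complex_matrix_gamma(p, p) ** 2
    * math.pi ** (p * (p - 1)) / complex_matrix_gamma(p, p)
)

# Integral of (lam1 - lam2)^2 over 1 > lam1 > lam2 > 0:
# the inner integral equals lam1^3 / 3 and the outer one contributes 1/4.
integral = Fraction(1, 3) * Fraction(1, 4)

print(constant, integral, constant * float(integral))
```

For $p=2$ the constant is 12 and the integral is $\frac{1}{12}$, so that the total probability is 1.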


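Example 8.2a.2. Consider a $p\times p$ complex matrix $\tilde X$ having a matrix-variate gamma density with the parameters $(\alpha = p, \beta = I)$. Let $D = \mathrm{diag}(\lambda_1, \ldots, \lambda_p)$, where $\infty > \lambda_1 > \cdots > \lambda_p > 0$ are the eigenvalues of $\tilde X$. Denoting the density of $D$ by $f_1(D)$, we have

$$f_1(D)\,dD=\frac{1}{\tilde\Gamma_p(p)}\,\frac{\pi^{p(p-1)}}{\tilde\Gamma_p(p)}\,e^{-(\lambda_1+\cdots+\lambda_p)}\Big[\prod_{i<j}(\lambda_i-\lambda_j)^2\Big]\,dD.$$

When $\alpha = p$, this density is the $p\times p$ complex matrix-variate exponential density. Verify that $f_1(D)$ is a density for $p = 2$.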


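Solution 8.2a.2. The constant part simplifies to the following:

$$\frac{1}{\tilde\Gamma_p(p)}\,\frac{\pi^{p(p-1)}}{\tilde\Gamma_p(p)}=\frac{1}{\tilde\Gamma_2(2)}\,\frac{\pi^{2}}{\tilde\Gamma_2(2)}=\frac{1}{\pi\,\Gamma(2)\Gamma(1)}\,\frac{\pi^{2}}{\pi\,\Gamma(2)\Gamma(1)}=1,\tag{i}$$

and the integrals over the $\lambda_j$'s are evaluated as follows:

$$\begin{aligned}
&\int_{\lambda_1=0}^{\infty}\int_{\lambda_2=0}^{\lambda_1}(\lambda_1^2+\lambda_2^2-2\lambda_1\lambda_2)\,e^{-(\lambda_1+\lambda_2)}\,d\lambda_1\wedge d\lambda_2\\
&\quad=\int_{\lambda_1=0}^{\infty}\lambda_1^2e^{-\lambda_1}\Big[\int_{\lambda_2=0}^{\lambda_1}e^{-\lambda_2}d\lambda_2\Big]d\lambda_1+\int_{\lambda_1=0}^{\infty}e^{-\lambda_1}\Big[\int_{\lambda_2=0}^{\lambda_1}\lambda_2^2e^{-\lambda_2}d\lambda_2\Big]d\lambda_1\\
&\qquad-2\int_{\lambda_1=0}^{\infty}\lambda_1e^{-\lambda_1}\Big[\int_{\lambda_2=0}^{\lambda_1}\lambda_2e^{-\lambda_2}d\lambda_2\Big]d\lambda_1\\
&\quad=\int_0^{\infty}(-\lambda_1^2-\lambda_1^2-2\lambda_1-2+2\lambda_1^2+2\lambda_1)\,e^{-2\lambda_1}\,d\lambda_1+\int_0^{\infty}(\lambda_1^2+2-2\lambda_1)\,e^{-\lambda_1}\,d\lambda_1\\
&\quad=\int_0^{\infty}(\lambda_1^2-2\lambda_1+2)\,e^{-\lambda_1}\,d\lambda_1-2\int_0^{\infty}e^{-2\lambda_1}\,d\lambda_1\\
&\quad=\Gamma(3)-2\Gamma(2)+2-1=1.
\end{aligned}\tag{ii}$$

Now, taking the product of (i) and (ii), we obtain 1, and the result is verified.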


8.2.1. Eigenvalues of matrix-variate gamma and Wishart matrices, real case 


Let $W_1$ and $W_2$ be two $p\times p$ real positive definite matrix-variate random variables that are independently distributed as matrix-variate gamma random variables with the parameters $(\alpha_1, B)$ and $(\alpha_2, B)$, respectively. When $\alpha_j = \frac{m_j}{2}$, $m_j \ge p$, $j = 1, 2$, with $m_1, m_2 = p, p + 1, \ldots$, and $B = \frac12 I$, $W_1$ and $W_2$ are independently Wishart distributed with $m_1$ and $m_2$ degrees of freedom, respectively; refer to the earlier discussion about the elimination of the scale parameter matrix $\Sigma > O$ in a matrix-variate Wishart distribution. Consider the determinantal equation

$$|W_1-\lambda W_2|=0\ \Rightarrow\ |W_2^{-\frac12}W_1W_2^{-\frac12}-\lambda I|=0.\tag{8.2.5}$$

Thus, $\lambda$ is an eigenvalue of $U_2 = W_2^{-\frac12}W_1W_2^{-\frac12}$. It has already been established in Theorem 8.1.1 that $U_2 = W_2^{-\frac12}W_1W_2^{-\frac12}$ is distributed as a real matrix-variate type-2 beta random variable with the parameters $\alpha_1$ and $\alpha_2$ whose density is

$$f_u(U_2)=\frac{\Gamma_p(\alpha_1+\alpha_2)}{\Gamma_p(\alpha_1)\Gamma_p(\alpha_2)}\,|U_2|^{\alpha_1-\frac{p+1}{2}}\,|I+U_2|^{-(\alpha_1+\alpha_2)}\tag{8.2.6}$$

for $U_2 > O$, $\Re(\alpha_j) > \frac{p-1}{2}$, $j = 1, 2$, and zero elsewhere. Note that this distribution is free of the scale parameter matrix $B$. Writing $U_2$ in terms of its eigenvalues and making use of (8.2.4), we have the following result:

Theorem 8.2.2. Let $\lambda_1 > \lambda_2 > \cdots > \lambda_p > 0$ be the distinct roots of the determinantal equation (8.2.5) or, equivalently, let the $\lambda_j$'s be the eigenvalues of $U_2 = W_2^{-\frac12}W_1W_2^{-\frac12}$, as defined in (8.2.5). Then, after integrating out over the full orthogonal group $O_p$, the joint density of $\lambda_1, \ldots, \lambda_p$, denoted by $g_1(D)$ with $D = \mathrm{diag}(\lambda_1, \ldots, \lambda_p)$, is obtained as

$$g_1(D)\,dD=\frac{\Gamma_p(\alpha_1+\alpha_2)}{\Gamma_p(\alpha_1)\Gamma_p(\alpha_2)}\Big[\prod_{j=1}^{p}\lambda_j\Big]^{\alpha_1-\frac{p+1}{2}}\Big[\prod_{j=1}^{p}(1+\lambda_j)\Big]^{-(\alpha_1+\alpha_2)}\frac{\pi^{\frac{p^2}{2}}}{\Gamma_p(\frac p2)}\Big[\prod_{i<j}(\lambda_i-\lambda_j)\Big]\,dD,\tag{8.2.7}$$

where $dD = d\lambda_1\wedge\ldots\wedge d\lambda_p$.


Proof: Applying the transformation $U_2 = PDP'$, $PP' = I$, $P'P = I$, where $P$ is a unique orthonormal matrix, to the density of $U_2$ given in (8.2.6), it follows from Theorem 8.2.1 that

$$dU_2=\frac{\pi^{\frac{p^2}{2}}}{\Gamma_p(\frac p2)}\Big[\prod_{i<j}(\lambda_i-\lambda_j)\Big]\,dD$$

after integrating out the differential element corresponding to the orthonormal matrix $P$. On substituting $|U_2| = \lambda_1\cdots\lambda_p$ and $|I + U_2| = (1 + \lambda_1)\cdots(1 + \lambda_p)$ in (8.2.6), the result is established.
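The joint density (8.2.7) is straightforward to evaluate at a given set of eigenvalues. The sketch below is one possible implementation under the assumption that $\Gamma_p(\alpha)=\pi^{p(p-1)/4}\prod_{j=0}^{p-1}\Gamma(\alpha-j/2)$; the function `type2_beta_eigen_density` and its argument names are hypothetical.

```python
import math
from itertools import combinations

def real_matrix_gamma(p, alpha):
    """Real matrix-variate gamma function (assumed standard definition)."""
    value = math.pi ** (p * (p - 1) / 4)
    for j in range(p):
        value *= math.gamma(alpha - j / 2)
    return value

def type2_beta_eigen_density(lams, a1, a2):
    """Evaluate the right-hand side of (8.2.7) at lams = (lam_1 > ... > lam_p > 0)."""
    p = len(lams)
    if lams[-1] <= 0 or any(x <= y for x, y in zip(lams, lams[1:])):
        raise ValueError("eigenvalues must be positive and strictly decreasing")
    constant = (
        real_matrix_gamma(p, a1 + a2)
        / (real_matrix_gamma(p, a1) * real_matrix_gamma(p, a2))
        * math.pi ** (p * p / 2) / real_matrix_gamma(p, p / 2)
    )
    prod_lam = math.prod(lams)
    prod_one_plus = math.prod(1 + x for x in lams)
    # Product of (lam_i - lam_j) over i < j; an empty product equals 1 when p = 1.
    differences = math.prod(x - y for x, y in combinations(lams, 2))
    return (constant * prod_lam ** (a1 - (p + 1) / 2)
            * prod_one_plus ** (-(a1 + a2)) * differences)

# p = 1: the formula collapses to the scalar type-2 beta density.
scalar = math.gamma(5.0) / (math.gamma(2.0) * math.gamma(3.0)) * 0.7 * 1.7 ** -5
print(type2_beta_eigen_density((0.7,), 2.0, 3.0), scalar)

# p = 2 with a1 = a2 = 3/2, evaluated at (lam_1, lam_2) = (2, 1).
print(type2_beta_eigen_density((2.0, 1.0), 1.5, 1.5))
```

For $p=1$ the Jacobian factor $\pi^{1/2}/\Gamma_1(\frac12)$ equals 1 and the product over $i<j$ is empty, so that the scalar type-2 beta density is recovered.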


Note 8.2.1. When $\alpha_1 = \frac{m_1}{2}$, $\alpha_2 = \frac{m_2}{2}$, $m_j \ge p$, with $m_1, m_2 = p, p + 1, \ldots$, in Theorem 8.2.2, we have the corresponding result for real Wishart matrices having $m_1$ and $m_2$ degrees of freedom and parameter matrix $\frac12 I_p$.


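Example 8.2.4. Let the $p\times p$ real matrix $X$ have a real matrix-variate type-2 beta density with the parameters $\alpha = \frac{p+1}{2}$ and $\beta = \frac{p+1}{2}$. Then, the joint density of its eigenvalues $\lambda_1 > \lambda_2 > \cdots > \lambda_p > 0$, or that of $D = \mathrm{diag}(\lambda_1, \ldots, \lambda_p)$, denoted by $g_1(D)$, is

$$g_1(D)=\frac{\Gamma_p(p+1)}{[\Gamma_p(\frac{p+1}{2})]^2}\,\frac{\pi^{\frac{p^2}{2}}}{\Gamma_p(\frac p2)}\Big[\prod_{j=1}^{p}(1+\lambda_j)\Big]^{-(p+1)}\prod_{i<j}(\lambda_i-\lambda_j).$$

Verify that $g_1(D)$ is a density for $p = 2$.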


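Solution 8.2.4. Consider the total integral for $p = 2$. The constant part is

$$\frac{\Gamma_p(p+1)}{[\Gamma_p(\frac{p+1}{2})]^2}\,\frac{\pi^{\frac{p^2}{2}}}{\Gamma_p(\frac p2)}=\frac{\Gamma_2(3)}{[\Gamma_2(\frac32)]^2}\,\frac{\pi^{2}}{\Gamma_2(1)}=\frac{\sqrt{\pi}\,\Gamma(3)\Gamma(\frac52)}{[\sqrt{\pi}\,\Gamma(\frac32)\Gamma(1)]^2}\,\frac{\pi^{2}}{\sqrt{\pi}\,\Gamma(1)\Gamma(\frac12)}=\frac{\sqrt{\pi}\,(2)(\frac32)(\frac12)\sqrt{\pi}}{\pi\,(\frac14)\,\pi}\,\frac{\pi^{2}}{\sqrt{\pi}\sqrt{\pi}}=6,\tag{i}$$

and the integral part is obtained as follows:

$$\begin{aligned}
&\int_{\lambda_1=0}^{\infty}\int_{\lambda_2=0}^{\lambda_1}\frac{\lambda_1-\lambda_2}{(1+\lambda_1)^3(1+\lambda_2)^3}\,d\lambda_1\wedge d\lambda_2\\
&\quad=\int_{\lambda_1=0}^{\infty}\frac{\lambda_1}{(1+\lambda_1)^3}\Big[\int_{\lambda_2=0}^{\lambda_1}\frac{1}{(1+\lambda_2)^3}\,d\lambda_2\Big]d\lambda_1-\int_{\lambda_1=0}^{\infty}\frac{1}{(1+\lambda_1)^3}\Big[\int_{\lambda_2=0}^{\lambda_1}\frac{\lambda_2}{(1+\lambda_2)^3}\,d\lambda_2\Big]d\lambda_1.
\end{aligned}\tag{ii}$$

The first integral over $\lambda_2$ in (ii) is

$$\int_{\lambda_2=0}^{\lambda_1}\frac{1}{(1+\lambda_2)^3}\,d\lambda_2=\frac12\Big[1-\frac{1}{(1+\lambda_1)^2}\Big];$$

then, integrating with respect to $\lambda_1$ yields

$$\frac12\int_{\lambda_1=0}^{\infty}\Big[\frac{\lambda_1}{(1+\lambda_1)^3}-\frac{\lambda_1}{(1+\lambda_1)^5}\Big]d\lambda_1=\frac12\Big[\frac{\Gamma(2)\Gamma(1)}{\Gamma(3)}-\frac{\Gamma(2)\Gamma(3)}{\Gamma(5)}\Big]=\frac12\Big[\frac12-\frac1{12}\Big]=\frac{5}{24}.\tag{iii}$$

Now, after integrating by parts, the second integral over $\lambda_2$ in (ii) is the following:

$$\int_{\lambda_2=0}^{\lambda_1}\frac{\lambda_2}{(1+\lambda_2)^3}\,d\lambda_2=\frac12\Big[-\frac{\lambda_1}{(1+\lambda_1)^2}+1-\frac{1}{1+\lambda_1}\Big];$$

then, integrating with respect to $\lambda_1$ gives

$$-\frac12\int_{\lambda_1=0}^{\infty}\frac{1}{(1+\lambda_1)^3}\Big[1-\frac{1}{1+\lambda_1}-\frac{\lambda_1}{(1+\lambda_1)^2}\Big]d\lambda_1=-\frac12\Big[\frac12-\frac13-\frac{\Gamma(2)\Gamma(3)}{\Gamma(5)}\Big]=-\frac12\Big[\frac12-\frac13-\frac1{12}\Big]=-\frac{1}{24}.\tag{iv}$$

Combining (iii) and (iv), the sum is

$$\frac{5}{24}-\frac{1}{24}=\frac16.\tag{v}$$

Finally, the product of (i) and (v) is 1, which verifies that $g_1(D)$ is indeed a density when $p = 2$.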
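The integrals appearing in Solution 8.2.4 are all of type-2 beta form, $\int_0^\infty x^{a-1}(1+x)^{-(a+b)}dx=\Gamma(a)\Gamma(b)/\Gamma(a+b)$, so that the arithmetic can be checked exactly with rational numbers. In the sketch below, the decomposition of the second inner integral as $\frac12\big[1-\frac{1}{1+\lambda_1}-\frac{\lambda_1}{(1+\lambda_1)^2}\big]$ is the one obtained by integrating by parts; the helper name `type2_beta_integral` is illustrative.

```python
from fractions import Fraction
from math import factorial

def type2_beta_integral(a: int, b: int) -> Fraction:
    """Exact integral of x^(a-1) * (1+x)^(-(a+b)) over (0, inf) for positive
    integers a and b, namely Gamma(a) * Gamma(b) / Gamma(a + b)."""
    return Fraction(factorial(a - 1) * factorial(b - 1), factorial(a + b - 1))

# First piece: (1/2) * [ int x/(1+x)^3 dx - int x/(1+x)^5 dx ].
part_iii = Fraction(1, 2) * (type2_beta_integral(2, 1) - type2_beta_integral(2, 3))

# Second piece:
# -(1/2) * [ int (1+x)^-3 dx - int (1+x)^-4 dx - int x/(1+x)^5 dx ].
part_iv = -Fraction(1, 2) * (
    type2_beta_integral(1, 2) - type2_beta_integral(1, 3) - type2_beta_integral(2, 3)
)

constant = 6  # value of the constant part for p = 2
total = constant * (part_iii + part_iv)
print(part_iii, part_iv, total)
```

The two pieces evaluate to $\frac{5}{24}$ and $-\frac{1}{24}$, and multiplying their sum by the constant 6 gives exactly 1.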




8.2a.1. Eigenvalues of complex matrix-variate gamma and Wishart matrices

A parallel result can be obtained in the complex domain. Let $\tilde W_1$ and $\tilde W_2$ be independently distributed $p\times p$ complex matrix-variate gamma random variables with parameters $(\alpha_1, \tilde B)$ and $(\alpha_2, \tilde B)$, $\tilde B = \tilde B^{*} > O$, $\Re(\alpha_j) > p - 1$, $j = 1, 2$. Consider the determinantal equation

$$[\det(\tilde W_1-\lambda\tilde W_2)]=0\ \Rightarrow\ [\det(\tilde W_2^{-\frac12}\tilde W_1\tilde W_2^{-\frac12}-\lambda I)]=0.\tag{8.2a.3}$$

It follows from Theorem 8.1a.1 that $\tilde U_2 = \tilde W_2^{-\frac12}\tilde W_1\tilde W_2^{-\frac12}$ has a complex matrix-variate type-2 beta distribution with the parameters $(\alpha_1, \alpha_2)$, whose associated density is

$$\tilde f_u(\tilde U_2)=\frac{\tilde\Gamma_p(\alpha_1+\alpha_2)}{\tilde\Gamma_p(\alpha_1)\tilde\Gamma_p(\alpha_2)}\,|\det(\tilde U_2)|^{\alpha_1-p}\,|\det(I+\tilde U_2)|^{-(\alpha_1+\alpha_2)}\tag{8.2a.4}$$

for $\tilde U_2 = \tilde U_2^{*} > O$, $\Re(\alpha_j) > p - 1$, $j = 1, 2$, and zero elsewhere. Observe that the distribution of $\tilde U_2$ is free of the scale parameter matrix $\tilde B > O$ and that $\tilde W_1$ and $\tilde W_2$ are Hermitian positive definite so that the eigenvalues $\lambda_1 > \cdots > \lambda_p > 0$ of $\tilde U_2$, assumed to be distinct, are real and positive. Writing $\tilde U_2$ in terms of its eigenvalues and making use of (8.2a.1), we have the following result:


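Theorem 8.2a.2. Let $\tilde U_2 = \tilde W_2^{-\frac12}\tilde W_1\tilde W_2^{-\frac12}$ and its distinct eigenvalues $\lambda_1 > \cdots > \lambda_p > 0$ be as defined in the determinantal equation (8.2a.3). Then, after integrating out the differential element corresponding to the unique unitary matrix $\tilde Q$, $\tilde Q\tilde Q^{*} = I$, $\tilde Q^{*}\tilde Q = I$, such that $\tilde U_2 = \tilde QD\tilde Q^{*}$, with $D = \mathrm{diag}(\lambda_1, \ldots, \lambda_p)$, the joint density of $\lambda_1, \ldots, \lambda_p$, denoted by $\tilde g_1(D)$, is obtained as

$$\tilde g_1(D)\,dD=\frac{\tilde\Gamma_p(\alpha_1+\alpha_2)}{\tilde\Gamma_p(\alpha_1)\tilde\Gamma_p(\alpha_2)}\Big[\prod_{j=1}^{p}\lambda_j\Big]^{\alpha_1-p}\Big[\prod_{j=1}^{p}(1+\lambda_j)\Big]^{-(\alpha_1+\alpha_2)}\frac{\pi^{p(p-1)}}{\tilde\Gamma_p(p)}\Big[\prod_{i<j}(\lambda_i-\lambda_j)^2\Big]\,dD,\tag{8.2a.5}$$

with $dD = d\lambda_1\wedge\ldots\wedge d\lambda_p$, where

$$\tilde\Gamma_p(\alpha)=\pi^{\frac{p(p-1)}{2}}\,\Gamma(\alpha)\Gamma(\alpha-1)\cdots\Gamma(\alpha-p+1),\quad \Re(\alpha)>p-1.\tag{8.2a.6}$$

Example 8.2a.3. Let the $p\times p$ matrix $\tilde X$ have a complex matrix-variate type-2 beta density with the parameters $(\alpha = p, \beta = p)$. Let its eigenvalues be $\lambda_1 > \cdots > \lambda_p > 0$ and their joint density be denoted by $\tilde g_1(D)$, $D = \mathrm{diag}(\lambda_1, \ldots, \lambda_p)$. Then,

$$\tilde g_1(D)=\frac{\tilde\Gamma_p(2p)}{[\tilde\Gamma_p(p)]^2}\,\frac{\pi^{p(p-1)}}{\tilde\Gamma_p(p)}\Big[\prod_{j=1}^{p}(1+\lambda_j)\Big]^{-2p}\Big[\prod_{i<j}(\lambda_i-\lambda_j)^2\Big].$$

Verify that $\tilde g_1(D)$ is a density for $p = 2$.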


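Solution 8.2a.3. Since the total integral must be unity, let us integrate out the $\lambda_j$'s. The constant part is the following:

$$\frac{\tilde\Gamma_p(2p)}{[\tilde\Gamma_p(p)]^2}\,\frac{\pi^{p(p-1)}}{\tilde\Gamma_p(p)}=\frac{\tilde\Gamma_2(4)}{[\tilde\Gamma_2(2)]^2}\,\frac{\pi^{2}}{\tilde\Gamma_2(2)}=\frac{\pi\,\Gamma(4)\Gamma(3)}{\pi^{2}}\,\frac{\pi^{2}}{\pi}=12.\tag{i}$$

Now, consider the integrals over $\lambda_1$ and $\lambda_2$, noting that $(\lambda_1 - \lambda_2)^2 = \lambda_1^2 + \lambda_2^2 - 2\lambda_1\lambda_2$:

$$\begin{aligned}
&\int_{\lambda_1=0}^{\infty}\int_{\lambda_2=0}^{\lambda_1}\frac{\lambda_1^2+\lambda_2^2-2\lambda_1\lambda_2}{(1+\lambda_1)^4(1+\lambda_2)^4}\,d\lambda_1\wedge d\lambda_2\\
&\quad=\int_{\lambda_1=0}^{\infty}\frac{\lambda_1^2}{(1+\lambda_1)^4}\Big[\int_{\lambda_2=0}^{\lambda_1}\frac{1}{(1+\lambda_2)^4}\,d\lambda_2\Big]d\lambda_1+\int_{\lambda_1=0}^{\infty}\frac{1}{(1+\lambda_1)^4}\Big[\int_{\lambda_2=0}^{\lambda_1}\frac{\lambda_2^2}{(1+\lambda_2)^4}\,d\lambda_2\Big]d\lambda_1\\
&\qquad-2\int_{\lambda_1=0}^{\infty}\frac{\lambda_1}{(1+\lambda_1)^4}\Big[\int_{\lambda_2=0}^{\lambda_1}\frac{\lambda_2}{(1+\lambda_2)^4}\,d\lambda_2\Big]d\lambda_1.
\end{aligned}\tag{ii}$$

As they appear in (ii), the integrals over $\lambda_2$ are

$$\int_{\lambda_2=0}^{\lambda_1}\frac{1}{(1+\lambda_2)^4}\,d\lambda_2=\frac13\Big[1-\frac{1}{(1+\lambda_1)^3}\Big],\tag{iii}$$

$$\int_{\lambda_2=0}^{\lambda_1}\frac{\lambda_2}{(1+\lambda_2)^4}\,d\lambda_2=-\frac13\,\frac{\lambda_1}{(1+\lambda_1)^3}+\frac13\Big(\frac12\Big)\Big[1-\frac{1}{(1+\lambda_1)^2}\Big],\tag{iv}$$

$$\int_{\lambda_2=0}^{\lambda_1}\frac{\lambda_2^2}{(1+\lambda_2)^4}\,d\lambda_2=-\frac13\,\frac{\lambda_1^2}{(1+\lambda_1)^3}+\frac23\Big[-\frac12\,\frac{\lambda_1}{(1+\lambda_1)^2}+\frac12\Big(1-\frac{1}{1+\lambda_1}\Big)\Big];\tag{v}$$

then, integrating with respect to $\lambda_1$ yields

$$\frac13\int_{0}^{\infty}\frac{\lambda_1^2}{(1+\lambda_1)^4}\Big[1-\frac{1}{(1+\lambda_1)^3}\Big]d\lambda_1=\frac13\Big[\frac{\Gamma(3)\Gamma(1)}{\Gamma(4)}-\frac{\Gamma(3)\Gamma(4)}{\Gamma(7)}\Big]=\frac13\Big[\frac13-\frac1{60}\Big]=\frac{19}{180},\tag{vi}$$

$$\int_{0}^{\infty}\frac{1}{(1+\lambda_1)^4}\Big[-\frac13\frac{\lambda_1^2}{(1+\lambda_1)^3}-\frac13\frac{\lambda_1}{(1+\lambda_1)^2}+\frac13\Big(1-\frac{1}{1+\lambda_1}\Big)\Big]d\lambda_1=\frac13\Big[-\frac{\Gamma(3)\Gamma(4)}{\Gamma(7)}-\frac{\Gamma(2)\Gamma(4)}{\Gamma(6)}+\frac13-\frac14\Big]=\frac{1}{180},\tag{vii}$$

$$-2\int_{0}^{\infty}\frac{\lambda_1}{(1+\lambda_1)^4}\Big[-\frac13\frac{\lambda_1}{(1+\lambda_1)^3}+\frac16\Big(1-\frac{1}{(1+\lambda_1)^2}\Big)\Big]d\lambda_1=-2\Big[-\frac13\frac{\Gamma(3)\Gamma(4)}{\Gamma(7)}+\frac16\frac{\Gamma(2)\Gamma(2)}{\Gamma(4)}-\frac16\frac{\Gamma(2)\Gamma(4)}{\Gamma(6)}\Big]=-\frac{1}{36}.\tag{viii}$$

Summing (vi), (vii) and (viii), we have

$$\frac{19}{180}+\frac{1}{180}-\frac{1}{36}=\frac{1}{12}.\tag{ix}$$

As the product of (ix) and (i) is 1, the result is established. Note that since $2p$ is a positive integer, the method of integration by parts works for a general $p$ when the first parameter in the type-2 beta density, $\alpha$, is equal to $p$. However, $\prod_{i<j}(\lambda_i - \lambda_j)^2$ will be difficult to handle for a general $p$.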


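Example 8.2a.4. Give an explicit representation of (8.2a.5) for $p = 3$, $\alpha_1 = 4$ and $\alpha_2 = 3$.

Solution 8.2a.4. For $p = 3$, $\alpha_1 - p = 4 - 3 = 1$, $\alpha_1 + \alpha_2 = 4 + 3 = 7$, $p(p - 1) = 6$. The constant part is the following:

$$\begin{aligned}
\frac{\tilde\Gamma_p(\alpha_1+\alpha_2)}{\tilde\Gamma_p(\alpha_1)\tilde\Gamma_p(\alpha_2)}\,\frac{\pi^{p(p-1)}}{\tilde\Gamma_p(p)}
&=\frac{\tilde\Gamma_3(7)}{\tilde\Gamma_3(4)\tilde\Gamma_3(3)}\,\frac{\pi^{6}}{\tilde\Gamma_3(3)}\\
&=\frac{\Gamma(7)\Gamma(6)\Gamma(5)}{\pi^{3}[\Gamma(4)\Gamma(3)\Gamma(2)][\Gamma(3)\Gamma(2)\Gamma(1)]}\times\frac{\pi^{6}}{\pi^{3}\,\Gamma(3)\Gamma(2)\Gamma(1)}=43200,
\end{aligned}\tag{i}$$

the functional part being the product of

$$\Big[\prod_{j=1}^{p}\lambda_j\Big]^{\alpha_1-p}\Big[\prod_{j=1}^{p}(1+\lambda_j)\Big]^{-(\alpha_1+\alpha_2)}=\lambda_1\lambda_2\lambda_3\,[(1+\lambda_1)(1+\lambda_2)(1+\lambda_3)]^{-7}\tag{ii}$$

and

$$\prod_{i<j}(\lambda_i-\lambda_j)^2=(\lambda_1-\lambda_2)^2(\lambda_1-\lambda_3)^2(\lambda_2-\lambda_3)^2.\tag{iii}$$

Multiplying (i), (ii) and (iii) yields the answer.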
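The explicit representation obtained in Solution 8.2a.4 can be packaged as a function. The sketch below evaluates the right-hand side of (8.2a.5) for arbitrary $p$, using $\tilde\Gamma_p(\alpha)=\pi^{p(p-1)/2}\Gamma(\alpha)\Gamma(\alpha-1)\cdots\Gamma(\alpha-p+1)$; the function names and the evaluation point $(3, 2, 1)$ are illustrative assumptions.

```python
import math
from itertools import combinations

def complex_matrix_gamma(p, alpha):
    """Complex matrix-variate gamma function, as in (8.2a.6)."""
    value = math.pi ** (p * (p - 1) / 2)
    for j in range(p):
        value *= math.gamma(alpha - j)
    return value

def complex_eigen_constant(p, a1, a2):
    """Constant part of (8.2a.5)."""
    return (
        complex_matrix_gamma(p, a1 + a2)
        / (complex_matrix_gamma(p, a1) * complex_matrix_gamma(p, a2))
        * math.pi ** (p * (p - 1)) / complex_matrix_gamma(p, p)
    )

def complex_type2_beta_eigen_density(lams, a1, a2):
    """Right-hand side of (8.2a.5) at lams = (lam_1 > ... > lam_p > 0)."""
    p = len(lams)
    prod_lam = math.prod(lams)
    prod_one_plus = math.prod(1 + x for x in lams)
    squared_differences = math.prod((x - y) ** 2 for x, y in combinations(lams, 2))
    return (complex_eigen_constant(p, a1, a2) * prod_lam ** (a1 - p)
            * prod_one_plus ** (-(a1 + a2)) * squared_differences)

constant = complex_eigen_constant(3, 4, 3)
value = complex_type2_beta_eigen_density((3.0, 2.0, 1.0), 4, 3)
print(constant, value)
```

For $p=3$, $\alpha_1=4$ and $\alpha_2=3$, the constant evaluates to 43200, and the density at $(3,2,1)$ equals $43200\times 6\times 24^{-7}\times 4$.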

8.2.2. An alternative procedure in the real case 


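This section describes an alternative procedure that is presented in Anderson (2003). The real case will first be discussed. Let $W_1$ and $W_2$ be independently distributed real $p\times p$ matrix-variate gamma random variables with the parameters $(\alpha_1, B)$, $(\alpha_2, B)$, $B > O$, $\Re(\alpha_j) > \frac{p-1}{2}$, $j = 1, 2$. We are considering the determinantal equation

$$\begin{aligned}
|W_1-\lambda W_2|=0&\ \Rightarrow\ |W_1-\mu(W_1+W_2)|=0\\
&\ \Rightarrow\ \Big|W_1-\frac{\mu}{1-\mu}W_2\Big|=0\\
&\ \Rightarrow\ |(W_1+W_2)^{-\frac12}W_1(W_1+W_2)^{-\frac12}-\mu I|=0
\end{aligned}\tag{8.2.8}$$

where $\lambda = \frac{\mu}{1-\mu}$. Thus, $\mu$ is an eigenvalue of $U_1 = (W_1 + W_2)^{-\frac12}W_1(W_1 + W_2)^{-\frac12}$. It follows from Theorem 8.1.1 that the joint density of $W_1$ and $W_2$, denoted by $f(W_1, W_2)$, can be written as

$$f(W_1,W_2)=g(U_1)\,g_1(U_3)\tag{8.2.9}$$

where $U_1 = (W_1 + W_2)^{-\frac12}W_1(W_1 + W_2)^{-\frac12}$ and $U_3 = W_1 + W_2$ are independently distributed. Further,

$$g(U_1)=\frac{\Gamma_p(\alpha_1+\alpha_2)}{\Gamma_p(\alpha_1)\Gamma_p(\alpha_2)}\,|U_1|^{\alpha_1-\frac{p+1}{2}}\,|I-U_1|^{\alpha_2-\frac{p+1}{2}},\quad O<U_1<I,\tag{8.2.10}$$

for $\Re(\alpha_j) > \frac{p-1}{2}$, $j = 1, 2$, and $g(U_1) = 0$ elsewhere, is a real matrix-variate type-1 beta density, and

$$g_1(U_3)=\frac{|B|^{\alpha_1+\alpha_2}}{\Gamma_p(\alpha_1+\alpha_2)}\,|U_3|^{\alpha_1+\alpha_2-\frac{p+1}{2}}\,e^{-\mathrm{tr}(BU_3)}\tag{8.2.11}$$

for $B > O$, $\Re(\alpha_1 + \alpha_2) > \frac{p-1}{2}$, and $g_1(U_3) = 0$ elsewhere, is a real matrix-variate gamma density with the parameters $(\alpha_1 + \alpha_2, B)$, $B > O$. Now, consider the transformation $U_1 = PDP'$, $PP' = I$, $P'P = I$, where $D = \mathrm{diag}(\mu_1, \ldots, \mu_p)$, $\mu_1 > \cdots > \mu_p > 0$ being the distinct eigenvalues of the real positive definite matrix $U_1$, and the orthonormal matrix $P$ is unique. Given the density of $U_1$ specified in (8.2.10), the joint density of $\mu_1, \ldots, \mu_p$, denoted by $g_4(D)$, which is obtained after integrating out the differential element corresponding to $P$, is

$$g_4(D)\,dD=\frac{\Gamma_p(\alpha_1+\alpha_2)}{\Gamma_p(\alpha_1)\Gamma_p(\alpha_2)}\Big[\prod_{j=1}^{p}\mu_j\Big]^{\alpha_1-\frac{p+1}{2}}\Big[\prod_{j=1}^{p}(1-\mu_j)\Big]^{\alpha_2-\frac{p+1}{2}}\frac{\pi^{\frac{p^2}{2}}}{\Gamma_p(\frac p2)}\Big[\prod_{i<j}(\mu_i-\mu_j)\Big]\,dD.\tag{8.2.12}$$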


Hence, the following result: 


Theorem 8.2.3. The joint density of the eigenvalues $\mu_1 > \cdots > \mu_p > 0$ of the determinantal equation in (8.2.8) is given by the expression appearing in (8.2.12), which is equal to the density specified in (8.2.7).

Proof: It has already been established in Theorem 8.1.1 that $U_1 = (W_1 + W_2)^{-\frac12}W_1(W_1 + W_2)^{-\frac12}$ has the real matrix-variate type-1 beta density given in (8.2.10). Now, make the transformation $U_1 = PDP'$ where $D = \mathrm{diag}(\mu_1, \ldots, \mu_p)$ and the orthonormal matrix $P$ is unique. Then, the first part is established from Theorem 8.2.2. It follows from (8.2.8) that $\lambda = \frac{\mu}{1-\mu}$ or $\mu = \frac{\lambda}{1+\lambda}$ with $d\mu = \frac{1}{(1+\lambda)^2}d\lambda$ and $1 - \mu = \frac{1}{1+\lambda}$. Observe that

$$\prod_{i<j}(\mu_i-\mu_j)=\prod_{i<j}\frac{\lambda_i-\lambda_j}{(1+\lambda_i)(1+\lambda_j)}$$

and that, in this product's denominator, $1 + \lambda_i$ appears $p - 1$ times for $i = 1, \ldots, p$. The exponent of $\frac{1}{1+\lambda_j}$ is $\alpha_1 - \frac{p+1}{2} + \alpha_2 - \frac{p+1}{2} + 2 + (p - 1) = \alpha_1 + \alpha_2$. On substituting these values in (8.2.12), a perfect agreement with (8.2.7) is established, which completes the proof.
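The agreement between (8.2.12) and (8.2.7) asserted in Theorem 8.2.3 can be checked numerically at a point. The sketch below implements both joint densities, applies the substitution $\mu_j=\lambda_j/(1+\lambda_j)$ together with the Jacobian $\prod_j(1+\lambda_j)^{-2}$, and compares the two sides; the chosen eigenvalues and parameters are arbitrary illustrative values, and $\Gamma_p(\alpha)=\pi^{p(p-1)/4}\prod_{j=0}^{p-1}\Gamma(\alpha-j/2)$ is assumed.

```python
import math
from itertools import combinations

def real_matrix_gamma(p, alpha):
    """Real matrix-variate gamma function (assumed standard definition)."""
    value = math.pi ** (p * (p - 1) / 4)
    for j in range(p):
        value *= math.gamma(alpha - j / 2)
    return value

def eigen_constant(p, a1, a2):
    """Constant shared by (8.2.7) and (8.2.12)."""
    return (
        real_matrix_gamma(p, a1 + a2)
        / (real_matrix_gamma(p, a1) * real_matrix_gamma(p, a2))
        * math.pi ** (p * p / 2) / real_matrix_gamma(p, p / 2)
    )

def product_of_differences(values):
    """Product of (v_i - v_j) over i < j."""
    return math.prod(x - y for x, y in combinations(values, 2))

def density_mu(mus, a1, a2):
    """Right-hand side of (8.2.12)."""
    p = len(mus)
    e = (p + 1) / 2
    return (eigen_constant(p, a1, a2) * math.prod(mus) ** (a1 - e)
            * math.prod(1 - m for m in mus) ** (a2 - e)
            * product_of_differences(mus))

def density_lambda(lams, a1, a2):
    """Right-hand side of (8.2.7)."""
    p = len(lams)
    e = (p + 1) / 2
    return (eigen_constant(p, a1, a2) * math.prod(lams) ** (a1 - e)
            * math.prod(1 + x for x in lams) ** (-(a1 + a2))
            * product_of_differences(lams))

lams = (2.5, 0.8)                               # lam_1 > lam_2 > 0
mus = tuple(x / (1 + x) for x in lams)          # mu = lam / (1 + lam)
jacobian = math.prod((1 + x) ** -2 for x in lams)  # d mu = d lam / (1 + lam)^2

lhs = density_mu(mus, 2.0, 1.7) * jacobian
rhs = density_lambda(lams, 2.0, 1.7)
print(lhs, rhs)
```

Since $\mu=\lambda/(1+\lambda)$ is increasing in $\lambda$, the ordering of the eigenvalues is preserved by the substitution, and the two printed values coincide up to rounding.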




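Example 8.2.5. Provide an explicit representation of (8.2.12) for $p = 3$, $\alpha_1 = 4$ and $\alpha_2 = 3$.

Solution 8.2.5. Note that $\frac{p+1}{2} = \frac{3+1}{2} = 2$, $\alpha_1 - \frac{p+1}{2} = 4 - 2 = 2$ and $\alpha_2 - \frac{p+1}{2} = 3 - 2 = 1$. The constant part is

$$\begin{aligned}
\frac{\Gamma_p(\alpha_1+\alpha_2)}{\Gamma_p(\alpha_1)\Gamma_p(\alpha_2)}\,\frac{\pi^{\frac{p^2}{2}}}{\Gamma_p(\frac p2)}
&=\frac{\Gamma_3(7)}{\Gamma_3(4)\Gamma_3(3)}\,\frac{\pi^{\frac92}}{\Gamma_3(\frac32)}\\
&=\frac{\Gamma(7)\Gamma(\frac{13}{2})\Gamma(6)}{\pi^{\frac32}[\Gamma(4)\Gamma(\frac72)\Gamma(3)][\Gamma(3)\Gamma(\frac52)\Gamma(2)]}\times\frac{\pi^{\frac92}}{\pi^{\frac32}\,\Gamma(\frac32)\Gamma(1)\Gamma(\frac12)}=831600,
\end{aligned}\tag{i}$$

the functional part being the product of

$$\Big[\prod_{j=1}^{p}\mu_j\Big]^{\alpha_1-\frac{p+1}{2}}\Big[\prod_{j=1}^{p}(1-\mu_j)\Big]^{\alpha_2-\frac{p+1}{2}}=(\mu_1\mu_2\mu_3)^2\,(1-\mu_1)(1-\mu_2)(1-\mu_3)\tag{ii}$$

and

$$\prod_{i<j}(\mu_i-\mu_j)=(\mu_1-\mu_2)(\mu_1-\mu_3)(\mu_2-\mu_3).\tag{iii}$$

The product of (i), (ii) and (iii) yields the answer.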
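Constants such as the one in Solution 8.2.5 involve ratios of large gamma values, and a numerically safer way to evaluate them is to work on the logarithmic scale. The following sketch does so with `math.lgamma`, assuming $\Gamma_p(\alpha)=\pi^{p(p-1)/4}\prod_{j=0}^{p-1}\Gamma(\alpha-j/2)$; the function names are illustrative.

```python
import math

def log_real_matrix_gamma(p: int, alpha: float) -> float:
    """Logarithm of the real matrix-variate gamma function (assumed standard definition)."""
    return (p * (p - 1) / 4) * math.log(math.pi) + sum(
        math.lgamma(alpha - j / 2) for j in range(p)
    )

def log_eigen_constant(p: int, a1: float, a2: float) -> float:
    """Logarithm of the constant part of (8.2.12)."""
    return (
        log_real_matrix_gamma(p, a1 + a2)
        - log_real_matrix_gamma(p, a1)
        - log_real_matrix_gamma(p, a2)
        + (p * p / 2) * math.log(math.pi)
        - log_real_matrix_gamma(p, p / 2)
    )

constant = math.exp(log_eigen_constant(3, 4.0, 3.0))
print(constant)
```

For $p=3$, $\alpha_1=4$ and $\alpha_2=3$, exponentiating the log-constant recovers the value 831600 up to floating-point rounding; for large $p$ or large parameters, the logarithmic form avoids overflow in the individual gamma values.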


8.2.3. The joint density of the eigenvectors in the real case 


In order to establish the joint density of the eigenvectors, we will proceed as follows, our starting equation being $|W_1 - \lambda W_2| = 0$. Let $\lambda_j$ be a root of this equation and let $Y_j$ be the corresponding vector. Then,

$$W_1Y_j=\lambda_jW_2Y_j\ \Rightarrow\ (W_1+\lambda_jW_1)Y_j=\lambda_j(W_1+W_2)Y_j\tag{8.2.13}$$

$$\Rightarrow\ W_1Y_j=\frac{\lambda_j}{1+\lambda_j}(W_1+W_2)Y_j=\mu_j(W_1+W_2)Y_j\tag{i}$$

$$\Rightarrow\ (W_1+W_2)^{-1}W_1Y_j=\mu_jY_j.\tag{ii}$$

This shows that $Y_j$ is the eigenvector corresponding to the eigenvalue $\mu_j$ of $(W_1 + W_2)^{-1}W_1$ or, equivalently, of $(W_1 + W_2)^{-\frac12}W_1(W_1 + W_2)^{-\frac12} = U_1$ which is a real matrix-variate type-1 beta random variable. Since the $\mu_j$'s are distinct, $\mu_1 > \cdots > \mu_p > 0$, and the matrix is symmetric, the eigenvectors $Y_1, \ldots, Y_p$ are mutually orthogonal. Consider the equation

$$W_1Y_j=\mu_j(W_1+W_2)Y_j.\tag{iii}$$

For $i \ne j$, we also have

$$W_1Y_i=\mu_i(W_1+W_2)Y_i.\tag{iv}$$

Premultiplying (iii) by $Y_i'$ and (iv) by $Y_j'$, and observing that $W_1' = W_1$, it follows that $(Y_i'W_1Y_j)' = Y_j'W_1Y_i$. Since both are real $1\times 1$ matrices and one is the transpose of the other, they are equal. Then, on subtracting the resulting right-hand sides, we obtain

$$0=\mu_jY_i'(W_1+W_2)Y_j-\mu_iY_j'(W_1+W_2)Y_i=(\mu_j-\mu_i)\,Y_i'(W_1+W_2)Y_j.$$

Since $Y_i'(W_1 + W_2)Y_j = Y_j'(W_1 + W_2)Y_i$ by the previous argument and $\mu_i \ne \mu_j$, we must have

$$Y_i'(W_1+W_2)Y_j=0\ \text{ for all } i\ne j.\tag{v}$$

Let us normalize $Y_j$ as follows:

$$Y_j'(W_1+W_2)Y_j=1,\quad j=1,\ldots,p.\tag{vi}$$

Then, combining (v) and (vi), we have

$$Y'(W_1+W_2)Y=I,\quad Y=(Y_1,\ldots,Y_p),\tag{vii}$$

where $Y$ is the $p\times p$ matrix of the normalized eigenvectors. Thus,

$$W_1+W_2=(Y')^{-1}Y^{-1}=Z'Z,\quad Z=Y^{-1}\ \Rightarrow\ dY=|Z|^{-2p}\,dZ\ \text{ or }\ dZ=|Y|^{-2p}\,dY,$$

which follows from an application of Theorem 1.6.6. We are seeking the joint density of $Y_1, \ldots, Y_p$ or the density of $Y$. The density of $W_1 + W_2 = U_3$, denoted by $g_1(U_3)$, is available from (8.2.11) as

$$g_1(U_3)=\frac{|B|^{\alpha_1+\alpha_2}}{\Gamma_p(\alpha_1+\alpha_2)}\,|U_3|^{\alpha_1+\alpha_2-\frac{p+1}{2}}\,e^{-\mathrm{tr}(BU_3)}=\frac{|B|^{\alpha_1+\alpha_2}}{\Gamma_p(\alpha_1+\alpha_2)}\,|Z'Z|^{\alpha_1+\alpha_2-\frac{p+1}{2}}\,e^{-\mathrm{tr}(BZ'Z)}.\tag{8.2.14}$$

Letting $U_3 = Z'Z$, ascertain the connection between the differential elements $dZ$ and $dU_3$ from Theorem 4.2.3 for the case $q = p$. Then,

$$dU_3=\frac{\Gamma_p(\frac p2)}{\pi^{\frac{p^2}{2}}}\,|Z'Z|^{\frac12}\,dZ.$$

Hence, the density of $Z$, denoted by $g_z(Z)$, is the following:

$$g_z(Z)\,dZ=\frac{|B|^{\alpha_1+\alpha_2}\,\Gamma_p(\frac p2)}{\pi^{\frac{p^2}{2}}\,\Gamma_p(\alpha_1+\alpha_2)}\,|Z'Z|^{\alpha_1+\alpha_2-\frac p2}\,e^{-\mathrm{tr}(BZ'Z)}\,dZ$$

so that the density of $Y = Z^{-1}$, denoted by $g_5(Y)$, is given by

$$\begin{aligned}
g_5(Y)\,dY&=\frac{|B|^{\alpha_1+\alpha_2}\,\Gamma_p(\frac p2)}{\pi^{\frac{p^2}{2}}\,\Gamma_p(\alpha_1+\alpha_2)}\,|YY'|^{-(\alpha_1+\alpha_2)+\frac p2}\,e^{-\mathrm{tr}(B(YY')^{-1})}\,|YY'|^{-p}\,dY\\
&=\begin{cases}\dfrac{|B|^{\alpha_1+\alpha_2}\,\Gamma_p(\frac p2)}{\pi^{\frac{p^2}{2}}\,\Gamma_p(\alpha_1+\alpha_2)}\,|YY'|^{-(\alpha_1+\alpha_2+\frac p2)}\,e^{-\mathrm{tr}(B(YY')^{-1})}\,dY,\\[2mm] 0,\ \text{elsewhere.}\end{cases}
\end{aligned}\tag{8.2.15}$$

Then, we have the following result: 


Theorem 8.2.4. Let $W_1$ and $W_2$ be independently distributed $p\times p$ real matrix-variate gamma random variables with parameters $(\alpha_1, B)$, $(\alpha_2, B)$, $B > O$, $\Re(\alpha_j) > \frac{p-1}{2}$, $j = 1, 2$. Consider the equation

$$|W_1-\lambda W_2|=0\ \Rightarrow\ W_1Y_j=\lambda_jW_2Y_j,\quad j=1,\ldots,p,$$

where $Y_j$ is a vector corresponding to the root $\lambda_j$ of the determinantal equation. Let $\lambda_1 > \cdots > \lambda_p > 0$ be its distinct roots, which are also the eigenvalues of the matrix $W_2^{-\frac12}W_1W_2^{-\frac12}$. The eigenvalues $\lambda_1, \ldots, \lambda_p$ and the linearly independent orthogonal eigenvectors $Y_1, \ldots, Y_p$ are independently distributed. The joint density of the eigenvalues $\lambda_1, \ldots, \lambda_p$ is available from Theorem 8.2.2 and the joint density of the eigenvectors is given in (8.2.15).
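The normalization $Y'(W_1+W_2)Y=I$ used to derive the density of the eigenvectors can be illustrated on a small case. The sketch below treats the $2\times 2$ situation only: it solves $\det(W_1-\lambda W_2)=0$ as a quadratic, takes a null vector of $W_1-\lambda_jW_2$ for each root, and rescales it so that $Y_j'(W_1+W_2)Y_j=1$. The matrices and helper names are illustrative assumptions, and the null-vector shortcut presumes that the first row of $W_1-\lambda_jW_2$ is nonzero.

```python
import math

def bilinear(u, m, v):
    """Compute u' m v for 2-vectors u, v and a 2x2 matrix m."""
    return sum(u[i] * m[i][j] * v[j] for i in range(2) for j in range(2))

def normalized_eigenpairs(w1, w2):
    """Roots lam of det(w1 - lam * w2) = 0 (2x2 case) with vectors y solving
    (w1 - lam * w2) y = 0, scaled so that y'(w1 + w2) y = 1."""
    a = w2[0][0] * w2[1][1] - w2[0][1] * w2[1][0]
    b = -(w1[0][0] * w2[1][1] + w1[1][1] * w2[0][0]
          - w1[0][1] * w2[1][0] - w1[1][0] * w2[0][1])
    c = w1[0][0] * w1[1][1] - w1[0][1] * w1[1][0]
    disc = math.sqrt(b * b - 4 * a * c)
    total = [[w1[i][j] + w2[i][j] for j in range(2)] for i in range(2)]
    pairs = []
    for lam in ((-b + disc) / (2 * a), (-b - disc) / (2 * a)):
        m11 = w1[0][0] - lam * w2[0][0]
        m12 = w1[0][1] - lam * w2[0][1]
        y = (-m12, m11)  # orthogonal to the first row of the singular matrix
        scale = math.sqrt(bilinear(y, total, y))
        pairs.append((lam, (y[0] / scale, y[1] / scale)))
    return pairs

w1 = [[3.0, 1.0], [1.0, 3.0]]
w2 = [[2.0, 1.0], [1.0, 2.0]]
pairs = normalized_eigenpairs(w1, w2)
total = [[w1[i][j] + w2[i][j] for j in range(2)] for i in range(2)]

# Entries of Y'(W1 + W2)Y, which should form the identity matrix.
gram = [[bilinear(pairs[i][1], total, pairs[j][1]) for j in range(2)]
        for i in range(2)]
print([lam for lam, _ in pairs], gram)
```

The off-diagonal entries of the resulting matrix vanish, in line with (v), and the diagonal entries equal 1 by construction, in line with (vi).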


Example 8.2.6. Illustrate the steps to show that the solutions of the determinantal equa- 


ER = 
tion [И — AW2| = 0 are also the eigenvalues of W, ? W1W, * for the following matrices: 


3 1 x 1 
"-[ j| Ji 


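Solution 8.2.6. First, let us assess whether $W_1$ and $W_2$ are positive definite matrices. Clearly, $W_1 = W_1'$ and $W_2 = W_2'$ are symmetric and their leading minors are positive:

$$|(3)| = 3 > 0, \ \begin{vmatrix} 3 & 1 \\ 1 & 3 \end{vmatrix} = 8 > 0 \Rightarrow W_1 > O; \qquad |(2)| = 2 > 0, \ \begin{vmatrix} 2 & 1 \\ 1 & 2 \end{vmatrix} = 3 > 0 \Rightarrow W_2 > O.$$

Consider the determinantal equation $|W_1 - \lambda W_2| = 0$, that is,

$$\left| \begin{bmatrix} 3 & 1 \\ 1 & 3 \end{bmatrix} - \lambda \begin{bmatrix} 2 & 1 \\ 1 & 2 \end{bmatrix} \right| = 0 \Rightarrow (3 - 2\lambda)^2 - (1 - \lambda)^2 = 0, \tag{i}$$

whose roots are $\lambda_1 = 2$, $\lambda_2 = \frac{4}{3}$. Let us determine $W_2^{-\frac{1}{2}}$. To this end, let us evaluate the eigenvalues and eigenvectors of $W_2$. The eigenvalues of $W_2$ are the values of $\nu$ satisfying the equation $|W_2 - \nu I| = 0 \Rightarrow (2 - \nu)^2 - 1^2 = 0 \Rightarrow \nu_1 = 3$ and $\nu_2 = 1$ are the eigenvalues of $W_2$. An eigenvector corresponding to $\nu_1 = 3$ is given by

$$\begin{bmatrix} 2 - 3 & 1 \\ 1 & 2 - 3 \end{bmatrix} \begin{bmatrix} x_1 \\ x_2 \end{bmatrix} = \begin{bmatrix} 0 \\ 0 \end{bmatrix} \Rightarrow x_1 = 1, \ x_2 = 1$$

is one solution, the normalized eigenvector corresponding to $\nu_1 = 3$ being $Y_1$ whose transpose is $Y_1' = \frac{1}{\sqrt{2}}[1, \ 1]$. Similarly, an eigenvector associated with $\nu_2 = 1$ is $x_1 = 1$, $x_2 = -1$, which once normalized becomes $Y_2$ whose transpose is $Y_2' = \frac{1}{\sqrt{2}}[1, \ -1]$. Let $\Lambda = \mathrm{diag}(3, 1)$ be the diagonal matrix of the eigenvalues of $W_2$. Then,

$$W_2 = \frac{1}{2} \begin{bmatrix} 1 & 1 \\ 1 & -1 \end{bmatrix} \begin{bmatrix} 3 & 0 \\ 0 & 1 \end{bmatrix} \begin{bmatrix} 1 & 1 \\ 1 & -1 \end{bmatrix}.$$

Observe that $W_2$, $W_2^{-1}$, $W_2^{\frac{1}{2}}$ and $W_2^{-\frac{1}{2}}$ share the same eigenvectors $Y_1$ and $Y_2$. Hence,

$$W_2^{\frac{1}{2}} = \frac{1}{2} \begin{bmatrix} 1 & 1 \\ 1 & -1 \end{bmatrix} \begin{bmatrix} \sqrt{3} & 0 \\ 0 & 1 \end{bmatrix} \begin{bmatrix} 1 & 1 \\ 1 & -1 \end{bmatrix}, \qquad W_2^{-\frac{1}{2}} = \frac{1}{2} \begin{bmatrix} 1 & 1 \\ 1 & -1 \end{bmatrix} \begin{bmatrix} \frac{1}{\sqrt{3}} & 0 \\ 0 & 1 \end{bmatrix} \begin{bmatrix} 1 & 1 \\ 1 & -1 \end{bmatrix},$$

and

$$T = W_2^{-\frac{1}{2}} W_1 W_2^{-\frac{1}{2}} = \frac{1}{12} \begin{bmatrix} 20 & -4 \\ -4 & 20 \end{bmatrix}.$$

The eigenvalues of $T$ are $\frac{1}{12}$ times the solutions of $(20 - \delta)^2 - 4^2 = 0 \Rightarrow \delta_1 = 24$ and $\delta_2 = 16$. Thus, the eigenvalues of $T$ are $\frac{24}{12} = 2$ and $\frac{16}{12} = \frac{4}{3}$, which are the solutions of the determinantal equation (i). This verifies the result.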
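The arithmetic of Example 8.2.6 can be checked numerically. The following is a minimal sketch in plain Python, assuming the matrices $W_1 = [[3, 1], [1, 3]]$ and $W_2 = [[2, 1], [1, 2]]$ of the example; the helper names `matmul2`, `eig_sym2` and `pencil_det` are illustrative and not part of the text.

```python
import math

def matmul2(x, y):
    """Product of two 2x2 matrices stored as nested lists."""
    return [[sum(x[i][k] * y[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

def eig_sym2(m):
    """Eigenvalues of a real symmetric 2x2 matrix, largest first."""
    a, b, d = m[0][0], m[0][1], m[1][1]
    mid = (a + d) / 2.0
    rad = math.hypot((a - d) / 2.0, b)
    return mid + rad, mid - rad

W1 = [[3.0, 1.0], [1.0, 3.0]]
W2 = [[2.0, 1.0], [1.0, 2.0]]

# W2^{-1/2} = Y diag(1/sqrt(3), 1) Y' with Y = (1/sqrt(2)) [[1, 1], [1, -1]]
s = 1.0 / math.sqrt(3.0)
W2_inv_half = [[(s + 1) / 2, (s - 1) / 2], [(s - 1) / 2, (s + 1) / 2]]

# T = W2^{-1/2} W1 W2^{-1/2} and its eigenvalues
T = matmul2(matmul2(W2_inv_half, W1), W2_inv_half)
eig_T = eig_sym2(T)

def pencil_det(lam):
    """Value of |W1 - lam * W2| for the 2x2 matrices above."""
    a = W1[0][0] - lam * W2[0][0]
    b = W1[0][1] - lam * W2[0][1]
    c = W1[1][0] - lam * W2[1][0]
    d = W1[1][1] - lam * W2[1][1]
    return a * d - b * c

print("T =", T)
print("eigenvalues of T:", eig_T)
print("|W1 - lam W2| at those values:", [pencil_det(lam) for lam in eig_T])
```

Each eigenvalue of $T$ makes the determinant $|W_1 - \lambda W_2|$ vanish up to rounding error, which is the claim verified in the example.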
8.2a.2. An alternative procedure in the complex case


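Let $\tilde{W}_1$ and $\tilde{W}_2$ be independently distributed $p \times p$ complex matrix-variate gamma random variables with the parameters $(\alpha_1, \tilde{B})$, $(\alpha_2, \tilde{B})$, $\tilde{B} = \tilde{B}^* > O$, $\Re(\alpha_j) > p - 1$, $j = 1, 2$. We are considering the roots $\lambda_j$'s and the corresponding eigenvectors $\tilde{Y}_j$'s of the determinantal equation

$$\det(\tilde{W}_1 - \lambda \tilde{W}_2) = 0 \Rightarrow \tilde{W}_1 \tilde{Y}_j = \lambda_j \tilde{W}_2 \tilde{Y}_j, \ j = 1, \ldots, p, \ \Rightarrow \det(\tilde{W}_2^{-\frac{1}{2}} \tilde{W}_1 \tilde{W}_2^{-\frac{1}{2}} - \lambda I) = 0. \tag{8.2a.7}$$

Let the eigenvalues $\lambda_1, \ldots, \lambda_p$ of $\tilde{W}_2^{-\frac{1}{2}} \tilde{W}_1 \tilde{W}_2^{-\frac{1}{2}} > O$ be distinct and such that $\lambda_1 > \cdots > \lambda_p > 0$, noting that for Hermitian matrices, the eigenvalues are real. We are interested in the joint distributions of the eigenvalues $\lambda_1, \ldots, \lambda_p$ and the eigenvectors $\tilde{Y}_1, \ldots, \tilde{Y}_p$. Alternatively, we will consider the equation

$$\det(\tilde{W}_1 - \mu(\tilde{W}_1 + \tilde{W}_2)) = 0 \Rightarrow \det((\tilde{W}_1 + \tilde{W}_2)^{-\frac{1}{2}} \tilde{W}_1 (\tilde{W}_1 + \tilde{W}_2)^{-\frac{1}{2}} - \mu I) = 0. \tag{8.2a.8}$$

Proceeding as in the real case, one can observe that the joint density of $\tilde{W}_1$ and $\tilde{W}_2$, denoted by $\tilde{f}(\tilde{W}_1, \tilde{W}_2)$, can be factorized into the product of the density of $\tilde{U}_1 = (\tilde{W}_1 + \tilde{W}_2)^{-\frac{1}{2}} \tilde{W}_1 (\tilde{W}_1 + \tilde{W}_2)^{-\frac{1}{2}}$, denoted by $\tilde{g}(\tilde{U}_1)$, which is a complex matrix-variate type-1 beta density with the parameters $(\alpha_1, \alpha_2)$, and the density of $\tilde{U}_3 = \tilde{W}_1 + \tilde{W}_2$, denoted by $\tilde{g}_1(\tilde{U}_3)$, which is a complex matrix-variate gamma density with the parameters $(\alpha_1 + \alpha_2, \tilde{B})$, $\tilde{B} > O$. That is,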


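$$\tilde{f}(\tilde{W}_1, \tilde{W}_2)\, d\tilde{W}_1 \wedge d\tilde{W}_2 = \tilde{g}(\tilde{U}_1)\, \tilde{g}_1(\tilde{U}_3)\, d\tilde{U}_1 \wedge d\tilde{U}_3 \tag{8.2a.9}$$

where

$$\tilde{g}(\tilde{U}_1) = \frac{\tilde{\Gamma}_p(\alpha_1 + \alpha_2)}{\tilde{\Gamma}_p(\alpha_1)\, \tilde{\Gamma}_p(\alpha_2)}\, |\det(\tilde{U}_1)|^{\alpha_1 - p}\, |\det(I - \tilde{U}_1)|^{\alpha_2 - p} \tag{8.2a.10}$$

for $O < \tilde{U}_1 < I$, $\Re(\alpha_j) > p - 1$, $j = 1, 2$, and $\tilde{g} = 0$ elsewhere, and

$$\tilde{g}_1(\tilde{U}_3) = \frac{|\det(\tilde{B})|^{\alpha_1 + \alpha_2}}{\tilde{\Gamma}_p(\alpha_1 + \alpha_2)}\, |\det(\tilde{U}_3)|^{\alpha_1 + \alpha_2 - p}\, e^{-\mathrm{tr}(\tilde{B}\tilde{U}_3)} \tag{8.2a.11}$$

for $\tilde{U}_3 = \tilde{U}_3^* > O$, $\Re(\alpha_1 + \alpha_2) > p - 1$, and $\tilde{B} = \tilde{B}^* > O$, and zero elsewhere.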


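Note that by making the transformation $\tilde{U}_1 = \tilde{Q} D \tilde{Q}^*$ with $\tilde{Q}\tilde{Q}^* = I$, $\tilde{Q}^*\tilde{Q} = I$, and $D = \mathrm{diag}(\mu_1, \ldots, \mu_p)$, the joint density of $\mu_1, \ldots, \mu_p$, as obtained from (8.2a.10), is given by

$$\tilde{g}_2(D)\, dD = \frac{\tilde{\Gamma}_p(\alpha_1 + \alpha_2)}{\tilde{\Gamma}_p(\alpha_1)\, \tilde{\Gamma}_p(\alpha_2)} \Big[\prod_{j=1}^p \mu_j^{\alpha_1 - p}\Big] \Big[\prod_{j=1}^p (1 - \mu_j)^{\alpha_2 - p}\Big] \frac{\pi^{p(p-1)}}{\tilde{\Gamma}_p(p)} \Big[\prod_{i<j} (\mu_i - \mu_j)^2\Big]\, dD; \tag{8.2a.12}$$

also refer to Note 8.2a.1. Then, we have the following result on observing that the joint density of $\lambda_1, \ldots, \lambda_p$ is available from (8.2a.12) by making the substitution $\lambda_j = \frac{\mu_j}{1 - \mu_j}$ or $\mu_j = \frac{\lambda_j}{1 + \lambda_j}$. Note that $d\mu_j = \frac{1}{(1 + \lambda_j)^2}\, d\lambda_j$, $1 - \mu_j = \frac{1}{1 + \lambda_j}$ and

$$\prod_{i<j} (\mu_i - \mu_j)^2 = \prod_{i<j} \frac{(\lambda_i - \lambda_j)^2}{(1 + \lambda_i)^2 (1 + \lambda_j)^2},$$

whose denominator contains $p - 1$ times $(1 + \lambda_j)^2$ for each $j = 1, \ldots, p$. Thus, the final exponent of $\frac{1}{1 + \lambda_j}$ is $(\alpha_1 - p) + (\alpha_2 - p) + 2 + 2(p - 1) = \alpha_1 + \alpha_2$. Hence, the following result:

Theorem 8.2a.3. In the complex case, the joint density of the eigenvalues $\mu_1 > \cdots > \mu_p > 0$ of the determinantal equation in (8.2a.8) is given by the expression appearing in (8.2a.12), which is equal to the density specified in (8.2a.5).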


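We now consider the joint density of the eigenvectors $\tilde{Y}_1, \ldots, \tilde{Y}_p$, which will be available from (8.2a.11), thus establishing that the set of eigenvalues $\lambda_1, \ldots, \lambda_p$ and the eigenvectors $\tilde{Y}_1, \ldots, \tilde{Y}_p$ are independently distributed. For determining the joint density of the eigenvectors, we start with the equation

$$\tilde{W}_1 \tilde{Y}_j = \mu_j (\tilde{W}_1 + \tilde{W}_2) \tilde{Y}_j, \quad j = 1, \ldots, p, \tag{i}$$

observing that $\lambda_j$ and $\mu_j$ share the same eigenvector $\tilde{Y}_j$. That is,

$$\tilde{W}_1 \tilde{Y}_i = \mu_i (\tilde{W}_1 + \tilde{W}_2) \tilde{Y}_i, \quad i = 1, \ldots, p. \tag{ii}$$

We continue as in the real case, showing that $\tilde{Y}_i^* (\tilde{W}_1 + \tilde{W}_2) \tilde{Y}_j = 0$ for all $i \ne j$. Then, we normalize $\tilde{Y}_j$ as follows: $\tilde{Y}_j^* (\tilde{W}_1 + \tilde{W}_2) \tilde{Y}_j = 1$, $j = 1, \ldots, p$. Letting $\tilde{Y} = (\tilde{Y}_1, \ldots, \tilde{Y}_p)$ be the $p \times p$ matrix of the normalized eigenvectors, we have $\tilde{Y}^* (\tilde{W}_1 + \tilde{W}_2) \tilde{Y} = I \Rightarrow \tilde{U}_3 = \tilde{W}_1 + \tilde{W}_2 = (\tilde{Y}^*)^{-1} (\tilde{Y})^{-1}$. Letting $(\tilde{Y})^{-1} = \tilde{Z}$ so that $\tilde{Z}^*\tilde{Z} = (\tilde{Y}\tilde{Y}^*)^{-1}$ and applying Theorem 4.2a.3, $d(\tilde{Z}^*\tilde{Z}) = \frac{\tilde{\Gamma}_p(p)}{\pi^{p(p-1)}}\, d\tilde{Z}$. Hence, given the density of $\tilde{W}_1 + \tilde{W}_2$ specified in (8.2a.11), the density of $\tilde{Z}$, denoted by $\tilde{g}_3(\tilde{Z})$, is obtained as

$$\tilde{g}_3(\tilde{Z})\, d\tilde{Z} = \frac{|\det(\tilde{B})|^{\alpha_1 + \alpha_2}\, \tilde{\Gamma}_p(p)}{\tilde{\Gamma}_p(\alpha_1 + \alpha_2)\, \pi^{p(p-1)}}\, |\det(\tilde{Z}^*\tilde{Z})|^{\alpha_1 + \alpha_2 - p}\, e^{-\mathrm{tr}(\tilde{B}\tilde{Z}^*\tilde{Z})}\, d\tilde{Z}. \tag{8.2a.13}$$

Now noting that $\tilde{Z} = \tilde{Y}^{-1} \Rightarrow d\tilde{Z} = |\det(\tilde{Y}^*\tilde{Y})|^{-2p}\, d\tilde{Y}$ from an application of Theorem 1.6a.6, and substituting in (8.2a.13), we obtain the following density of $\tilde{Y}$, denoted by $\tilde{g}_4(\tilde{Y})$:

$$\tilde{g}_4(\tilde{Y})\, d\tilde{Y} = \frac{|\det(\tilde{B})|^{\alpha_1 + \alpha_2}\, \tilde{\Gamma}_p(p)}{\tilde{\Gamma}_p(\alpha_1 + \alpha_2)\, \pi^{p(p-1)}}\, |\det(\tilde{Y}\tilde{Y}^*)|^{-\alpha_1 - \alpha_2 - p}\, e^{-\mathrm{tr}(\tilde{B}(\tilde{Y}\tilde{Y}^*)^{-1})}\, d\tilde{Y} \tag{8.2a.14}$$

for $\tilde{B} = \tilde{B}^* > O$, $\Re(\alpha_1 + \alpha_2) > p - 1$, and zero elsewhere.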


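Example 8.2a.5. Show that the roots of the determinantal equation $\det(\tilde{W}_1 - \lambda \tilde{W}_2) = 0$ are the same as the eigenvalues of $\tilde{W}_2^{-\frac{1}{2}} \tilde{W}_1 \tilde{W}_2^{-\frac{1}{2}}$ for the following Hermitian positive definite matrices:

$$\tilde{W}_1 = \begin{bmatrix} 3 & 1 - i \\ 1 + i & 3 \end{bmatrix}, \qquad \tilde{W}_2 = \begin{bmatrix} 3 & \sqrt{2}(1 + i) \\ \sqrt{2}(1 - i) & 3 \end{bmatrix}.$$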


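Solution 8.2a.5. Let us evaluate the eigenvalues and eigenvectors of $\tilde{W}_2$. Consider the equation $\det(\tilde{W}_2 - \mu I) = 0 \Rightarrow (3 - \mu)^2 - 2^2 = 0 \Rightarrow \mu_1 = 5$ and $\mu_2 = 1$ are the eigenvalues of $\tilde{W}_2$. An eigenvector corresponding to $\mu_1 = 5$ must satisfy the equation

$$\begin{bmatrix} 3 - 5 & \sqrt{2}(1 + i) \\ \sqrt{2}(1 - i) & 3 - 5 \end{bmatrix} \begin{bmatrix} x_1 \\ x_2 \end{bmatrix} = \begin{bmatrix} 0 \\ 0 \end{bmatrix} \Rightarrow \begin{cases} -2x_1 + \sqrt{2}(1 + i)x_2 = 0 \\ \sqrt{2}(1 - i)x_1 - 2x_2 = 0. \end{cases}$$

Since it is a singular system of linear equations, we can solve any one of them for $x_1$ and $x_2$. For $x_2 = 1$, we have $x_1 = \frac{1}{\sqrt{2}}(1 + i)$. Thus, one eigenvector is

$$\tilde{X}_1 = \begin{bmatrix} \frac{1}{\sqrt{2}}(1 + i) \\ 1 \end{bmatrix} \Rightarrow \tilde{X}_1^* \tilde{X}_1 = 2 \Rightarrow \tilde{Y}_1 = \frac{1}{\sqrt{2}} \begin{bmatrix} \frac{1}{\sqrt{2}}(1 + i) \\ 1 \end{bmatrix}$$

where $\tilde{Y}_1$ is the normalized eigenvector obtained from $\tilde{X}_1$. Similarly, corresponding to the eigenvalue $\mu_2 = 1$, we have the normalized eigenvector

$$\tilde{Y}_2 = \frac{1}{\sqrt{2}} \begin{bmatrix} -\frac{1}{\sqrt{2}}(1 + i) \\ 1 \end{bmatrix} \quad \text{and} \quad \tilde{W}_2 [\tilde{Y}_1, \tilde{Y}_2] = [\tilde{Y}_1, \tilde{Y}_2] \begin{bmatrix} 5 & 0 \\ 0 & 1 \end{bmatrix}$$

so that

$$\tilde{W}_2 = \frac{1}{2} \begin{bmatrix} \frac{1}{\sqrt{2}}(1 + i) & -\frac{1}{\sqrt{2}}(1 + i) \\ 1 & 1 \end{bmatrix} \begin{bmatrix} 5 & 0 \\ 0 & 1 \end{bmatrix} \begin{bmatrix} \frac{1}{\sqrt{2}}(1 - i) & 1 \\ -\frac{1}{\sqrt{2}}(1 - i) & 1 \end{bmatrix},$$

observing that the above format is $\tilde{W}_2 = \tilde{Y} D \tilde{Y}^*$ with $\tilde{Y} = [\tilde{Y}_1, \tilde{Y}_2]$ and $D = \mathrm{diag}(5, 1)$, the diagonal matrix of the eigenvalues of $\tilde{W}_2$. Since $\tilde{W}_2$, $\tilde{W}_2^{\frac{1}{2}}$, $\tilde{W}_2^{-1}$ and $\tilde{W}_2^{-\frac{1}{2}}$ share the same eigenvectors, we have

$$\tilde{W}_2^{\frac{1}{2}} = \frac{1}{2} \begin{bmatrix} \frac{1}{\sqrt{2}}(1 + i) & -\frac{1}{\sqrt{2}}(1 + i) \\ 1 & 1 \end{bmatrix} \begin{bmatrix} \sqrt{5} & 0 \\ 0 & 1 \end{bmatrix} \begin{bmatrix} \frac{1}{\sqrt{2}}(1 - i) & 1 \\ -\frac{1}{\sqrt{2}}(1 - i) & 1 \end{bmatrix},$$

$$\tilde{W}_2^{-\frac{1}{2}} = \frac{1}{2} \begin{bmatrix} \frac{1}{\sqrt{2}}(1 + i) & -\frac{1}{\sqrt{2}}(1 + i) \\ 1 & 1 \end{bmatrix} \begin{bmatrix} \frac{1}{\sqrt{5}} & 0 \\ 0 & 1 \end{bmatrix} \begin{bmatrix} \frac{1}{\sqrt{2}}(1 - i) & 1 \\ -\frac{1}{\sqrt{2}}(1 - i) & 1 \end{bmatrix} = \frac{1}{2} \begin{bmatrix} \frac{1}{\sqrt{5}} + 1 & (\frac{1}{\sqrt{5}} - 1)\frac{1}{\sqrt{2}}(1 + i) \\ (\frac{1}{\sqrt{5}} - 1)\frac{1}{\sqrt{2}}(1 - i) & \frac{1}{\sqrt{5}} + 1 \end{bmatrix}.$$

It is easily verified that

$$\tilde{W}_2^{-\frac{1}{2}} \tilde{W}_2^{-\frac{1}{2}} = \frac{1}{5} \begin{bmatrix} 3 & -\sqrt{2}(1 + i) \\ -\sqrt{2}(1 - i) & 3 \end{bmatrix} = \tilde{W}_2^{-1}.$$

Letting $Q = \tilde{W}_2^{-\frac{1}{2}} \tilde{W}_1 \tilde{W}_2^{-\frac{1}{2}}$, we have

$$Q = \frac{1}{4} \begin{bmatrix} \frac{36}{5} & -\frac{24}{5\sqrt{2}}(1 + i) + \frac{4}{\sqrt{5}}(1 - i) \\ -\frac{24}{5\sqrt{2}}(1 - i) + \frac{4}{\sqrt{5}}(1 + i) & \frac{36}{5} \end{bmatrix}. \tag{ii}$$

The eigenvalues of $4Q$ can be determined by solving the equation

$$\Big(\frac{36}{5} - \nu\Big)^2 - \Big[-\frac{24}{5\sqrt{2}}(1 + i) + \frac{4}{\sqrt{5}}(1 - i)\Big]\Big[-\frac{24}{5\sqrt{2}}(1 - i) + \frac{4}{\sqrt{5}}(1 + i)\Big] = 0 \Rightarrow$$
$$\Big(\frac{36}{5} - \nu\Big)^2 = 2\Big(\frac{24}{5\sqrt{2}}\Big)^2 + 2\Big(\frac{4}{\sqrt{5}}\Big)^2 = \frac{736}{25} \Rightarrow \nu = \frac{4}{5}(9 \pm \sqrt{46}). \tag{iii}$$

Thus, the eigenvalues of $Q$, denoted by $\delta$, are

$$\delta = \frac{1}{5}(9 \pm \sqrt{46}). \tag{iv}$$

Now, let us consider the determinantal equation $\det(\tilde{W}_1 - \lambda \tilde{W}_2) = 0$, that is,

$$\begin{vmatrix} 3(1 - \lambda) & (1 - i) - \lambda\sqrt{2}(1 + i) \\ (1 + i) - \lambda\sqrt{2}(1 - i) & 3(1 - \lambda) \end{vmatrix} = 0,$$

which yields

$$3^2(1 - \lambda)^2 - [(1 - i) - \lambda\sqrt{2}(1 + i)][(1 + i) - \lambda\sqrt{2}(1 - i)] = 0 \Rightarrow$$
$$3^2(1 - \lambda)^2 - (2 + 4\lambda^2) = 0 \Rightarrow 5\lambda^2 - 18\lambda + 7 = 0 \Rightarrow \lambda = \frac{1}{5}(9 \pm \sqrt{46}). \tag{v}$$

The eigenvalues obtained in (iv) and (v) being identical, the result is established.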
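As a numerical cross-check of Example 8.2a.5, the sketch below uses plain Python complex arithmetic and assumes the Hermitian matrices $\tilde{W}_1 = [[3, 1 - i], [1 + i, 3]]$ and $\tilde{W}_2 = [[3, \sqrt{2}(1 + i)], [\sqrt{2}(1 - i), 3]]$ of the example. The helper names `matmul2` and `eig_herm2` are illustrative only.

```python
import math

def matmul2(x, y):
    """Product of two 2x2 (possibly complex) matrices stored as nested lists."""
    return [[sum(x[i][k] * y[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

def eig_herm2(m):
    """Real eigenvalues of a 2x2 Hermitian matrix, largest first."""
    a, d = m[0][0].real, m[1][1].real
    mid = (a + d) / 2.0
    rad = math.hypot((a - d) / 2.0, abs(m[0][1]))
    return mid + rad, mid - rad

r2 = math.sqrt(2.0)
W1 = [[3 + 0j, 1 - 1j], [1 + 1j, 3 + 0j]]
W2 = [[3 + 0j, r2 * (1 + 1j)], [r2 * (1 - 1j), 3 + 0j]]

# W2^{-1/2} = Y diag(1/sqrt(5), 1) Y*, written out entry by entry
a = 1.0 / math.sqrt(5.0)
c = (1 + 1j) / r2
W2_inv_half = [[(a + 1) / 2 + 0j, (a - 1) / 2 * c],
               [(a - 1) / 2 * c.conjugate(), (a + 1) / 2 + 0j]]

# Q = W2^{-1/2} W1 W2^{-1/2} and its eigenvalues
Q = matmul2(matmul2(W2_inv_half, W1), W2_inv_half)
eig_Q = eig_herm2(Q)

# Roots of 5 lam^2 - 18 lam + 7 = 0 from det(W1 - lam W2) = 0
roots = ((9 + math.sqrt(46.0)) / 5, (9 - math.sqrt(46.0)) / 5)

# Sanity check: W2^{-1/2} W2^{-1/2} W2 should be the identity
check_identity = matmul2(matmul2(W2_inv_half, W2_inv_half), W2)

print("eigenvalues of Q:", eig_Q)
print("roots of the determinantal equation:", roots)
```

The two printed pairs agree up to rounding, matching (iv) and (v).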
8.3. The Singular Real Case 


If a $p \times p$ real matrix-variate gamma distribution with parameters $(\alpha, B)$ and $B > O$ is singular and positive semi-definite, its $p \times p$-variate density does not exist. When $\alpha = \frac{m}{2}$, $m \ge p$, and $B = \frac{1}{2}\Sigma^{-1}$, $\Sigma > O$, the gamma density is called a Wishart density with $m$ degrees of freedom and parameter matrix $\Sigma > O$. If the rank of the gamma or Wishart matrices is $r < p$, in which case they are positive semi-definite, the resulting distributions are said to be singular. It can be shown that, in this instance, we have in fact nonsingular $r \times r$-variate gamma or Wishart distributions. In order to establish this, the matrix theory results presented next are required.


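Let $A = A' \ge O$ (non-negative definite) be a $p \times p$ real matrix of rank $r < p$, and the elementary matrices $E_1, E_2, \ldots, E_k$ be such that by operating on $A$, one has

$$E_k \cdots E_2 E_1\, A\, E_1' E_2' \cdots E_k' = \begin{bmatrix} I_r & O_1 \\ O_2 & O_3 \end{bmatrix}$$

where $O_1$, $O_2$ and $O_3$ are null matrices, with $O_3$ being of order $(p - r) \times (p - r)$. Then,

$$A = E_1^{-1} \cdots E_k^{-1} \begin{bmatrix} I_r & O_1 \\ O_2 & O_3 \end{bmatrix} E_k'^{-1} \cdots E_1'^{-1} = Q \begin{bmatrix} I_r & O_1 \\ O_2 & O_3 \end{bmatrix} Q'$$

where $Q$ is a product of inverses of elementary matrices and hence, nonsingular. Letting

$$Q = \begin{bmatrix} Q_{11} & Q_{12} \\ Q_{21} & Q_{22} \end{bmatrix}, \quad Q_{11} \text{ being } r \times r \text{ and } Q_{22}, \ (p - r) \times (p - r),$$

$$Q \begin{bmatrix} I_r & O_1 \\ O_2 & O_3 \end{bmatrix} = \begin{bmatrix} Q_{11} & Q_{12} \\ Q_{21} & Q_{22} \end{bmatrix} \begin{bmatrix} I_r & O \\ O & O \end{bmatrix} = \begin{bmatrix} Q_{11} & O \\ Q_{21} & O \end{bmatrix}.$$

Note that

$$Q \begin{bmatrix} I_r & O \\ O & O \end{bmatrix} Q' = \begin{bmatrix} Q_{11} & O \\ Q_{21} & O \end{bmatrix} \begin{bmatrix} Q_{11}' & Q_{21}' \\ Q_{12}' & Q_{22}' \end{bmatrix} = \begin{bmatrix} Q_{11} \\ Q_{21} \end{bmatrix} [Q_{11}' \ \ Q_{21}'] = A_1 A_1'$$

where $A_1 = \begin{bmatrix} Q_{11} \\ Q_{21} \end{bmatrix}$, which is a full rank $p \times r$ matrix, $r < p$, so that the $r$ columns of $A_1$ are all linearly independent. This result can also be established by appealing to the fact that when $A \ge O$, its eigenvalues are non-negative, and $A$ being symmetric, there exists an orthonormal matrix $P$, $PP' = I$, $P'P = I$, such that

$$A = P \begin{bmatrix} D & O \\ O & O \end{bmatrix} P' = \lambda_1 P_1 P_1' + \cdots + \lambda_r P_r P_r' + 0\, P_{r+1} P_{r+1}' + \cdots + 0\, P_p P_p' = P_{(1)} D P_{(1)}',$$

$$D = \mathrm{diag}(\lambda_1, \ldots, \lambda_r), \quad P_{(1)} = [P_1, \ldots, P_r], \quad P = [P_1, \ldots, P_r, \ldots, P_p],$$

where $P_1, \ldots, P_p$ are the columns of the orthonormal matrix $P$, $\lambda_1, \ldots, \lambda_r, 0, \ldots, 0$ are the eigenvalues of $A$ where $\lambda_j > 0$, $j = 1, \ldots, r$, and $P_{(1)}$ contains the first $r$ columns of $P$. The first $r$ eigenvalues must be positive since $A$ is a non-negative definite matrix of rank $r$. Now, we can write $A = P_{(1)} D P_{(1)}' = A_1 A_1'$, with $A_1 = P_{(1)} D^{\frac{1}{2}}$. Observe that $A_1$ is $p \times r$ and of rank $r < p$. Thus, we have the following result: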


Theorem 8.3.1. Let $A = A'$ be a real $p \times p$ positive semi-definite matrix, $A \ge O$, of rank $r < p$. Then, $A$ can be represented in the form $A = A_1 A_1'$ where $A_1$ is a $p \times r$, $r < p$, matrix of rank $r$ or, equivalently, the $r$ columns of the $p \times r$ matrix $A_1$ are all linearly independent.
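The representation $A = A_1 A_1'$ of Theorem 8.3.1 can be illustrated numerically. The sketch below does not follow either derivation given above (elementary matrices or the spectral decomposition); as a plainly labeled substitute, it uses a diagonally pivoted Cholesky-type elimination, which also yields a $p \times r$ factor of full column rank for a positive semi-definite input. The function name `rank_factor`, the tolerance and the test matrix are assumptions made for illustration.

```python
import math

def rank_factor(a, tol=1e-10):
    """Return a p x r matrix A1 (list of rows) with A = A1 A1'.

    `a` must be symmetric positive semi-definite. At each step the largest
    remaining diagonal entry is used as pivot and a rank-one term is
    subtracted, so the number of steps equals the rank r of `a`.
    """
    p = len(a)
    resid = [row[:] for row in a]          # working copy of the matrix
    cols = []
    for _ in range(p):
        k = max(range(p), key=lambda i: resid[i][i])
        pivot = resid[k][k]
        if pivot <= tol:                   # nothing left: rank reached
            break
        col = [resid[i][k] / math.sqrt(pivot) for i in range(p)]
        cols.append(col)
        for i in range(p):
            for j in range(p):
                resid[i][j] -= col[i] * col[j]
    return [[col[i] for col in cols] for i in range(p)]

# A 3 x 3 positive semi-definite matrix of rank 2 (it equals BB' for a 3 x 2 B)
A = [[1.0, 1.0, 0.0],
     [1.0, 2.0, 2.0],
     [0.0, 2.0, 4.0]]

A1 = rank_factor(A)
r = len(A1[0])
recon = [[sum(A1[i][k] * A1[j][k] for k in range(r)) for j in range(3)]
         for i in range(3)]

print("rank r =", r)
print("A1 =", A1)
print("A1 A1' =", recon)
```

Here `A1` has two columns, in agreement with the rank of `A`, and `recon` reproduces `A`.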


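In the case of Wishart matrices, we can interpret Theorem 8.3.1 as follows: Let the $p \times 1$ vector random variable $X_j$ have a nonsingular Gaussian distribution whose mean value is the null vector and covariance matrix $\Sigma$ is positive definite. Let the $X_j$'s, $j = 1, \ldots, n$, be independently distributed, that is, $X_j \overset{iid}{\sim} N_p(O, \Sigma)$, $\Sigma > O$, $j = 1, \ldots, n$. Letting the $p \times n$ sample matrix be $\mathbf{X} = [X_1, \ldots, X_n]$, the joint density of $X_1, \ldots, X_n$ or that of $\mathbf{X}$, denoted by $f(\mathbf{X})$, is the following:

$$f(\mathbf{X})\, d\mathbf{X} = \frac{1}{(2\pi)^{\frac{np}{2}} |\Sigma|^{\frac{n}{2}}}\, e^{-\frac{1}{2} \sum_{j=1}^n X_j' \Sigma^{-1} X_j}\, d\mathbf{X} = \frac{1}{(2\pi)^{\frac{np}{2}} |\Sigma|^{\frac{n}{2}}}\, e^{-\frac{1}{2} \mathrm{tr}(\Sigma^{-1} \mathbf{X}\mathbf{X}')}\, d\mathbf{X}. \tag{8.3.1}$$

Letting $W = \mathbf{X}\mathbf{X}'$, the $p \times p$ matrix $W$ will be positive definite provided $n \ge p$; otherwise, that is when $n < p$, $W$ will be singular. Let us consider the case $n \ge p$ first. This will also provide a derivation of the real Wishart density which was earlier obtained as a special case of real matrix-variate gamma density. Observe that we can write $d\mathbf{X}$ in terms of $dW$ by applying Theorem 4.2.3, namely,

$$d\mathbf{X} = \frac{\pi^{\frac{np}{2}}}{\Gamma_p(\frac{n}{2})}\, |W|^{\frac{n}{2} - \frac{p+1}{2}}\, dW.$$

Therefore, if the density of $W$ is denoted by $f_1(W)$, then $f_1(W)$ is available from (8.3.1) by expressing $d\mathbf{X}$ in terms of $dW$. That is,

$$f_1(W) = \frac{|W|^{\frac{n}{2} - \frac{p+1}{2}}\, e^{-\frac{1}{2} \mathrm{tr}(\Sigma^{-1} W)}}{2^{\frac{np}{2}}\, \Gamma_p(\frac{n}{2})\, |\Sigma|^{\frac{n}{2}}} \tag{8.3.2}$$

for $n \ge p$, $W > O$, $\Sigma > O$, and $f_1(W) = 0$ elsewhere. This is the density of a nonsingular Wishart distribution with $n$ degrees of freedom, $n \ge p$, and parameter matrix $\Sigma > O$, which is denoted by $W \sim W_p(n, \Sigma)$, $\Sigma > O$, $n \ge p$. It has previously been shown that when $X_j \overset{iid}{\sim} N_p(\mu, \Sigma)$, $j = 1, \ldots, n$, where $\mu \ne O$ is the common $p \times 1$ mean value vector and $\Sigma$ is the positive definite covariance matrix,

$$W = (\mathbf{X} - \bar{\mathbf{X}})(\mathbf{X} - \bar{\mathbf{X}})' \sim W_p(n - 1, \Sigma), \quad \Sigma > O \text{ for } n - 1 \ge p,$$

where $\bar{\mathbf{X}} = [\bar{X}, \ldots, \bar{X}]$, $\bar{X} = \frac{1}{n}(X_1 + \cdots + X_n)$. Thus, we have the following result:

Theorem 8.3.2. Let $X_j \overset{iid}{\sim} N_p(\mu, \Sigma)$, $\Sigma > O$, $j = 1, \ldots, n$. Let $\mathbf{X} = [X_1, \ldots, X_n]$ and $W = \mathbf{X}\mathbf{X}'$. Then, when $\mu = O$, the $p \times p$ positive definite matrix $W \sim W_p(n, \Sigma)$, $\Sigma > O$ for $n \ge p$. If $\mu \ne O$, $W = (\mathbf{X} - \bar{\mathbf{X}})(\mathbf{X} - \bar{\mathbf{X}})' \sim W_p(n - 1, \Sigma)$, $\Sigma > O$, for $n - 1 \ge p$, where $\bar{\mathbf{X}} = [\bar{X}, \ldots, \bar{X}]$, $\bar{X} = \frac{1}{n}(X_1 + \cdots + X_n)$.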
Now, consider the case $n < p$. Let us denote $n$ as $r < p$ in order to avoid any confusion with $n$ as specified in the nonsingular case. Letting $X$ be as previously defined, $X$ is a real matrix of order $p \times r$, $n = r < p$. Let $T_1 = XX'$ and $T_2 = X'X$ where $X'X$ is an $r \times r$ positive definite matrix since $X$ and $X'$ are full rank matrices of rank $r < p$. Thus, all the eigenvalues of $T_2 = X'X$ are positive and the eigenvalues of $T_1$ are either positive or equal to zero since $T_1$ is a real positive semi-definite matrix. Letting $\lambda$ be a nonzero eigenvalue of $T_2$, consider the following determinant, denoted by $\delta$, which is expanded in two ways by making use of certain properties of the determinants of partitioned matrices that are provided in Sect. 1.3:

$$\delta = \begin{vmatrix} \sqrt{\lambda}\, I_p & X \\ X' & \sqrt{\lambda}\, I_r \end{vmatrix} = |\sqrt{\lambda}\, I_p|\ |\sqrt{\lambda}\, I_r - X'(\sqrt{\lambda}\, I_p)^{-1} X| = (\sqrt{\lambda})^{p - r}\, |\lambda I_r - X'X|;$$
$$\delta = 0 \Rightarrow |\lambda I_r - X'X| = 0, \tag{8.3.3}$$

which shows that $\lambda$ is an eigenvalue of $X'X$. Now expand $\delta$ as follows:

$$\delta = |\sqrt{\lambda}\, I_r|\ |\sqrt{\lambda}\, I_p - X(\sqrt{\lambda}\, I_r)^{-1} X'| = (\sqrt{\lambda})^{r - p}\, |\lambda I_p - XX'|;$$
$$\delta = 0 \Rightarrow |\lambda I_p - XX'| = 0, \tag{8.3.4}$$

so that all the $r$ nonzero eigenvalues of $T_2 = X'X$ are also eigenvalues of $T_1 = XX'$, the remaining eigenvalues of $T_1$ being zeros. As well, one has $|I_r - X'X| = |I_p - XX'|$. These results are next stated as a theorem.


Theorem 8.3.3. Let $X$ be a $p \times r$ matrix of full rank $r < p$. Let the real $p \times p$ positive semi-definite matrix $T_1 = XX'$ and the $r \times r$ real positive definite matrix $T_2 = X'X$. Then, (a) the $r$ positive eigenvalues of $T_2$ are identical to those of $T_1$, the remaining eigenvalues of $T_1$ being equal to zero; (b) $|I_r - X'X| = |I_p - XX'|$.
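Both parts of Theorem 8.3.3 are easy to confirm on a small case. The following sketch, in plain Python, assumes an arbitrary $3 \times 2$ matrix $X$ of full column rank chosen for illustration; the helpers `det`, `gram` and `shifted` are hypothetical names, and the cofactor expansion is only meant for very small matrices.

```python
import math

def det(m):
    """Determinant by cofactor expansion along the first row (small matrices only)."""
    if len(m) == 1:
        return m[0][0]
    return sum((-1) ** j * m[0][j] * det([row[:j] + row[j + 1:] for row in m[1:]])
               for j in range(len(m)))

def gram(x, y):
    """Return x y' for real matrices stored as lists of rows."""
    return [[sum(a * b for a, b in zip(rx, ry)) for ry in y] for rx in x]

def shifted(m, lam):
    """Return lam * I - m for a square matrix m."""
    n = len(m)
    return [[(lam if i == j else 0.0) - m[i][j] for j in range(n)] for i in range(n)]

X = [[1.0, 0.0], [1.0, 1.0], [0.0, 2.0]]     # p = 3, r = 2, full column rank
Xt = [list(col) for col in zip(*X)]
T1 = gram(X, X)      # XX', 3 x 3, positive semi-definite
T2 = gram(Xt, Xt)    # X'X, 2 x 2, positive definite

# Eigenvalues of the 2 x 2 matrix T2 from its characteristic polynomial
tr, dt = T2[0][0] + T2[1][1], det(T2)
disc = math.sqrt(tr * tr - 4 * dt)
eigs = [(tr + disc) / 2, (tr - disc) / 2]

# (a) each eigenvalue of T2 annihilates |lam I_p - XX'|, and |XX'| = 0
char_T1 = [det(shifted(T1, lam)) for lam in eigs]
# (b) |I_r - X'X| = |I_p - XX'|
lhs, rhs = det(shifted(T2, 1.0)), det(shifted(T1, 1.0))

print("eigenvalues of X'X:", eigs)
print("|lam I - XX'| at those values:", char_T1, " |XX'| =", det(T1))
print("|I_r - X'X| =", lhs, " |I_p - XX'| =", rhs)
```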


Additional results relating the $p$-variate real Gaussian distribution to the real Wishart distribution are needed in connection with the singular case. Let the $p \times 1$ vector $X_j$ have a $p$-variate real Gaussian distribution whose mean value is the null vector and covariance matrix is positive definite, with $X_j \overset{iid}{\sim} N_p(O, \Sigma)$, $\Sigma > O$, $j = 1, \ldots, r$, $r < p$. Let $X = [X_1, \ldots, X_r]$ be a $p \times r$ matrix, which, in this instance, is also the sample matrix. We are seeking the distribution of $T_1 = XX'$ when $r < p$, where $T_1$ corresponds to a singular Wishart matrix. Letting $T$ be an $r \times r$ lower triangular matrix with positive diagonal elements, and $G$ be an $r \times p$, $r < p$, semiorthonormal matrix, that is, $GG' = I_r$, we have the representation $X' = TG$, so that

$$dX' = \Big\{\prod_{j=1}^r t_{jj}^{p - j}\Big\}\, dT\, h(G), \quad p \ge r, \tag{8.3.5}$$

where $h(G)$ is a differential element associated with $G$. Then, on applying Theorem 4.2.2, we have

$$\int_{V_{r,p}} h(G) = \frac{2^r\, \pi^{\frac{pr}{2}}}{\Gamma_r(\frac{p}{2})} \tag{8.3.6}$$

where $V_{r,p}$ is the Stiefel manifold or the space of semi-orthonormal $r \times p$, $r < p$, matrices. Observe that the density of $X'$, denoted by $f_X(X')$, is the following:

$$f_X(X')\, dX' = \frac{e^{-\frac{1}{2} \sum_{j=1}^r X_j' \Sigma^{-1} X_j}}{(2\pi)^{\frac{pr}{2}} |\Sigma|^{\frac{r}{2}}}\, dX' = \frac{e^{-\frac{1}{2} \mathrm{tr}(X' \Sigma^{-1} X)}}{(2\pi)^{\frac{pr}{2}} |\Sigma|^{\frac{r}{2}}}\, dX'.$$

Let $T_2 = X' \Sigma^{-1} X$ or simply $T_2 = X'X$ when $\Sigma = I$; note that $\Sigma$ will vanish upon letting $Y = \Sigma^{-\frac{1}{2}} X \Rightarrow dY = |\Sigma|^{-\frac{r}{2}}\, dX$. Now, on expressing $dX'$ in terms of $dT_2$ by making use of Theorem 4.2.3, the following result is obtained:

Theorem 8.3.4. Let $X_j \overset{iid}{\sim} N_p(\mu, \Sigma)$, $\Sigma > O$, $j = 1, \ldots, r$, $r < p$. Let $\mathbf{X} = [X_1, \ldots, X_r]$, $\bar{X} = \frac{1}{r}(X_1 + \cdots + X_r)$ and $\bar{\mathbf{X}} = (\bar{X}, \ldots, \bar{X})$. Then, $T_2 = \mathbf{X}' \Sigma^{-1} \mathbf{X}$ or $T_2 = \mathbf{X}'\mathbf{X}$ when $\Sigma = I$, has the following density, denoted by $f_t(T_2)$, when $\mu = O$:

$$f_t(T_2) = \frac{1}{2^{\frac{pr}{2}}\, \Gamma_r(\frac{p}{2})}\, |T_2|^{\frac{p}{2} - \frac{r+1}{2}}\, e^{-\frac{1}{2} \mathrm{tr}(T_2)}, \quad T_2 > O, \ r \le p, \tag{8.3.7}$$

and zero elsewhere. Note that the $r \times r$ matrix $T_2 = \mathbf{X}'\mathbf{X}$ or $T_2 = \mathbf{X}' \Sigma^{-1} \mathbf{X}$ when $\Sigma \ne I$, has a Wishart distribution with $p$ degrees of freedom and parameter matrix $I$, that is, $T_2 \sim W_r(p, I)$, $r \le p$. When $\mu \ne O$, $T_2 = (\mathbf{X} - \bar{\mathbf{X}})' \Sigma^{-1} (\mathbf{X} - \bar{\mathbf{X}})$ has a Wishart distribution with $p - 1$ degrees of freedom or, equivalently, $T_2 \sim W_r(p - 1, I)$, $r \le p - 1$.


8.3.1. Singular Wishart and matrix-variate gamma distributions, real case 


We now consider the case of a singular matrix-variate gamma distribution. Let the $p \times 1$ vector $X_j$ have a $p$-variate real Gaussian distribution whose mean value is the null vector and covariance matrix is positive definite, with $X_j \overset{iid}{\sim} N_p(O, \Sigma)$, $\Sigma > O$, $j = 1, \ldots, r$, $r < p$. For convenience, let $\Sigma = I_p$. Let $X = [X_1, \ldots, X_r]$ be the $p \times r$ sample matrix. Then, for $r \ge p$, $XX'$ is distributed as a Wishart matrix with $r \ge p$ degrees of freedom, that is, $XX' \sim W_p(r, I)$, $r \ge p$. This result still holds for any positive definite matrix $\Sigma$; it suffices then to replace $I$ by $\Sigma$. What about the distribution of $XX'$ if $r < p$, which corresponds to the singular case? In this instance, the real matrix $XX' \ge O$ (positive semi-definite) and the density of $X$, denoted by $f_1(X)$, is the following:

$$f_1(X)\, dX = \frac{e^{-\frac{1}{2} \mathrm{tr}(XX')}}{(2\pi)^{\frac{pr}{2}}}\, dX. \tag{8.3.8}$$

Let $W_2$ be a $p \times p$ nonsingular Wishart matrix with $n \ge p$ degrees of freedom, that is, $W_2 \sim W_p(n, I)$. Let $X$ and $W_2$ be independently distributed. Then, the joint density of $X$ and $W_2$, denoted by $f_2(X, W_2)$, is given by

$$f_2(X, W_2) = \frac{e^{-\frac{1}{2} \mathrm{tr}(XX' + W_2)}\, |W_2|^{\frac{n}{2} - \frac{p+1}{2}}}{(2\pi)^{\frac{pr}{2}}\, 2^{\frac{np}{2}}\, \Gamma_p(\frac{n}{2})}$$

for $W_2 > O$, $XX' \ge O$, $n \ge p$, $r < p$. Letting $U = XX' + W_2 > O$, and the joint density of $U$ and $X$ be denoted by $f_3(X, U)$, we have

$$f_3(X, U) = \frac{e^{-\frac{1}{2} \mathrm{tr}(U)}\, |U - XX'|^{\frac{n}{2} - \frac{p+1}{2}}}{(2\pi)^{\frac{pr}{2}}\, 2^{\frac{np}{2}}\, \Gamma_p(\frac{n}{2})}, \quad n \ge p, \ r < p,$$

where

$$|U - XX'| = |U|\ |I - U^{-\frac{1}{2}} XX' U^{-\frac{1}{2}}|.$$

Letting $V = U^{-\frac{1}{2}} X$ for fixed $U$, $dV = |U|^{-\frac{r}{2}}\, dX$, and the joint density of $U$ and $V$, denoted by $f_4(U, V)$, is then

$$f_4(U, V) = \frac{|U|^{\frac{n+r}{2} - \frac{p+1}{2}}\, e^{-\frac{1}{2} \mathrm{tr}(U)}}{(2\pi)^{\frac{pr}{2}}\, 2^{\frac{np}{2}}\, \Gamma_p(\frac{n}{2})}\, |I - VV'|^{\frac{n}{2} - \frac{p+1}{2}}. \tag{8.3.9}$$

Note that $U$ and $V$ are independently distributed. By integrating out $U$ with the help of a real matrix-variate gamma integral, we obtain the density of $V$, denoted by $f_5(V)$, as

$$f_5(V) = c\, |I_p - VV'|^{\frac{n}{2} - \frac{p+1}{2}} = c\, |I_r - V'V|^{\frac{n}{2} - \frac{p+1}{2}} \tag{8.3.10}$$

in view of Theorem 8.3.3(b), where $V$ is $p \times r$, $r < p$, $c$ being the normalizing constant. Thus, we have the following result:

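Theorem 8.3.5. Let $X = [X_1, \ldots, X_r]$ and $X_j \overset{iid}{\sim} N_p(O, I)$, $j = 1, \ldots, r$, $r < p$. Let the $p \times p$ real positive definite matrix $W_2$ be Wishart distributed with $n$ degrees of freedom, that is, $W_2 \sim W_p(n, I)$. Let $U = XX' + W_2 > O$ and let $V = U^{-\frac{1}{2}} X$ where $V$ is a $p \times r$, $r < p$, matrix of full rank $r$. Observe that $VV' \ge O$ (positive semi-definite). Then, the densities of the $r \times p$ matrix $V'$ and the matrix $S = V'V$, respectively denoted by $f_5(V)$ and $f_6(S)$, are as follows, $U$ and $V$ being independently distributed:

$$f_5(V) = \frac{\Gamma_r(\frac{p}{2})}{\pi^{\frac{pr}{2}}}\, \frac{\Gamma_r(\frac{n+r}{2})}{\Gamma_r(\frac{p}{2})\, \Gamma_r(\frac{n+r-p}{2})}\, |I - V'V|^{\frac{n}{2} - \frac{p+1}{2}}, \tag{8.3.11}$$

and

$$f_6(S) = \frac{\Gamma_r(\frac{n+r}{2})}{\Gamma_r(\frac{p}{2})\, \Gamma_r(\frac{n+r-p}{2})}\, |S|^{\frac{p}{2} - \frac{r+1}{2}}\, |I - S|^{\frac{n}{2} - \frac{p+1}{2}}, \quad n \ge p, \ r < p, \ S > O. \tag{8.3.12}$$

Proof: In light of Theorem 8.3.3(b), $|I_p - VV'| = |I_r - V'V|$. Observe that $VV'$ is $p \times p$ and positive semi-definite whereas $V'V$ is $r \times r$ and positive definite. As well, $\frac{n}{2} - \frac{p+1}{2} = \frac{n+r-p}{2} - \frac{r+1}{2}$. Letting $S = V'V$ and expressing $dV'$ in terms of $dS$ by applying Theorem 4.2.3, then for $r < p$,

$$dV' = \frac{\pi^{\frac{rp}{2}}}{\Gamma_r(\frac{p}{2})}\, |S|^{\frac{p}{2} - \frac{r+1}{2}}\, dS. \tag{i}$$

Now, integrating out $S$ by making use of a type-1 beta integral, we have

$$\int_{S > O} |S|^{\frac{p}{2} - \frac{r+1}{2}}\, |I - S|^{\frac{n+r-p}{2} - \frac{r+1}{2}}\, dS = \frac{\Gamma_r(\frac{p}{2})\, \Gamma_r(\frac{n+r-p}{2})}{\Gamma_r(\frac{n+r}{2})} \tag{ii}$$

for $n \ge p$, $r < p$. The normalizing constants in (8.3.11) and (8.3.12) follow from (i) and (ii). This completes the proofs.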
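The normalizing constant of $f_5(V)$ can be reached in two ways: by integrating $U$ out of (8.3.9) with a real matrix-variate gamma integral, which gives a ratio of $\Gamma_p$ functions, or through the type-1 beta integral used in the proof, which gives a ratio of $\Gamma_r$ functions. The sketch below checks numerically that the two expressions agree, assuming the usual definition $\Gamma_p(a) = \pi^{p(p-1)/4} \prod_{j=1}^{p} \Gamma(a - \frac{j-1}{2})$ of the real matrix-variate gamma function. The function names are illustrative.

```python
import math

def log_gamma_mv(p, a):
    """Log of the real matrix-variate gamma function Gamma_p(a)."""
    return (p * (p - 1) / 4.0) * math.log(math.pi) + sum(
        math.lgamma(a - j / 2.0) for j in range(p))

def log_c_gamma_route(n, p, r):
    """log of Gamma_p((n+r)/2) / (pi^{pr/2} Gamma_p(n/2)), from integrating out U."""
    return (log_gamma_mv(p, (n + r) / 2.0) - log_gamma_mv(p, n / 2.0)
            - (p * r / 2.0) * math.log(math.pi))

def log_c_beta_route(n, p, r):
    """log of Gamma_r((n+r)/2) / (pi^{pr/2} Gamma_r((n+r-p)/2)), as in (8.3.11)."""
    return (log_gamma_mv(r, (n + r) / 2.0) - log_gamma_mv(r, (n + r - p) / 2.0)
            - (p * r / 2.0) * math.log(math.pi))

# A few admissible triples (n >= p, r < p), chosen arbitrarily
cases = [(5, 3, 2), (4, 4, 1), (10, 6, 3), (7, 7, 6)]
diffs = [log_c_gamma_route(n, p, r) - log_c_beta_route(n, p, r) for n, p, r in cases]
print(diffs)
```

The differences are zero up to rounding for each triple tried, so either form of the constant may be used.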
At this juncture, we are considering the singular version of the determinantal equation in (8.2.8). Let $W_1$ and $W_2$ be independently distributed $p \times p$ real matrices where $W_1 = XX'$, $X = [X_1, \ldots, X_r]$ with $X_j \overset{iid}{\sim} N_p(O, I)$, $j = 1, \ldots, r$, $r < p$, and the positive definite matrix $W_2 \sim W_p(n, I)$, $n \ge p$. The equation

$$|W_1 - \mu(W_1 + W_2)| = 0 \Rightarrow |XX' - \mu(XX' + W_2)| = 0$$
$$\Rightarrow |U^{-\frac{1}{2}} XX' U^{-\frac{1}{2}} - \mu I| = 0, \quad U = XX' + W_2 > O$$
$$\Rightarrow |VV' - \mu I_p| = 0,$$

which, in turn, implies that $\mu$ is an eigenvalue of $VV' \ge O$ and all the eigenvalues are positive or zero. However, it follows from (8.3.3) and (8.3.4) that

$$|VV' - \mu I_p| = 0 \Rightarrow |V'V - \mu I_r| = 0.$$

Hence, the following result:

Theorem 8.3.6. Let $X$ be a $p \times r$ matrix whose columns are iid $N_p(O, I)$ and $r < p$. Let $W_1 = XX' \ge O$ which is a $p \times p$ positive semi-definite matrix, $W_2 > O$ be a $p \times p$ Wishart distributed matrix having $n$ degrees of freedom, that is, $W_2 \sim W_p(n, I)$, $n \ge p$, and let $W_1$ and $W_2$ be independently distributed. Then, $U = W_1 + W_2 > O$ and $V = U^{-\frac{1}{2}} X$ are independently distributed. Moreover,

$$|W_1 - \mu(W_1 + W_2)| = 0 \Rightarrow |V'V - \mu I_r| = 0$$

where the roots $\mu_j > 0$, $j = 1, \ldots, r$, are the eigenvalues of $V'V > O$, and the nonzero eigenvalues of $VV'$ are $\mu_j > 0$, $j = 1, \ldots, r$, with the remaining $p - r$ eigenvalues of $VV'$ being equal to zero.

Let $S = PDP'$, $D = \mathrm{diag}(\mu_1, \ldots, \mu_r)$, $PP' = I_r$, $P'P = I_r$. Then, on applying Theorem 8.2.2,

$$dS = \frac{\pi^{\frac{r^2}{2}}}{\Gamma_r(\frac{r}{2})} \Big\{\prod_{i<j} (\mu_i - \mu_j)\Big\}\, dD$$

after integrating out over the differential element associated with the orthonormal matrix $P$. Substituting in $f_6(S)$ of Theorem 8.3.5 yields the following result:

Theorem 8.3.7. Let $\mu_1, \ldots, \mu_r$ be the nonzero roots of the determinantal equation $|W_1 - \mu(W_1 + W_2)| = 0$ where $W_1 = XX'$, $X = [X_1, \ldots, X_r]$, $X_j \overset{iid}{\sim} N_p(O, I)$, $j = 1, \ldots, r$, $r < p$, $W_2 \sim W_p(n, I)$, $n \ge p$, and $W_1$ and $W_2$ be independently distributed. Letting $\mu_1 > \mu_2 > \cdots > \mu_r > 0$, $r < p$, the joint density of the nonzero roots $\mu_1, \ldots, \mu_r$, denoted by $f_\mu(\mu_1, \ldots, \mu_r)$, is given by

$$f_\mu(\mu_1, \ldots, \mu_r) = \frac{\Gamma_r(\frac{n+r}{2})}{\Gamma_r(\frac{p}{2})\, \Gamma_r(\frac{n+r-p}{2})}\, \frac{\pi^{\frac{r^2}{2}}}{\Gamma_r(\frac{r}{2})} \Big[\prod_{j=1}^r \mu_j^{\frac{p}{2} - \frac{r+1}{2}}\Big] \Big[\prod_{j=1}^r (1 - \mu_j)^{\frac{n}{2} - \frac{p+1}{2}}\Big] \Big[\prod_{i<j} (\mu_i - \mu_j)\Big]. \tag{8.3.13}$$

It can readily be observed from (8.3.9) that $U = XX' + W_2$ and $V = U^{-\frac{1}{2}} X$ are indeed independently distributed.
8.3.2. A direct evaluation as an eigenvalue problem
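Consider the singular version of the original determinantal equation

$$|W_1 - \lambda W_2| = 0 \Rightarrow |W_2^{-\frac{1}{2}} W_1 W_2^{-\frac{1}{2}} - \lambda I| = 0 \tag{8.3.14}$$

where $W_1$ is singular, $W_2$ is nonsingular, and $W_1$ and $W_2$ are independently distributed. Thus, the roots $\lambda_j$'s of the equation $|W_1 - \lambda W_2| = 0$ coincide with the eigenvalues of the matrix $U = W_2^{-\frac{1}{2}} W_1 W_2^{-\frac{1}{2}}$. Let $W_2$ be a real nonsingular matrix-variate gamma or Wishart distributed matrix. Let $W_1 = XX'$, $X = [X_1, \ldots, X_r]$, $X_j \overset{iid}{\sim} N_p(O, \Sigma)$, $\Sigma > O$, $j = 1, \ldots, r$, $r < p$, or, equivalently, the $p \times r$, $r < p$, matrix $X$ is a simple random sample from this $p$-variate real Gaussian population. We will take $\Sigma = I$ without any loss of generality. In this case, $X$ is a $p \times r$ full rank matrix with $r < p$. Let $W_2 \sim W_p(n, I)$, $n \ge p$, that is, $W_2$ is a nonsingular Wishart matrix with $n$ degrees of freedom and parameter matrix $I$, and $W_1 \ge O$ (positive semi-definite). Then, the joint density of $X$ and $W_2$, denoted by $f_7(X, W_2)$, is the following:

$$f_7(X, W_2) = \frac{e^{-\frac{1}{2} \mathrm{tr}(XX' + W_2)}\, |W_2|^{\frac{n}{2} - \frac{p+1}{2}}}{(2\pi)^{\frac{pr}{2}}\, 2^{\frac{np}{2}}\, \Gamma_p(\frac{n}{2})}. \tag{i}$$

Consider the exponent

$$-\tfrac{1}{2} \mathrm{tr}(XX' + W_2) = -\tfrac{1}{2} \mathrm{tr}[W_2(I + W_2^{-\frac{1}{2}} XX' W_2^{-\frac{1}{2}})] = -\tfrac{1}{2} \mathrm{tr}[W_2(I + VV')]$$

where $V = W_2^{-\frac{1}{2}} X \Rightarrow dX = |W_2|^{\frac{r}{2}}\, dV$ for fixed $W_2$. The joint density of $W_2$ and $V$, denoted by $f_8(V, W_2)$, is then

$$f_8(V, W_2) = \frac{|W_2|^{\frac{n+r}{2} - \frac{p+1}{2}}\, e^{-\frac{1}{2} \mathrm{tr}[W_2(I + VV')]}}{(2\pi)^{\frac{pr}{2}}\, 2^{\frac{np}{2}}\, \Gamma_p(\frac{n}{2})}. \tag{ii}$$

Observe that $VV' = W_2^{-\frac{1}{2}} XX' W_2^{-\frac{1}{2}} = U$ of (8.3.14). Integrating out $W_2$ in (ii) by using a real matrix-variate gamma integral, we obtain the marginal density of $V$, denoted by $f_9(V)$, as

$$f_9(V)\, dV = \frac{\Gamma_p(\frac{n+r}{2})}{\pi^{\frac{pr}{2}}\, \Gamma_p(\frac{n}{2})}\, |I_p + VV'|^{-\frac{n+r}{2}}\, dV \tag{iii}$$

where $VV' \ge O$ (positive semi-definite). Note that

$$|I_p + VV'| = |I_r + V'V|, \quad VV' \ge O, \ V'V > O \text{ (positive definite)},$$

which follows from Theorem 8.3.3(b). This last result can also be established by expanding the following determinant in two ways as was done in (8.3.3) and (8.3.4):

$$\begin{vmatrix} I_p & -V \\ V' & I_r \end{vmatrix} = |I_p|\ |I_r + V'V| = |I_r|\ |I_p + VV'| \Rightarrow |I_r + V'V| = |I_p + VV'|.$$

Hence, the density of $V'$ must be of the following form where $c_1$ is the normalizing constant:

$$f_{10}(V')\, dV' = c_1\, |I + V'V|^{-\frac{n+r}{2}}\, dV', \quad n \ge p, \ r < p. \tag{iv}$$

On applying Theorem 4.2.3, (iv) can be transformed into a function of $S_1 = V'V > O$, $S_1$ being of order $r \times r$. Then,

$$dV' = \frac{\pi^{\frac{pr}{2}}}{\Gamma_r(\frac{p}{2})}\, |S_1|^{\frac{p}{2} - \frac{r+1}{2}}\, dS_1.$$

Substituting the above expression for $dV'$ in $f_{10}(V')$ and then integrating over the $r \times r$ matrix $S_1 > O$, we have

$$\int_{V'} f_{10}(V')\, dV' = c_1\, \frac{\pi^{\frac{pr}{2}}}{\Gamma_r(\frac{p}{2})}\, \frac{\Gamma_r(\frac{p}{2})\, \Gamma_r(\frac{n+r-p}{2})}{\Gamma_r(\frac{n+r}{2})} = 1.$$

Accordingly, the density of $V'$ is the following:

$$f_{10}(V')\, dV' = \frac{\Gamma_r(\frac{p}{2})}{\pi^{\frac{pr}{2}}}\, \frac{\Gamma_r(\frac{n+r}{2})}{\Gamma_r(\frac{p}{2})\, \Gamma_r(\frac{n+r-p}{2})}\, |I + V'V|^{-\frac{n+r}{2}}\, dV'. \tag{8.3.15}$$

Note that $\Gamma_r(\frac{p}{2})$ cancels out. Then, by re-expressing $dV'$ in terms of $dS_1$ in (8.3.15), the density of $S_1 = V'V$ is obtained as

$$f_{11}(S_1) = \frac{\Gamma_r(\frac{n+r}{2})}{\Gamma_r(\frac{p}{2})\, \Gamma_r(\frac{n+r-p}{2})}\, |S_1|^{\frac{p}{2} - \frac{r+1}{2}}\, |I + S_1|^{-\frac{n+r}{2}} \tag{8.3.16}$$

for $S_1 = V'V > O$, $n \ge p$, $r < p$, and zero elsewhere, which is a real $r \times r$ matrix-variate type-2 beta density with the parameters $(\frac{p}{2}, \frac{n+r-p}{2})$. Thus, the following result: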

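Theorem 8.3.8. Let $W_1 = XX'$, $X = [X_1, \ldots, X_r]$, $X_j \overset{iid}{\sim} N_p(O, I)$, $j = 1, \ldots, r$, $r < p$, $W_2 \sim W_p(n, I)$, $n \ge p$, and $W_1$ and $W_2$ be independently distributed. Let $V = W_2^{-\frac{1}{2}} X$, $VV' \ge O$ and $S_1 = V'V > O$. Then, $|I_p + VV'| = |I_r + V'V| = |I_r + S_1|$. The density of $V'$ is given in (8.3.15) and that of $S_1$, which is specified in (8.3.16), is a real matrix-variate type-2 beta density with the parameters $(\frac{p}{2}, \frac{n+r-p}{2})$. Moreover, the positive semi-definite matrix $S_2 = W_2^{-\frac{1}{2}} XX' W_2^{-\frac{1}{2}}$ is distributed, almost surely, as $S_1$, which has a nonsingular real matrix-variate type-2 beta distribution with the parameters $(\frac{p}{2}, \frac{n+r-p}{2})$.

Observe that this theorem also holds when $X_j \overset{iid}{\sim} N_p(O, \Sigma)$, $\Sigma > O$ and $W_2 \sim W_p(n, \Sigma)$, $\Sigma > O$, and the distribution of $S_2$ will still be free of $\Sigma$. Converting (8.3.16) in terms of the eigenvalues of $S_1$, which are also the nonzero eigenvalues of $S_2 = W_2^{-\frac{1}{2}} XX' W_2^{-\frac{1}{2}}$ of (8.3.14), we have the following density, denoted by $f_{12}(\lambda_1, \ldots, \lambda_r)\, dD$, $D = \mathrm{diag}(\lambda_1, \ldots, \lambda_r)$, assuming that the eigenvalues are distinct and such that $\lambda_1 > \lambda_2 > \cdots > \lambda_r > 0$:

$$f_{12}(\lambda_1, \ldots, \lambda_r)\, dD = \frac{\Gamma_r(\frac{n+r}{2})}{\Gamma_r(\frac{p}{2})\, \Gamma_r(\frac{n+r-p}{2})}\, \frac{\pi^{\frac{r^2}{2}}}{\Gamma_r(\frac{r}{2})} \Big[\prod_{j=1}^r \lambda_j^{\frac{p}{2} - \frac{r+1}{2}}\Big] \Big[\prod_{j=1}^r (1 + \lambda_j)^{-\frac{n+r}{2}}\Big] \Big[\prod_{i<j} (\lambda_i - \lambda_j)\Big]\, dD. \tag{8.3.17}$$

Theorem 8.3.9. Let $W_2$ and $X$ be as defined in (8.3.14). Then, the joint density of the nonzero eigenvalues $\lambda_1, \ldots, \lambda_r$ of $S_2 = W_2^{-\frac{1}{2}} XX' W_2^{-\frac{1}{2}}$, which are assumed to be distinct and such that $\lambda_1 > \cdots > \lambda_r > 0$, is given in (8.3.17).

In (8.3.13) we have obtained the joint density of the nonzero roots $\mu_1 > \cdots > \mu_r$ of the determinantal equation

$$|XX' - \mu(XX' + W_2)| = 0 \Rightarrow |XX' - \lambda W_2| = 0, \ \lambda = \frac{\mu}{1 - \mu}$$
$$\Rightarrow |W_2^{-\frac{1}{2}} XX' W_2^{-\frac{1}{2}} - \lambda I| = 0. \tag{8.3.18}$$

Hence, making the substitution $\mu_j = \frac{\lambda_j}{1 + \lambda_j}$ in (8.3.13) should yield the density appearing in (8.3.17). This will be stated as a theorem.

Theorem 8.3.10. When $\mu_j = \frac{\lambda_j}{1 + \lambda_j}$ or $\lambda_j = \frac{\mu_j}{1 - \mu_j}$, the distributions of the $\mu_j$'s or the $\lambda_j$'s, as respectively specified in (8.3.13) and (8.3.17), coincide.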
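Theorem 8.3.10 can be checked pointwise: evaluating (8.3.13) at $\mu_j = \lambda_j/(1 + \lambda_j)$ and multiplying by the Jacobian $\prod_j (1 + \lambda_j)^{-2}$ should reproduce (8.3.17). The sketch below does this in plain Python for one arbitrary choice of $(n, p, r)$ and of the $\lambda_j$'s, assuming the standard definition of the real matrix-variate gamma function; all function names are illustrative.

```python
import math

def log_gamma_mv(r, a):
    """Log of the real matrix-variate gamma function Gamma_r(a)."""
    return (r * (r - 1) / 4.0) * math.log(math.pi) + sum(
        math.lgamma(a - j / 2.0) for j in range(r))

def log_const(n, p, r):
    """Log of the constant common to (8.3.13) and (8.3.17)."""
    return (log_gamma_mv(r, (n + r) / 2.0) - log_gamma_mv(r, p / 2.0)
            - log_gamma_mv(r, (n + r - p) / 2.0)
            + (r * r / 2.0) * math.log(math.pi) - log_gamma_mv(r, r / 2.0))

def pair_product(v):
    """Product of (v_i - v_j) over i < j."""
    return math.prod(v[i] - v[j] for i in range(len(v)) for j in range(i + 1, len(v)))

def f_mu(mu, n, p, r):
    """Joint density (8.3.13) of the roots mu_1 > ... > mu_r."""
    a, b = p / 2.0 - (r + 1) / 2.0, n / 2.0 - (p + 1) / 2.0
    return (math.exp(log_const(n, p, r))
            * math.prod(m ** a * (1 - m) ** b for m in mu) * pair_product(mu))

def f_lam(lam, n, p, r):
    """Joint density (8.3.17) of the eigenvalues lambda_1 > ... > lambda_r."""
    a = p / 2.0 - (r + 1) / 2.0
    return (math.exp(log_const(n, p, r))
            * math.prod(l ** a * (1 + l) ** (-(n + r) / 2.0) for l in lam)
            * pair_product(lam))

n, p, r = 5, 3, 2
lam = [2.0, 0.5]                                  # lambda_1 > lambda_2 > 0
mu = [l / (1 + l) for l in lam]                   # mu_j = lambda_j / (1 + lambda_j)
jacobian = math.prod((1 + l) ** -2 for l in lam)  # d mu_j = (1 + lambda_j)^{-2} d lambda_j

lhs = f_mu(mu, n, p, r) * jacobian
rhs = f_lam(lam, n, p, r)
print(lhs, rhs)
```

The two printed values coincide up to rounding; `math.prod` requires Python 3.8 or later.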
8.3a. The Singular Complex Case


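The matrix manipulations that were utilized in the real case also apply in the complex domain. The following result parallels Theorem 8.3.1:

Theorem 8.3a.1. Let the $p \times p$ matrix $A = A^* \ge O$ be a Hermitian positive semi-definite matrix of rank $r < p$, where $A^*$ designates the conjugate transpose of $A$. Then, $A$ can be represented as $A = A_1 A_1^*$ where $A_1$ is $p \times r$, $r < p$, of rank $r$, that is, all the columns of $A_1$ are linearly independent.

A derivation of the Wishart matrix in the complex case can also be worked out from a complex Gaussian distribution. In earlier chapters, we have derived the Wishart density as a particular case of the matrix-variate gamma density, whether in the real or complex domain. Let the $p \times 1$ complex vectors $\tilde{X}_j$, $j = 1, \ldots, n$, be independently distributed as $p$-variate complex Gaussian random variables with the null vector as their mean value and a common Hermitian positive definite covariance matrix, that is, $\tilde{X}_j \overset{iid}{\sim} \tilde{N}_p(O, \Sigma)$, $\Sigma = \Sigma^* > O$ for $j = 1, \ldots, n$. Let the $p \times n$ matrix $\tilde{\mathbf{X}} = [\tilde{X}_1, \ldots, \tilde{X}_n]$ be the simple random sample matrix from this complex Gaussian population. Then, the density of $\tilde{\mathbf{X}}$, denoted by $\tilde{f}(\tilde{\mathbf{X}})$, is given by

$$\tilde{f}(\tilde{\mathbf{X}}) = \frac{e^{-\sum_{j=1}^n \tilde{X}_j^* \Sigma^{-1} \tilde{X}_j}}{\pi^{np}\, |\det(\Sigma)|^n} = \frac{e^{-\mathrm{tr}(\Sigma^{-1} \tilde{\mathbf{X}}\tilde{\mathbf{X}}^*)}}{\pi^{np}\, |\det(\Sigma)|^n} \tag{8.3a.1}$$

for $n \ge p$. Let the $p \times p$ Hermitian positive definite matrix $\tilde{\mathbf{X}}\tilde{\mathbf{X}}^* = \tilde{W}$. For $n \ge p$, it follows from Theorem 4.2a.3 that

$$d\tilde{\mathbf{X}} = \frac{\pi^{np}}{\tilde{\Gamma}_p(n)}\, |\det(\tilde{W})|^{n - p}\, d\tilde{W} \tag{8.3a.2}$$

where $d\tilde{\mathbf{X}} = dY_1 \wedge dY_2$, $\tilde{\mathbf{X}} = Y_1 + iY_2$, $i = \sqrt{-1}$, $Y_1$, $Y_2$ being real $p \times n$ matrices. Given (8.3a.1) and (8.3a.2), the density of $\tilde{W}$, denoted by $\tilde{f}_1(\tilde{W})$, is obtained as

$$\tilde{f}_1(\tilde{W}) = \frac{|\det(\tilde{W})|^{n - p}\, e^{-\mathrm{tr}(\Sigma^{-1} \tilde{W})}}{\tilde{\Gamma}_p(n)\, |\det(\Sigma)|^n}, \quad \tilde{W} > O, \ \Sigma > O, \ n \ge p, \tag{8.3a.3}$$

and zero elsewhere; this is the complex Wishart density with $n$ degrees of freedom and parameter matrix $\Sigma > O$, which is written as $\tilde{W} \sim \tilde{W}_p(n, \Sigma)$, $\Sigma > O$, $n \ge p$. If $\tilde{X}_j \overset{iid}{\sim} \tilde{N}_p(\tilde{\mu}, \Sigma)$, $\Sigma > O$, then letting $\bar{\tilde{X}} = \frac{1}{n}(\tilde{X}_1 + \cdots + \tilde{X}_n)$, $\bar{\tilde{\mathbf{X}}} = (\bar{\tilde{X}}, \ldots, \bar{\tilde{X}})$ and

$$\tilde{W} = (\tilde{\mathbf{X}} - \bar{\tilde{\mathbf{X}}})(\tilde{\mathbf{X}} - \bar{\tilde{\mathbf{X}}})^*, \tag{8.3a.4}$$

we have $\tilde{W} \sim \tilde{W}_p(n - 1, \Sigma)$, $n - 1 \ge p$, $\Sigma > O$, or $\tilde{W}$ has a Wishart distribution with $n - 1$ instead of $n$ degrees of freedom, $\tilde{\mu} \ne O$ being eliminated by subtracting the sample mean. The remainder of this section is devoted to the distribution of $\tilde{\mathbf{X}}\tilde{\mathbf{X}}^*$ for $n < p$, that is, in the singular case. Proceeding as in the real case, we have

$$\det(I_p - \tilde{X}\tilde{X}^*) = \det(I_r - \tilde{X}^*\tilde{X}) \tag{8.3a.5}$$

where $\tilde{X}\tilde{X}^* \ge O$ is $p \times p$, whereas the $r \times r$, $r < p$, matrix $\tilde{X}^*\tilde{X} > O$. Then, we have the following result:

Theorem 8.3a.2. Let $\tilde{T}_1 = \tilde{X}\tilde{X}^*$ and $\tilde{T}_2 = \tilde{X}^*\tilde{X}$. Then, the eigenvalues of $\tilde{T}_2$ are all real and positive and the nonzero eigenvalues of $\tilde{T}_1$ are identical to those of $\tilde{T}_2$, the remaining ones being equal to zero.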
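The identity (8.3a.5) and Theorem 8.3a.2 can be confirmed on a small complex example. The sketch below, in plain Python, assumes an arbitrary $3 \times 2$ complex matrix of full column rank chosen purely for illustration; `det`, `matmul`, `ctranspose` and `shifted` are hypothetical helper names.

```python
import math

def det(m):
    """Determinant by cofactor expansion along the first row (small matrices only)."""
    if len(m) == 1:
        return m[0][0]
    return sum((-1) ** j * m[0][j] * det([row[:j] + row[j + 1:] for row in m[1:]])
               for j in range(len(m)))

def matmul(x, y):
    """Matrix product for matrices stored as lists of rows."""
    return [[sum(x[i][k] * y[k][j] for k in range(len(y))) for j in range(len(y[0]))]
            for i in range(len(x))]

def ctranspose(x):
    """Conjugate transpose of a complex matrix."""
    return [[x[i][j].conjugate() for i in range(len(x))] for j in range(len(x[0]))]

def shifted(m, lam):
    """Return lam * I - m for a square matrix m."""
    n = len(m)
    return [[(lam if i == j else 0.0) - m[i][j] for j in range(n)] for i in range(n)]

X = [[1 + 1j, 0j], [1 + 0j, 1j], [0j, 2 - 1j]]   # p = 3, r = 2
Xh = ctranspose(X)
T1 = matmul(X, Xh)    # X X*, 3 x 3 Hermitian positive semi-definite
T2 = matmul(Xh, X)    # X* X, 2 x 2 Hermitian positive definite

# Real eigenvalues of the 2 x 2 Hermitian matrix T2
tr = (T2[0][0] + T2[1][1]).real
dt = det(T2).real
disc = math.sqrt(tr * tr - 4 * dt)
eigs = [(tr + disc) / 2, (tr - disc) / 2]

lhs = det(shifted(T1, 1.0))                       # det(I_p - X X*)
rhs = det(shifted(T2, 1.0))                       # det(I_r - X* X)
char_T1 = [det(shifted(T1, lam)) for lam in eigs]  # should vanish

print("eigenvalues of X*X:", eigs)
print("det(I_p - XX*) =", lhs, " det(I_r - X*X) =", rhs)
print("det(lam I - XX*) at those eigenvalues:", char_T1, " det(XX*) =", det(T1))
```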


The complex counterpart of Theorem 8.3.4 that follows can be derived using steps 
parallel to those utilized in the real case. 


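Theorem 8.3a.3. Let the $p \times 1$ complex vectors $\tilde{X}_j \overset{iid}{\sim} \tilde{N}_p(\tilde{\mu}, \Sigma)$, $\Sigma > O$, $j = 1, \ldots, r$. Let $\tilde{\mathbf{X}} = [\tilde{X}_1, \ldots, \tilde{X}_r]$ be the simple random sample matrix from this complex $p$-variate Gaussian population. Let $\tilde{T}_2 = \tilde{\mathbf{X}}^* \Sigma^{-1} \tilde{\mathbf{X}}$ or $\tilde{T}_2 = \tilde{\mathbf{X}}^*\tilde{\mathbf{X}}$ if $\Sigma = I_p$. Then, when $\tilde{\mu} = O$, the density of $\tilde{T}_2$, denoted by $\tilde{f}_t(\tilde{T}_2)$, is the following:

$$\tilde{f}_t(\tilde{T}_2) = \frac{1}{\tilde{\Gamma}_r(p)}\, |\det(\tilde{T}_2)|^{p - r}\, e^{-\mathrm{tr}(\tilde{T}_2)}, \quad \tilde{T}_2 > O, \ r \le p, \tag{8.3a.6}$$

so that $\tilde{T}_2 \sim \tilde{W}_r(p, I)$, that is, $\tilde{T}_2$ has a complex Wishart distribution with $p$ degrees of freedom. If $\tilde{\mu} \ne O$, let $\tilde{T}_2 = (\tilde{\mathbf{X}} - \bar{\tilde{\mathbf{X}}})^* (\tilde{\mathbf{X}} - \bar{\tilde{\mathbf{X}}})$ or $\tilde{T}_2 = (\tilde{\mathbf{X}} - \bar{\tilde{\mathbf{X}}})^* \Sigma^{-1} (\tilde{\mathbf{X}} - \bar{\tilde{\mathbf{X}}})$ when $\Sigma \ne I$ with $\bar{\tilde{\mathbf{X}}} = (\bar{\tilde{X}}, \ldots, \bar{\tilde{X}})$ wherein $\bar{\tilde{X}} = \frac{1}{r}(\tilde{X}_1 + \cdots + \tilde{X}_r)$. Then, $\tilde{T}_2 \sim \tilde{W}_r(p - 1, I)$, $r \le p - 1$.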


8.3a.1. Singular gamma or singular Gaussian distribution, complex case 


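Let $\tilde{X} = [\tilde{X}_1, \ldots, \tilde{X}_r]$ where the $p \times 1$ vectors $\tilde{X}_1, \ldots, \tilde{X}_r$ are independently distributed as complex Gaussian vectors whose mean value is the null vector and covariance matrix is $I$, that is, $\tilde{X}_j \overset{iid}{\sim} \tilde{N}_p(O, I)$, $j = 1, \ldots, r$, $r < p$. Then, the density of $\tilde{X}$, denoted by $\tilde{f}_1(\tilde{X})$, is

$$\tilde{f}_1(\tilde{X}) = \frac{e^{-\mathrm{tr}(\tilde{X}\tilde{X}^*)}}{\pi^{pr}}. \tag{8.3a.7}$$

Let $r < p$ so that $\tilde{X}\tilde{X}^* \ge O$ (Hermitian positive semi-definite). Let the $p \times p$ Hermitian positive definite matrix $\tilde{W}_2$ have a Wishart density with $n \ge p$ degrees of freedom and parameter matrix $I$, that is, $\tilde{W}_2 \sim \tilde{W}_p(n, I)$, $n \ge p$. Further assume that $\tilde{X}$ and $\tilde{W}_2$ are independently distributed. Then, the joint density of $\tilde{X}$ and $\tilde{W}_2$, denoted by $\tilde{f}_2(\tilde{X}, \tilde{W}_2)$, is the following:

$$\tilde{f}_2(\tilde{X}, \tilde{W}_2) = \frac{e^{-\mathrm{tr}(\tilde{X}\tilde{X}^* + \tilde{W}_2)}\, |\det(\tilde{W}_2)|^{n - p}}{\pi^{pr}\, \tilde{\Gamma}_p(n)}, \quad n \ge p, \ r < p, \tag{8.3a.8}$$

where $\tilde{X}$ is $p \times r$, $r < p$. Letting the $p \times p$ matrix $\tilde{U} = \tilde{X}\tilde{X}^* + \tilde{W}_2 > O$, the joint density of $\tilde{U}$ and $\tilde{X}$, denoted by $\tilde{f}_3(\tilde{X}, \tilde{U})$, is given by

$$\tilde{f}_3(\tilde{X}, \tilde{U}) = \frac{e^{-\mathrm{tr}(\tilde{U})}\, |\det(\tilde{U} - \tilde{X}\tilde{X}^*)|^{n - p}}{\pi^{pr}\, \tilde{\Gamma}_p(n)}, \quad n \ge p, \ r < p.$$

Letting $\tilde{U} = \tilde{U}^* > O$, one has

$$|\det(\tilde{U} - \tilde{X}\tilde{X}^*)| = |\det(\tilde{U})|\ |\det(I - \tilde{U}^{-\frac{1}{2}} \tilde{X}\tilde{X}^* \tilde{U}^{-\frac{1}{2}})| = |\det(\tilde{U})|\ |\det(I - \tilde{V}\tilde{V}^*)|, \quad \tilde{V} = \tilde{U}^{-\frac{1}{2}} \tilde{X},$$

where $\tilde{V}$ is a $p \times r$ matrix of rank $r < p$. Since $d\tilde{X} = |\det(\tilde{U})|^r\, d\tilde{V}$ for fixed $\tilde{U}$, the joint density of $\tilde{U}$ and $\tilde{V}$ is as follows:

$$\tilde{f}_4(\tilde{U}, \tilde{V}) = \frac{|\det(\tilde{U})|^{n + r - p}\, e^{-\mathrm{tr}(\tilde{U})}}{\pi^{pr}\, \tilde{\Gamma}_p(n)}\, |\det(I_p - \tilde{V}\tilde{V}^*)|^{n - p}. \tag{8.3a.9}$$

As has been previously noted,

$$|\det(I_p - \tilde{V}\tilde{V}^*)| = |\det(I_r - \tilde{V}^*\tilde{V})|$$

where $\tilde{V}^*\tilde{V}$ is an $r \times r$ Hermitian positive definite matrix. Since $\tilde{f}_4(\tilde{U}, \tilde{V})$ can be factorized, $\tilde{U}$ and $\tilde{V}$ are independently distributed, and the marginal density of $\tilde{V}$, denoted by $\tilde{f}_5(\tilde{V})$, is of the following form, after integrating out $\tilde{U}$ with the help of a complex matrix-variate gamma integral:

$$\tilde{f}_5(\tilde{V})\, d\tilde{V} = \tilde{c}\, |\det(I_p - \tilde{V}\tilde{V}^*)|^{n - p}\, d\tilde{V} = \tilde{c}\, |\det(I_r - \tilde{V}^*\tilde{V})|^{n - p}\, d\tilde{V} \tag{8.3a.10}$$

where $\tilde{c}$ is the normalizing constant. Now, proceeding as in the real case, the following result is obtained: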
result is obtained: 


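Theorem 8.3a.4. Let the $p \times 1$ complex vectors $\tilde{X}_j \overset{iid}{\sim} \tilde{N}_p(O, I)$, $j = 1, \ldots, r$, and $\tilde{X} = [\tilde{X}_1, \ldots, \tilde{X}_r]$ be the $p \times r$ full rank sample matrix with $r < p$. Let $\tilde{W}_2$ be a $p \times p$ Hermitian positive definite matrix having a nonsingular Wishart density with $n$ degrees of freedom and parameter matrix $I$, that is, $\tilde{W}_2 \sim \tilde{W}_p(n, I)$, $n \ge p$, and assume that $\tilde{W}_2$ and $\tilde{X}$ are independently distributed. Let $\tilde{U} = \tilde{X}\tilde{X}^* + \tilde{W}_2$ be Hermitian positive definite, $\tilde{V} = \tilde{U}^{-\frac{1}{2}} \tilde{X}$ and $\tilde{S} = \tilde{V}^*\tilde{V}$. Then, $\tilde{U}$ and $\tilde{V}$ are independently distributed and the densities of $\tilde{V}$ and $\tilde{S}$, respectively denoted by $\tilde{f}_5(\tilde{V})$ and $\tilde{f}_6(\tilde{S})$, are

$$\tilde{f}_5(\tilde{V}) = \frac{\tilde{\Gamma}_r(p)}{\pi^{pr}}\, \frac{\tilde{\Gamma}_r(n + r)}{\tilde{\Gamma}_r(p)\, \tilde{\Gamma}_r(n + r - p)}\, |\det(I - \tilde{V}^*\tilde{V})|^{n - p}, \quad r < p, \tag{8.3a.11}$$

and

$$\tilde{f}_6(\tilde{S}) = \frac{\tilde{\Gamma}_r(n + r)}{\tilde{\Gamma}_r(p)\, \tilde{\Gamma}_r(n + r - p)}\, |\det(\tilde{S})|^{p - r}\, |\det(I - \tilde{S})|^{n - p}, \quad n \ge p, \ r < p, \tag{8.3a.12}$$

observing that $n - p = (n + r - p) - r$.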


Let $W_1 = XX^*$ where the $p \times r$ matrix $X$ is the previously defined sample matrix arising from a standard complex Gaussian population. Let $W_2 \sim \tilde{W}_p(n, I)$ and assume that $W_1$ and $W_2$ are independently distributed. Letting $U = XX^* + W_2 > O$, consider the determinantal equation

$$\det(W_1 - \mu(W_1 + W_2)) = 0. \tag{i}$$


Then, as in the real case, the following result can be obtained:

Theorem 8.3a.5. Let $W_1$, $W_2$, $V$ and $U$ be as previously defined. Then,

$$\det(W_1 - \mu(W_1 + W_2)) = 0 \ \Rightarrow\ \det(V^*V - \mu I_r) = 0. \tag{ii}$$

This establishes that the roots $\mu_j$'s of the determinantal equation (i) coincide with the eigenvalues of $V^*V$, and since $V^*V > O$, the eigenvalues are real and positive. Let the eigenvalues be distinct, in which case $\mu_1 > \cdots > \mu_r > 0$. Then, steps parallel to those utilized in the real case will yield the following result:


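Theorem 8.3a.6. Let $\mu_1, \ldots, \mu_r$ be the nonzero roots of the equation

$$\det(W_1 - \mu(W_1 + W_2)) = 0$$

where $W_1$ and $W_2$ are as previously defined. Let $\mu_1 > \cdots > \mu_r > 0$, $r < p$, and let $D = \mathrm{diag}(\mu_1, \ldots, \mu_r)$. Then, the joint density of the eigenvalues $\mu_1, \ldots, \mu_r$, denoted by $\tilde{f}_7(\mu_1, \ldots, \mu_r)$, which is available from (8.3a.12) and the relationship between $dS$ and $dD$, is the following:

$$\tilde{f}_7(\mu_1, \ldots, \mu_r)\, dD = \frac{\pi^{r(r-1)}}{\tilde{\Gamma}_r(r)}\, \frac{\tilde{\Gamma}_r(n+r)}{\tilde{\Gamma}_r(p)\, \tilde{\Gamma}_r(n+r-p)} \Big[\prod_{j=1}^{r} \mu_j^{p-r}\Big] \Big[\prod_{j=1}^{r} (1 - \mu_j)^{n-p}\Big] \Big[\prod_{i<j} (\mu_i - \mu_j)^2\Big]\, dD. \tag{8.3a.13}$$

8.3a.2. A direct method of evaluation in the complex case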


The steps of the derivations being analogous to those utilized in the real case, the corresponding theorems will simply be stated for the complex case. Let $W_1 = XX^*$ be a $p \times p$ singular Wishart matrix where the $p \times r$ matrix $X = [X_1, \ldots, X_r]$, $X_j \overset{iid}{\sim} \tilde{N}_p(O, I)$, $j = 1, \ldots, r$, and $r < p$. That is, $X$ is a simple random sample matrix from this complex Gaussian population. Let $W_2 > O$ have a complex Wishart distribution with $n$ degrees of freedom and parameter matrix $I$, that is, $W_2 \sim \tilde{W}_p(n, I)$, $n \ge p$, and assume that $W_1$ and $W_2$ are independently distributed. Consider the initial equation

$$\det(W_1 - \lambda W_2) = 0 \ \Rightarrow\ \det(W_2^{-\frac{1}{2}} W_1 W_2^{-\frac{1}{2}} - \lambda I) = 0, \tag{8.3a.14}$$

whose roots are $\lambda_1, \ldots, \lambda_r, 0, \ldots, 0$, and the additional equation $\det(W_1 - \mu(W_1 + W_2)) = 0$, whose roots will be denoted by $\mu_1, \ldots, \mu_r, 0, \ldots, 0$. In the following theorems, the $\lambda_j$'s and $\mu_j$'s will refer to these two sets of roots.
Theorem 8.3a.7. Let $W_1 = XX^*$, $X = [X_1, \ldots, X_r]$, $X_j \overset{iid}{\sim} \tilde{N}_p(O, I)$, $j = 1, \ldots, r$, and $r < p$. Let $W_2 \sim \tilde{W}_p(n, I)$, $n \ge p$, be a nonsingular complex Wishart matrix with $n$ degrees of freedom and parameter matrix $I$. Further assume that $W_1$ and $W_2$ are independently distributed. Let $V = W_2^{-\frac{1}{2}} X$, $VV^* \ge O$, $V^*V > O$, and $S_1 = V^*V > O$. Then, $\det(I_p + VV^*) = \det(I_r + V^*V) = \det(I_r + S_1)$, and the densities of $V^*$ and $S_1$ are respectively given by

$$\tilde{f}_{10}(V^*)\, dV^* = \frac{\tilde{\Gamma}_r(p)}{\pi^{pr}}\, \frac{\tilde{\Gamma}_r(n+r)}{\tilde{\Gamma}_r(p)\, \tilde{\Gamma}_r(n+r-p)}\, |\det(I + V^*V)|^{-(n+r)}\, dV^* \tag{8.3a.15}$$

and

$$\tilde{f}_{11}(S_1) = \frac{\tilde{\Gamma}_r(n+r)}{\tilde{\Gamma}_r(p)\, \tilde{\Gamma}_r(n+r-p)}\, |\det(S_1)|^{p-r}\, |\det(I + S_1)|^{-(n+r)}, \tag{8.3a.16}$$

which is a complex matrix-variate type-2 beta distribution with the parameters $(p,\ n - p + r)$. Additionally, the positive semi-definite matrix $S_2 = W_2^{-\frac{1}{2}} XX^* W_2^{-\frac{1}{2}}$ is distributed, almost surely, as $S_1$, which has a nonsingular complex matrix-variate type-2 beta distribution with the parameters $(p,\ n - p + r)$.


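Theorem 8.3a.8. Let $W_2$ and $X$ be as defined in Theorem 8.3a.7. Then, the joint density of the nonzero eigenvalues $\lambda_1, \ldots, \lambda_r$ of $S_2 = W_2^{-\frac{1}{2}} XX^* W_2^{-\frac{1}{2}}$, which are assumed to be distinct and such that $\lambda_1 > \cdots > \lambda_r > 0$, is given by

$$\frac{\tilde{\Gamma}_r(n+r)}{\tilde{\Gamma}_r(p)\, \tilde{\Gamma}_r(n-p+r)}\, \frac{\pi^{r(r-1)}}{\tilde{\Gamma}_r(r)} \Big[\prod_{j=1}^{r} \lambda_j^{p-r}\Big] \Big[\prod_{j=1}^{r} (1 + \lambda_j)^{-(n+r)}\Big] \Big[\prod_{i<j} (\lambda_i - \lambda_j)^2\Big]\, dD \tag{8.3a.17}$$

where $D = \mathrm{diag}(\lambda_1, \ldots, \lambda_r)$.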


Theorem 8.3a.9. When $\mu_j = \frac{\lambda_j}{1 + \lambda_j}$ or $\lambda_j = \frac{\mu_j}{1 - \mu_j}$, the distributions of the $\mu_j$'s and $\lambda_j$'s, as respectively defined in (8.3a.13) and (8.3a.17), coincide.
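
The correspondence between the two sets of roots can be illustrated numerically. The following is a minimal sketch, assuming NumPy is available; the function name `complex_roots_demo`, the dimensions and the seed are hypothetical choices, and the Wishart matrix is simulated as $YY^*$ where $Y$ is a $p \times n$ matrix of standard complex Gaussian entries. It compares the eigenvalues of $V^*V$ with the nonzero roots of $\det(W_1 - \mu(W_1 + W_2)) = 0$, and checks the relationship $\mu_j = \lambda_j/(1 + \lambda_j)$.

```python
import numpy as np

def complex_roots_demo(p=4, r=2, n=6, seed=0):
    """Compare the mu_j's and lambda_j's for one simulated pair (W1, W2)."""
    rng = np.random.default_rng(seed)

    def cgauss(rows, cols):
        # standard complex Gaussian entries with E|z|^2 = 1
        return (rng.standard_normal((rows, cols))
                + 1j * rng.standard_normal((rows, cols))) / np.sqrt(2)

    X = cgauss(p, r)
    Y = cgauss(p, n)
    W1 = X @ X.conj().T            # singular, rank r < p
    W2 = Y @ Y.conj().T            # nonsingular since n >= p
    U = W1 + W2

    # Hermitian inverse square root of U from its spectral decomposition
    w, Q = np.linalg.eigh(U)
    U_inv_half = (Q / np.sqrt(w)) @ Q.conj().T
    V = U_inv_half @ X
    mu_from_V = np.sort(np.linalg.eigvalsh(V.conj().T @ V))[::-1]

    # roots of det(W1 - mu (W1 + W2)) = 0 are the eigenvalues of U^{-1} W1
    mu_all = np.sort(np.linalg.eigvals(np.linalg.solve(U, W1)).real)[::-1]
    # roots of det(W1 - lambda W2) = 0 are the eigenvalues of W2^{-1} W1
    lam_all = np.sort(np.linalg.eigvals(np.linalg.solve(W2, W1)).real)[::-1]
    return mu_from_V, mu_all[:r], lam_all[:r]

mu_from_V, mu_roots, lam_roots = complex_roots_demo()
print(mu_from_V, mu_roots, lam_roots / (1 + lam_roots))
```

The three printed vectors should agree up to rounding error; the remaining $p - r$ roots of each equation are numerically zero and are discarded by the slicing.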


8.4. The Case of One Wishart or Gamma Matrix in the Real Domain


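If we only consider a single $p \times p$ gamma matrix $W$ with parameters $(\alpha, B)$, $B > O$, $\Re(\alpha) > \frac{p-1}{2}$, whose density is

$$f(W) = \frac{|B|^{\alpha}}{\Gamma_p(\alpha)}\, |W|^{\alpha - \frac{p+1}{2}}\, e^{-\mathrm{tr}(BW)}, \quad W > O,\ B > O,\ \Re(\alpha) > \frac{p-1}{2}, \tag{8.4.1}$$

and zero elsewhere, then it can readily be determined that $Z = B^{\frac{1}{2}} W B^{\frac{1}{2}}$ has the density

$$f_1(Z) = \frac{1}{\Gamma_p(\alpha)}\, |Z|^{\alpha - \frac{p+1}{2}}\, e^{-\mathrm{tr}(Z)}, \quad Z > O,\ \Re(\alpha) > \frac{p-1}{2}, \tag{8.4.2}$$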


and zero elsewhere. When $\alpha = \frac{m}{2}$ and $B = \frac{1}{2} I$, $Z$ has a Wishart density with $m \ge p$ degrees of freedom and parameter matrix $I$, its density being given by

$$f_2(Z) = \frac{1}{2^{\frac{mp}{2}}\, \Gamma_p(\frac{m}{2})}\, |Z|^{\frac{m}{2} - \frac{p+1}{2}}\, e^{-\frac{1}{2}\mathrm{tr}(Z)} \tag{8.4.3}$$

for $m \ge p$, and zero elsewhere. Since $Z$ is symmetric, there exists an orthonormal matrix $P$, $PP' = I$, $P'P = I$, such that $Z = PDP'$ with $D = \mathrm{diag}(\lambda_1, \ldots, \lambda_p)$ where $\lambda_j > 0$, $j = 1, \ldots, p$, are the (assumed distinct) positive eigenvalues of $Z$, $Z$ being positive definite. Consider the equation $ZQ_j = \lambda_j Q_j$ where the $p \times 1$ vector $Q_j$ is an eigenvector corresponding to the eigenvalue $\lambda_j$. Since the eigenvalues are distinct, the eigenvectors are orthogonal to each other. Let $Q_j$ be the normalized eigenvector, $Q_j'Q_j = 1$, $j = 1, \ldots, p$, $Q_i'Q_j = 0$, for all $i \ne j$. Letting $Q = [Q_1, \ldots, Q_p]$, this $p \times p$ matrix $Q$ is such that

$$ZQ = QD \ \Rightarrow\ Z = QDQ' \ \Rightarrow\ Q = P, \quad P = [P_1, \ldots, P_p]$$

where $P_1, \ldots, P_p$ are the columns of the $p \times p$ matrix $P$. Yet, $P$ need not be unique as $P_j'P_j = 1 \Rightarrow (-P_j)'(-P_j) = 1$. In order to make it unique, let us require that the first nonzero element of each of the vectors $P_1, \ldots, P_p$ be positive. Considering the transformation $Z = PDP'$, it follows from Theorem 8.2.1 that, before integrating over the full orthogonal group,

$$dZ = \Big\{\prod_{i<j} (\lambda_i - \lambda_j)\Big\}\, dD\, h(P) \tag{8.4.4}$$

where $h(P)$ is a differential element associated with the unique matrix of eigenvectors $P$, as is explained in Mathai (1997). The integral over $h(P)$ gives

$$\int_{O_p} h(P) = \frac{\pi^{\frac{p^2}{2}}}{\Gamma_p(\frac{p}{2})} \tag{8.4.5}$$


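where $O_p$ is the full orthogonal group of $p \times p$ orthonormal matrices. On observing that $|Z| = \prod_{j=1}^{p} \lambda_j$ and $\mathrm{tr}(Z) = \lambda_1 + \cdots + \lambda_p$, it follows from (8.4.3) and (8.4.4) that the joint density of the eigenvalues $\lambda_1, \ldots, \lambda_p$ and the matrix of eigenvectors $P$ can be expressed as

$$f_3(D, P)\, dD \wedge dP = \frac{\big[\prod_{j=1}^{p} \lambda_j^{\frac{m}{2} - \frac{p+1}{2}}\big]\, e^{-\frac{1}{2}\sum_{j=1}^{p} \lambda_j}\, \prod_{i<j} (\lambda_i - \lambda_j)}{2^{\frac{mp}{2}}\, \Gamma_p(\frac{m}{2})}\, dD\, h(P). \tag{8.4.6}$$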


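Thus, the marginal density of $\lambda_1, \ldots, \lambda_p$ can be obtained by integrating out $P$. Denoting this marginal density by $f_4(\lambda_1, \ldots, \lambda_p)$, we have

$$f_4(\lambda_1, \ldots, \lambda_p)\, dD = \frac{\pi^{\frac{p^2}{2}}}{\Gamma_p(\frac{p}{2})}\, \frac{\big[\prod_{j=1}^{p} \lambda_j^{\frac{m}{2} - \frac{p+1}{2}}\big]\, e^{-\frac{1}{2}\sum_{j=1}^{p} \lambda_j}}{2^{\frac{mp}{2}}\, \Gamma_p(\frac{m}{2})}\, \prod_{i<j} (\lambda_i - \lambda_j)\, dD, \tag{8.4.7}$$

and zero elsewhere. The density of $P$ is then the remainder of the joint density. Denoting it by $f_5(P)$, we have the following:

$$f_5(P)\, dP = \frac{\Gamma_p(\frac{p}{2})}{\pi^{\frac{p^2}{2}}}\, h(P), \quad PP' = I, \tag{8.4.8}$$

where $P = [P_1, \ldots, P_p]$, the first nonzero element of $P_j$ being positive for $j = 1, \ldots, p$, so as to make $P$ unique. Hence, the following result: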


Theorem 8.4.1. Let the $p \times p$ real positive definite matrix $Z$ have a Wishart density with $m \ge p$ degrees of freedom and parameter matrix $I$. Let $\lambda_1, \ldots, \lambda_p$ be the distinct positive eigenvalues of $Z$ in decreasing order and the $p \times p$ orthonormal matrix $P$ be the matrix of normalized eigenvectors corresponding to the $\lambda_j$'s. Then, $\{\lambda_1, \ldots, \lambda_p\}$ and $P$ are independently distributed, with the densities of $\{\lambda_1, \ldots, \lambda_p\}$ and $P$ being respectively given in (8.4.7) and (8.4.8).
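
As an illustration, the joint eigenvalue density (8.4.7) can be evaluated numerically. The sketch below uses only the Python standard library; the function names `log_gamma_p` and `wishart_eigen_density` are hypothetical, and the real matrix-variate gamma is computed from the usual product formula $\Gamma_p(a) = \pi^{p(p-1)/4} \prod_{j=0}^{p-1} \Gamma(a - j/2)$. For $p = 1$ the expression should reduce to the chi-square density with $m$ degrees of freedom, which provides a simple sanity check.

```python
import math

def log_gamma_p(p, a):
    """Log of the real matrix-variate gamma function Gamma_p(a), a > (p-1)/2."""
    return (p * (p - 1) / 4) * math.log(math.pi) + sum(
        math.lgamma(a - j / 2) for j in range(p))

def wishart_eigen_density(lams, m):
    """Density (8.4.7) at ordered eigenvalues lams[0] > ... > lams[-1] > 0."""
    p = len(lams)
    ordered = all(lams[i] > lams[i + 1] for i in range(p - 1))
    if m < p or min(lams) <= 0 or not ordered:
        return 0.0
    # normalizing constant: pi^{p^2/2} / (Gamma_p(p/2) 2^{mp/2} Gamma_p(m/2))
    log_f = ((p * p / 2) * math.log(math.pi) - log_gamma_p(p, p / 2)
             - (m * p / 2) * math.log(2) - log_gamma_p(p, m / 2))
    # product of lambda_j^{m/2 - (p+1)/2} and the exponential factor
    log_f += (m / 2 - (p + 1) / 2) * sum(math.log(x) for x in lams)
    log_f -= sum(lams) / 2
    # product of (lambda_i - lambda_j) over i < j
    log_f += sum(math.log(lams[i] - lams[j])
                 for i in range(p) for j in range(i + 1, p))
    return math.exp(log_f)

print(wishart_eigen_density([2.0], m=4))        # chi-square(4) density at 2
print(wishart_eigen_density([2.0, 1.0], m=4))   # p = 2 case
```

Working on the log scale avoids overflow in the gamma functions when $m$ or $p$ is large.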


8.4a. The Case of One Wishart or Gamma Matrix, Complex Domain 


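Let $W$ be a $p \times p$ complex gamma distributed matrix, $W = W^* > O$, whose density is

$$\tilde{f}(W) = \frac{|\det(B)|^{\alpha}}{\tilde{\Gamma}_p(\alpha)}\, |\det(W)|^{\alpha - p}\, e^{-\mathrm{tr}(BW)}, \quad W > O,\ B > O,\ \Re(\alpha) > p - 1. \tag{8.4a.1}$$

Letting $Z = B^{\frac{1}{2}} W B^{\frac{1}{2}}$, $Z$ has the density

$$\tilde{f}_1(Z) = \frac{1}{\tilde{\Gamma}_p(\alpha)}\, |\det(Z)|^{\alpha - p}\, e^{-\mathrm{tr}(Z)}, \quad Z > O,\ \Re(\alpha) > p - 1. \tag{8.4a.2}$$

If $\alpha = m$, $m = p, p+1, \ldots$ in (8.4a.2), then we have the following Wishart density having $m$ degrees of freedom in the complex domain:

$$\tilde{f}_2(Z) = \frac{1}{\tilde{\Gamma}_p(m)}\, |\det(Z)|^{m - p}\, e^{-\mathrm{tr}(Z)}. \tag{8.4a.3}$$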

Consider a unique unitary matrix $P$, $PP^* = I$, $P^*P = I$ such that $P^*ZP = \mathrm{diag}(\lambda_1, \ldots, \lambda_p)$ where $\lambda_1, \ldots, \lambda_p$ are the eigenvalues of $Z$, which are real and positive since $Z$ is Hermitian positive definite. Letting the eigenvalues be distinct and such that $\lambda_1 > \lambda_2 > \cdots > \lambda_p > 0$, observe that

$$dZ = \Big\{\prod_{i<j} |\lambda_i - \lambda_j|^2\Big\}\, dD\, \tilde{h}(P) \tag{8.4a.4}$$

where $\tilde{h}(P)$ is the differential element corresponding to the unique unitary matrix $P$. Then, as established in Mathai (1997),

$$\int_{\tilde{O}_p} \tilde{h}(P) = \frac{\pi^{p(p-1)}}{\tilde{\Gamma}_p(p)} \tag{8.4a.5}$$
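where $\tilde{O}_p$ is the full unitary group. Thus, the joint density of the eigenvalues $\lambda_1, \ldots, \lambda_p$ and their associated normalized eigenvectors, denoted by $\tilde{f}_3(D, P)$, is

$$\tilde{f}_3(D, P)\, dD \wedge dP = \frac{\big[\prod_{j=1}^{p} \lambda_j^{m-p}\big]\, e^{-\sum_{j=1}^{p} \lambda_j}}{\tilde{\Gamma}_p(m)} \Big\{\prod_{i<j} |\lambda_i - \lambda_j|^2\Big\}\, dD\, \tilde{h}(P). \tag{8.4a.6}$$
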
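Then, integrating out $P$ with the help of (8.4a.5), the marginal density of the eigenvalues $\lambda_1 > \cdots > \lambda_p > 0$, denoted by $\tilde{f}_4(\lambda_1, \ldots, \lambda_p)$, is the following:

$$\tilde{f}_4(\lambda_1, \ldots, \lambda_p)\, dD = \frac{\pi^{p(p-1)}}{\tilde{\Gamma}_p(p)\, \tilde{\Gamma}_p(m)} \Big[\prod_{j=1}^{p} \lambda_j^{m-p}\Big]\, e^{-\sum_{j=1}^{p} \lambda_j} \Big\{\prod_{i<j} |\lambda_i - \lambda_j|^2\Big\}\, dD. \tag{8.4a.7}$$

Thus, the joint density of the normalized eigenvectors forming $P$, denoted by $\tilde{f}_5(P)$, is given by

$$\tilde{f}_5(P)\, dP = \frac{\tilde{\Gamma}_p(p)}{\pi^{p(p-1)}}\, \tilde{h}(P). \tag{8.4a.8}$$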

These results are summarized in the following theorem. 


Theorem 8.4a.1. Let $Z$ have the density appearing in (8.4a.3). Then, the joint density of the distinct eigenvalues $\lambda_1 > \cdots > \lambda_p > 0$ of $Z$ is as given in (8.4a.7) and the joint density of the associated normalized eigenvectors comprising the unitary matrix $P$ is as specified in (8.4a.8).


Chapter 9
Principal Component Analysis


9.1. Introduction 


We will adopt the same notations as in the previous chapters. Lower-case letters $x, y, \ldots$ will denote real scalar variables, whether mathematical or random. Capital letters $X, Y, \ldots$ will be used to denote real matrix-variate mathematical or random variables, whether square or rectangular matrices are involved. A tilde will be placed on top of letters such as $\tilde{x}, \tilde{y}, \tilde{X}, \tilde{Y}$ to denote variables in the complex domain. Constant matrices will for instance be denoted by $A, B, C$. A tilde will not be used on constant matrices unless the point is to be stressed that the matrix is in the complex domain. The determinant of a square matrix $A$ will be denoted by $|A|$ or $\det(A)$ and, in the complex case, the absolute value or modulus of the determinant of $A$ will be denoted as $|\det(A)|$. When matrices are square, their order will be taken as $p \times p$, unless specified otherwise. When $A$ is a full rank matrix in the complex domain, then $AA^*$ is Hermitian positive definite where an asterisk designates the complex conjugate transpose of a matrix. Additionally, $dX$ will indicate the wedge product of all the distinct differentials of the elements of the matrix $X$. Letting the $p \times q$ matrix $X = (x_{ij})$ where the $x_{ij}$'s are distinct real scalar variables, $dX = \wedge_{i=1}^{p} \wedge_{j=1}^{q}\, dx_{ij}$. For the complex matrix $\tilde{X} = X_1 + iX_2$, $i = \sqrt{-1}$, where $X_1$ and $X_2$ are real, $d\tilde{X} = dX_1 \wedge dX_2$.


The requisite theory for the study of Principal Component Analysis has already been introduced in Chap. 1, namely, the problem of optimizing a real quadratic form that is subject to a constraint. We shall formulate the problem with respect to a practical situation consisting of selecting the most "relevant" variables in a study. Suppose that a scientist would like to devise a "good health" index in terms of certain indicators. After selecting a random sample of individuals belonging to a population that is homogeneous with respect to a variety of factors, such as age group, racial background and environmental conditions, she managed to secure measurements on $p = 15$ variables, including for instance, $x_1$: weight, $x_2$: systolic pressure, $x_3$: blood sugar level, and $x_4$: height. She now faces a quandary as there is an excessive number of variables and some of them may not be relevant to her investigation. A methodology is thus required for discarding the unimportant ones. As a result, the number of variables will be reduced and the interpretation of the results, possibly facilitated. So, what might be the most pertinent variables in any such study? If all the observations of a particular variable $x_j$ are concentrated around a certain value, $\mu_j$, then that variable is more or less predetermined. As an example, suppose that the heights of the individuals comprising a study group are all close to 1.8 meters and that, on the other hand, it is observed that the weight measurements are comparatively spread out. On account of this, while height is not a particularly consequential variable in connection with this study, weight is. Accordingly, we can utilize the criterion: the larger the variance of a variable, the more relevant this variable is. Let the $p \times 1$ vector $X$, $X' = (x_1, \ldots, x_p)$, encompass all the variables on which measurements are available. Let the covariance matrix associated with $X$ be $\Sigma$, that is, $\mathrm{Cov}(X) = \Sigma$. Since linear functions also contain individual variables, we may consider linear functions such as $u = a_1x_1 + \cdots + a_px_p = A'X = X'A$, a prime designating a transpose, where

$$X = \begin{bmatrix} x_1 \\ \vdots \\ x_p \end{bmatrix}, \quad A = \begin{bmatrix} a_1 \\ \vdots \\ a_p \end{bmatrix} \quad \text{and} \quad \Sigma = (\sigma_{ij}) = \begin{bmatrix} \sigma_{11} & \sigma_{12} & \cdots & \sigma_{1p} \\ \vdots & \vdots & \ddots & \vdots \\ \sigma_{p1} & \sigma_{p2} & \cdots & \sigma_{pp} \end{bmatrix}. \tag{9.1.1}$$

Then,

$$\mathrm{Var}(u) = \mathrm{Var}(A'X) = \mathrm{Var}(X'A) = A'\Sigma A. \tag{9.1.2}$$


9.2. Principal Components 


As will be explained further, the central objective in connection with the derivation of principal components consists of maximizing $A'\Sigma A$. Such an exercise would indeed prove meaningless unless some constraint is imposed on $A$, considering that, for an arbitrary vector $A$, the minimum of $A'\Sigma A$ occurs at zero and the maximum, at $+\infty$, $\Sigma = E[X - E(X)][X - E(X)]'$ being either positive definite or positive semi-definite. Since $\Sigma$ is symmetric and non-negative definite, its eigenvalues, denoted by $\lambda_1 \ge \lambda_2 \ge \cdots \ge \lambda_p \ge 0$, are real. Moreover, $\Sigma$ being symmetric, there exists an orthonormal matrix $P$, $PP' = I$, $P'P = I$, such that

$$P'\Sigma P = \mathrm{diag}(\lambda_1, \ldots, \lambda_p) \equiv \Lambda = \begin{bmatrix} \lambda_1 & 0 & \cdots & 0 \\ 0 & \lambda_2 & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & \lambda_p \end{bmatrix} \tag{9.2.1}$$

and

$$\Sigma = P\Lambda P' = \lambda_1 P_1P_1' + \cdots + \lambda_p P_pP_p', \tag{9.2.2}$$
where $P_1, \ldots, P_p$ constitute the columns of $P$, $P_i$ denoting a normalized eigenvector corresponding to $\lambda_i$, $i = 1, \ldots, p$; this is expounded for instance in Mathai and Haubold (2017a). Note that all real symmetric matrices, including those having repeated eigenvalues, can be diagonalized. Since the optimization problem is pointless when $A$ is arbitrary, the search for an optimum shall be confined to vectors $A$ such that $A'A = 1$, that is, vectors lying on the unit sphere in $\Re^p$. Without any loss of generality, the coefficients of the linear function can be selected so that the Euclidean norm of the coefficient vector is unity, in which case a minimum and a maximum will both exist. Hence, the problem can be restated as follows:

$$\text{Maximize } A'\Sigma A \text{ subject to } A'A = 1. \tag{i}$$


We will resort to the method of Lagrangian multipliers to optimize $A'\Sigma A$ subject to the constraint $A'A = 1$. Let

$$\phi_1 = A'\Sigma A - \lambda(A'A - 1) \tag{ii}$$

where $\lambda$ is a Lagrangian multiplier. Differentiating $\phi_1$ with respect to $A$ and equating the result to a null vector (vector/matrix derivatives are discussed in Chap. 1, as well as in Mathai 1997), we have the following:

$$\frac{\partial \phi_1}{\partial A} = O \ \Rightarrow\ 2\Sigma A - 2\lambda A = O \ \Rightarrow\ \Sigma A = \lambda A. \tag{iii}$$

On premultiplying (iii) by $A'$, we have

$$A'\Sigma A = \lambda A'A = \lambda. \tag{9.2.3}$$


In order to obtain a non-null solution for $A$ in (iii), the coefficient matrix $\Sigma - \lambda I$ has to be singular or, equivalently, its determinant has to be zero, that is, $|\Sigma - \lambda I| = 0$, which implies that $\lambda$ is an eigenvalue of $\Sigma$, $A$ being the corresponding eigenvector. Thus, it follows from (9.2.3) that the maximum of the quadratic form $A'\Sigma A$, subject to $A'A = 1$, is the largest eigenvalue of $\Sigma$:

$$\max_{A'A=1}\, [A'\Sigma A] = \lambda_1 = \text{the largest eigenvalue of } \Sigma.$$
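
The maximization result can be illustrated numerically. The following is a minimal sketch, assuming NumPy is available; the covariance matrix `sigma` and the function names are hypothetical and serve only as an illustration. It evaluates $A'\Sigma A$ at many random unit vectors and compares the values with the largest eigenvalue, which is attained at the corresponding normalized eigenvector.

```python
import numpy as np

def largest_variance_direction(sigma):
    """Return (lambda_1, A_1): largest eigenvalue and a unit eigenvector."""
    lam, vecs = np.linalg.eigh(sigma)      # eigenvalues in ascending order
    return lam[-1], vecs[:, -1]

def quadratic_forms_on_sphere(sigma, n_draws=5000, seed=0):
    """Evaluate A' sigma A for random vectors A on the unit sphere."""
    rng = np.random.default_rng(seed)
    A = rng.standard_normal((n_draws, sigma.shape[0]))
    A /= np.linalg.norm(A, axis=1, keepdims=True)   # enforce A'A = 1
    return np.einsum("ij,jk,ik->i", A, sigma, A)

sigma = np.array([[4.0, 1.0, 0.0],
                  [1.0, 3.0, 1.0],
                  [0.0, 1.0, 2.0]])        # hypothetical covariance matrix
lam1, A1 = largest_variance_direction(sigma)
values = quadratic_forms_on_sphere(sigma)
print(values.max(), "<=", lam1, "=", A1 @ sigma @ A1)
```

None of the randomly drawn unit vectors should produce a value exceeding $\lambda_1$, while the eigenvector $A_1$ attains it and satisfies the stationarity condition $\Sigma A_1 = \lambda_1 A_1$ of (iii).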


Similarly,

$$\min_{A'A=1}\, [A'\Sigma A] = \lambda_p = \text{the smallest eigenvalue of } \Sigma. \tag{9.2.4}$$

Since $\mathrm{Var}(A'X) = A'\Sigma A = \lambda$, the largest variance associated with a linear combination $A'X$ wherein the vector $A$ is normalized, is equal to $\lambda_1$, the largest eigenvalue of $\Sigma$, and letting $A_1$ be the normalized ($A_1'A_1 = 1$) eigenvector corresponding to $\lambda_1$, $u_1 = A_1'X$ will be that linear combination of $X$ having the maximum variance. Thus, $u_1$ is called the first principal component, which is the linear function of $X$ having the maximum variance. Although normalized, the vector $A_1$ is not unique, as $(-A_1)'(-A_1)$ is also equal to one. In order to ensure uniqueness, we will require that the first nonzero element of $A_1$, and of each of the other $p - 1$ normalized eigenvectors, be positive. Recall that since $\Sigma$ is a real symmetric matrix, the $\lambda_j$'s are real and so are the corresponding eigenvectors. Consider the second largest eigenvalue $\lambda_2$ and determine the associated normalized eigenvector $A_2$; then $u_2 = A_2'X$ will be the second principal component. Since the matrix $P$ that diagonalizes $\Sigma$ into the diagonal matrix of its eigenvalues is orthonormal, the normalized eigenvectors $A_1, A_2, \ldots, A_p$ are necessarily orthogonal to each other, which means that the corresponding principal components $u_1 = A_1'X$, $u_2 = A_2'X, \ldots, u_p = A_p'X$ will be uncorrelated. Let us see whether uncorrelated principal components could be obtained by making use of the above procedure. When constructing $A_2$, we can impose an additional condition to the effect that $A'X$ should be uncorrelated with $A_1'X$, $A_1'\Sigma A_1$ being equal to $\lambda_1$, the largest eigenvalue of $\Sigma$. The covariance between $A'X$ and $A_1'X$ is

$$\mathrm{Cov}(A'X, A_1'X) = A'\,\mathrm{Cov}(X)\,A_1 = A'\Sigma A_1 = A_1'\Sigma A. \tag{9.2.5}$$

Hence, we may require that $A'\Sigma A_1 = A_1'\Sigma A = 0$. However,

$$0 = A'\Sigma A_1 = A'(\Sigma A_1) = A'\lambda_1 A_1 = \lambda_1 A'A_1 \ \Rightarrow\ A'A_1 = 0. \tag{iv}$$


Observe that $\lambda_1 > 0$, noting that $\Sigma$ would be a null matrix if its largest eigenvalue were equal to zero, in which case no optimization problem would remain to be solved. Consider

$$\phi_2 = A'\Sigma A - 2\mu_1(A'\Sigma A_1 - 0) - \mu_2(A'A - 1) \tag{v}$$

where $\mu_1$ and $\mu_2$ are the Lagrangian multipliers. Now, differentiating $\phi_2$ with respect to $A$ and equating the result to a null vector, we have the following:

$$\frac{\partial \phi_2}{\partial A} = O \ \Rightarrow\ 2\Sigma A - 2\mu_1(\Sigma A_1) - 2\mu_2 A = O. \tag{vi}$$


Premultiplying (vi) by $A_1'$ yields

$$A_1'\Sigma A - \mu_1 A_1'\Sigma A_1 - \mu_2 A_1'A = 0 \ \Rightarrow\ 0 - \mu_1\lambda_1 - 0 = 0 \ \Rightarrow\ \mu_1 = 0, \tag{vii}$$

which entails that the added condition of uncorrelatedness with $u_1 = A_1'X$ is superfluous. When determining $A_j$, we could require that $A'X$ be uncorrelated with the principal components $u_1 = A_1'X, \ldots, u_{j-1} = A_{j-1}'X$; however, as it turns out, these conditions become redundant when optimizing $A'\Sigma A$ subject to $A'A = 1$. Thus, after determining the normalized eigenvector $A_j$ (whose first nonzero element is positive) corresponding to the $j$-th largest eigenvalue of $\Sigma$, we form the $j$-th principal component, $u_j = A_j'X$, which will necessarily be uncorrelated with the preceding principal components. The question that arises at this juncture is whether all of $u_1, \ldots, u_p$ are needed or whether a subset thereof would suffice. For instance, we could interrupt the computations when the variance of the principal component $u_j = A_j'X$, namely $\lambda_j$, falls below a predetermined threshold, in which case we can regard the remaining principal components, $u_{j+1}, \ldots, u_p$, as unimportant and omit the associated calculations. In this way, a reduction in the number of variables is achieved as the original number of variables $p$ is reduced to $j < p$ principal components. However, this reduction in the number of variables could be viewed as a compromise since the new variables are linear functions of all the original ones and so, may not be as interpretable in a real-life situation. Other drawbacks will be considered in the next section.

Observe that since $\mathrm{Var}(u_j) = \lambda_j$, $j = 1, \ldots, p$, the fraction

$$\nu_1 = \frac{\lambda_1}{\sum_{j=1}^{p} \lambda_j} = \text{the proportion of the total variation accounted for by } u_1, \tag{9.2.6}$$


and letting $r < p$,

$$\nu_r = \frac{\sum_{j=1}^{r} \lambda_j}{\sum_{j=1}^{p} \lambda_j} = \text{the proportion of total variation accounted for by } u_1, \ldots, u_r. \tag{9.2.7}$$

If $\nu_1 = 0.7$, $u_1$ accounts for 70% of the total variation in the original variables or, equivalently, 70% of the total variation is due to the first principal component. If $r = 3$ and $\nu_3 = 0.99$, then the first three principal components together account for 99% of the total variation. We can also use this percentage of the total variation as a stopping rule for the determination of the principal components. For example, when $\nu_r$ of (9.2.7) is, say, greater than or equal to 95%, we may interrupt the determination of the principal components beyond $u_r$.
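
The stopping rule based on (9.2.7) can be sketched as follows, assuming NumPy is available; the function name `components_needed` and the eigenvalues used in the call are hypothetical and are not taken from the text.

```python
import numpy as np

def components_needed(eigenvalues, threshold=0.95):
    """Smallest r such that nu_r of (9.2.7) is at least `threshold`.

    Returns (r, nu) where nu[k-1] is the proportion of the total variation
    accounted for by the first k principal components.
    """
    lam = np.sort(np.asarray(eigenvalues, dtype=float))[::-1]  # decreasing
    nu = np.cumsum(lam) / lam.sum()
    # first index at which the cumulative proportion reaches the threshold
    r = min(int(np.searchsorted(nu, threshold)) + 1, lam.size)
    return r, nu

r, nu = components_needed([5.0, 3.0, 1.5, 0.5], threshold=0.9)
print(r, nu)
```

With these illustrative eigenvalues, the cumulative proportions are 0.5, 0.8, 0.95 and 1, so that three components are required to reach the 90% threshold.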


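Example 9.2.1. Even though reducing the number of variables is a main objective of Principal Component Analysis, for illustrative purposes, we will consider a case involving three variables, that is, $p = 3$. Compute the principal components associated with the following covariance matrix:

$$V = \begin{bmatrix} 3 & -1 & 0 \\ -1 & 3 & 1 \\ 0 & 1 & 3 \end{bmatrix}.$$

Solution 9.2.1. Let us verify that $V$ is positive definite. The leading minors of $V = V'$ being

$$|(3)| = 3 > 0, \quad \begin{vmatrix} 3 & -1 \\ -1 & 3 \end{vmatrix} = 9 - 1 = 8 > 0, \quad \begin{vmatrix} 3 & -1 & 0 \\ -1 & 3 & 1 \\ 0 & 1 & 3 \end{vmatrix} = 3[(3)(3) - (1)(1)] - (-1)[(-1)(3) - (1)(0)] = 24 - 3 = 21 > 0,$$

$V > O$. Let us compute the eigenvalues of $V$. Consider the equation $|V - \lambda I| = 0 \Rightarrow (3 - \lambda)[(3 - \lambda)^2 - 1] - (3 - \lambda) = 0 \Rightarrow (3 - \lambda)[(3 - \lambda)^2 - 2] = 0 \Rightarrow (3 - \lambda)(3 - \lambda \pm \sqrt{2}) = 0$. Hence the eigenvalues are $\lambda_1 = 3 + \sqrt{2}$, $\lambda_2 = 3$, $\lambda_3 = 3 - \sqrt{2}$. Let us compute an eigenvector corresponding to $\lambda_1 = 3 + \sqrt{2}$. Consider $(V - \lambda_1 I)X = O$, that is,

$$\begin{bmatrix} 3 - \lambda_1 & -1 & 0 \\ -1 & 3 - \lambda_1 & 1 \\ 0 & 1 & 3 - \lambda_1 \end{bmatrix} \begin{bmatrix} x_1 \\ x_2 \\ x_3 \end{bmatrix} = \begin{bmatrix} 0 \\ 0 \\ 0 \end{bmatrix}. \tag{i}$$

There are three linear equations involving the $x_j$'s in (i). Since the matrix $V - \lambda_1 I$ is singular, we need only consider any two of these three linear equations and solve to obtain a solution. Let the first equation be $-\sqrt{2}x_1 - x_2 = 0 \Rightarrow x_2 = -\sqrt{2}x_1$, and the second one be $-x_1 - \sqrt{2}x_2 + x_3 = 0 \Rightarrow -x_1 + 2x_1 + x_3 = 0 \Rightarrow x_3 = -x_1$. Now, one solution is

$$X_1 = \begin{bmatrix} 1 \\ -\sqrt{2} \\ -1 \end{bmatrix}, \text{ which once normalized is } A_1 = \frac{1}{2} \begin{bmatrix} 1 \\ -\sqrt{2} \\ -1 \end{bmatrix}.$$

Observe that $X_1$ also satisfies the third equation in (i) and that

$$A_1'VA_1 = \frac{1}{4}[1, -\sqrt{2}, -1] \begin{bmatrix} 3 & -1 & 0 \\ -1 & 3 & 1 \\ 0 & 1 & 3 \end{bmatrix} \begin{bmatrix} 1 \\ -\sqrt{2} \\ -1 \end{bmatrix}$$

$$= \frac{1}{4}[3(1)^2 + 3(-\sqrt{2})^2 + 3(-1)^2 - 2(1)(-\sqrt{2}) + 2(-\sqrt{2})(-1)] = 3 + \sqrt{2} = \lambda_1.$$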

Thus, the first principal component, denoted by $u_1$, is

$$u_1 = A_1'X = \frac{1}{2}[x_1 - \sqrt{2}x_2 - x_3]. \tag{ii}$$

Now, consider an eigenvector corresponding to $\lambda_2 = 3$. $(V - \lambda_2 I)X = O$ provides the first equation: $-x_2 = 0 \Rightarrow x_2 = 0$, the second equation giving $-x_1 + x_3 = 0 \Rightarrow x_1 = x_3$. Therefore, one solution for $X$ is

$$X_2 = \begin{bmatrix} 1 \\ 0 \\ 1 \end{bmatrix} \text{ whose normalized form is } A_2 = \frac{1}{\sqrt{2}} \begin{bmatrix} 1 \\ 0 \\ 1 \end{bmatrix},$$

so that the second principal component is

$$u_2 = \frac{1}{\sqrt{2}}[x_1 + x_3]. \tag{iii}$$

It can readily be checked that $A_2'VA_2 = 3 = \lambda_2$. Let us now obtain an eigenvector corresponding to the eigenvalue $\lambda_3 = 3 - \sqrt{2}$. Consider the linear system $(V - \lambda_3 I)X = O$. The first equation is $\sqrt{2}x_1 - x_2 = 0 \Rightarrow x_2 = \sqrt{2}x_1$, and the third one gives $x_2 + \sqrt{2}x_3 = 0 \Rightarrow x_2 = -\sqrt{2}x_3$. One solution is

$$X_3 = \begin{bmatrix} 1 \\ \sqrt{2} \\ -1 \end{bmatrix}, \text{ its normalized form being } A_3 = \frac{1}{2} \begin{bmatrix} 1 \\ \sqrt{2} \\ -1 \end{bmatrix}.$$

The third principal component is then

$$u_3 = \frac{1}{2}[x_1 + \sqrt{2}x_2 - x_3]. \tag{iv}$$

It is easily verified that $A_3'VA_3 = \lambda_3 = 3 - \sqrt{2}$. As well,

$$\mathrm{Var}(u_1) = \frac{1}{4}[\mathrm{Var}(x_1) + 2\,\mathrm{Var}(x_2) + \mathrm{Var}(x_3) - 2\sqrt{2}\,\mathrm{Cov}(x_1, x_2) - 2\,\mathrm{Cov}(x_1, x_3) + 2\sqrt{2}\,\mathrm{Cov}(x_2, x_3)]$$

$$= \frac{1}{4}[3 + 2 \times 3 + 3 - 2\sqrt{2}(-1) - 2(0) + 2\sqrt{2}(1)] = \frac{1}{4}[12 + 4\sqrt{2}] = 3 + \sqrt{2} = \lambda_1.$$

Similar calculations will confirm that $\mathrm{Var}(u_2) = 3 = \lambda_2$ and $\mathrm{Var}(u_3) = 3 - \sqrt{2} = \lambda_3$. Now, consider the covariance between $u_1$ and $u_2$:

$$\mathrm{Cov}(u_1, u_2) = \frac{1}{2\sqrt{2}}[\mathrm{Var}(x_1) + \mathrm{Cov}(x_1, x_3) - \sqrt{2}\,\mathrm{Cov}(x_1, x_2) - \sqrt{2}\,\mathrm{Cov}(x_2, x_3) - \mathrm{Cov}(x_3, x_1) - \mathrm{Var}(x_3)]$$

$$= \frac{1}{2\sqrt{2}}[3 + 0 + \sqrt{2} - \sqrt{2} - 0 - 3] = 0.$$

It can be likewise verified that $\mathrm{Cov}(u_1, u_3) = 0$ and $\mathrm{Cov}(u_2, u_3) = 0$. Note that $u_1$, $u_2$, and $u_3$ respectively account for approximately 49.0%, 33.3% and 17.6% of the total variation. As none of these proportions is negligibly small, all three principal components are deemed relevant and, in this instance, it is not indicated to reduce the number of the original variables. Although we still end up with as many variables, the $u_j$'s, $j = 1, 2, 3$, are uncorrelated, which was not the case for the original variables.
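
The computations of this example can be cross-checked numerically. The sketch below assumes NumPy is available; the helper `principal_components` is a hypothetical name, and it applies the sign convention used in the text (first nonzero element of each normalized eigenvector positive).

```python
import numpy as np

V = np.array([[3.0, -1.0, 0.0],
              [-1.0, 3.0, 1.0],
              [0.0, 1.0, 3.0]])   # covariance matrix used in the solution

def principal_components(cov, tol=1e-12):
    """Eigenvalues in decreasing order and sign-normalized unit eigenvectors."""
    lam, vecs = np.linalg.eigh(cov)          # ascending order
    order = np.argsort(lam)[::-1]
    lam, vecs = lam[order], vecs[:, order]
    for j in range(vecs.shape[1]):
        # make the first nonzero element of each eigenvector positive
        first = vecs[np.flatnonzero(np.abs(vecs[:, j]) > tol)[0], j]
        if first < 0:
            vecs[:, j] = -vecs[:, j]
    return lam, vecs

lam, A = principal_components(V)
proportions = lam / lam.sum()
print(lam)           # eigenvalues, to be compared with 3 + sqrt(2), 3, 3 - sqrt(2)
print(A)             # columns, to be compared with A_1, A_2, A_3
print(proportions)   # shares of the total variation
```

The columns of `A` give the coefficient vectors of $u_1$, $u_2$, $u_3$, and `A.T @ V @ A` should be diagonal, which reflects the fact that the principal components are uncorrelated.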


9.3. Issues to Be Mindful of when Constructing Principal Components 


Since variances and covariances are expressed in units of measurement, principal components also depend upon the scale on which measurements on the individual variables are made. If we change the units of measurement, the principal components will differ. Suppose that $x_j$ is multiplied by a real scalar constant $d_j$, $j = 1, \ldots, p$, where some of the $d_j$'s are not equal to 1. This is equivalent to changing the units of measurement of some of the variables. Let

$$X = \begin{bmatrix} x_1 \\ \vdots \\ x_p \end{bmatrix}, \quad Y = \begin{bmatrix} d_1x_1 \\ \vdots \\ d_px_p \end{bmatrix} = DX, \quad D = \begin{bmatrix} d_1 & 0 & \cdots & 0 \\ 0 & d_2 & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & d_p \end{bmatrix}, \tag{9.3.1}$$

and consider the linear functions $A'X$ and $A'Y$. Then,

$$\mathrm{Var}(A'X) = A'\Sigma A, \quad \mathrm{Var}(A'Y) = A'\,\mathrm{Var}(DX)\,A = A'D\Sigma DA. \tag{9.3.2}$$

Since the eigenvalues of $\Sigma$ and $D\Sigma D$ differ, so will the corresponding principal components, and if the original variables are measured in various units of measurement, it would be advisable to attempt to standardize them. Letting $R$ denote the correlation matrix, which is scale-invariant, observe that the covariance matrix $\Sigma = \Sigma_1 R \Sigma_1$, where $\Sigma_1 = \mathrm{diag}(\sigma_1, \ldots, \sigma_p)$, $\sigma_j^2$ being the variance of $x_j$, $j = 1, \ldots, p$. Thus, if the original variables are scaled by the inverses of their associated standard deviations, that is, $\frac{x_j}{\sigma_j}$ or equivalently, via the transformation $\Sigma_1^{-1}X$, the resulting covariance matrix is $\mathrm{Cov}(\Sigma_1^{-1}X) = R$, the correlation matrix. Accordingly, constructing the principal components by making use of $R$ instead of $\Sigma$ will mitigate the issue stemming from the scale of measurement.
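
The effect of a change of units can be illustrated with a short sketch, assuming NumPy is available. The covariance matrix `sigma`, the scaling matrix `D` and the function name `cov_to_corr` are hypothetical. The correlation matrix is left unchanged by the rescaling, whereas the proportions of variation derived from the covariance matrix are not.

```python
import numpy as np

def cov_to_corr(sigma):
    """Correlation matrix R = Sigma_1^{-1} Sigma Sigma_1^{-1}."""
    s = np.sqrt(np.diag(sigma))        # standard deviations sigma_1, ..., sigma_p
    return sigma / np.outer(s, s)

sigma = np.array([[4.0, 1.2, 0.0],
                  [1.2, 9.0, -1.5],
                  [0.0, -1.5, 1.0]])   # hypothetical covariance matrix
D = np.diag([10.0, 1.0, 0.1])          # hypothetical change of units
sigma_rescaled = D @ sigma @ D         # Cov(DX) = D Sigma D

share = np.linalg.eigvalsh(sigma)[::-1] / np.trace(sigma)
share_rescaled = np.linalg.eigvalsh(sigma_rescaled)[::-1] / np.trace(sigma_rescaled)
print(share, share_rescaled)                        # these differ
print(cov_to_corr(sigma) - cov_to_corr(sigma_rescaled))   # essentially zero
```

In this illustration the rescaled first variable dominates the total variation, so that the first principal component obtained from $D\Sigma D$ is almost entirely determined by the choice of units.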


If $\lambda_1, \ldots, \lambda_p$ are the eigenvalues of $\Sigma$, then $\lambda_1^k, \ldots, \lambda_p^k$ will be the eigenvalues of $\Sigma^k$. Moreover, $\Sigma$ and $\Sigma^k$ will share the same eigenvectors. Note that the collection $(\lambda_1^k, \ldots, \lambda_p^k)$ will be more widely separated than the set $(\lambda_1, \ldots, \lambda_p)$ when the $\lambda_j$'s are distinct and greater than one. Hence, in some instances, it might be preferable to construct principal components by making use of $\Sigma^k$ for an appropriate value of $k$ instead of $\Sigma$. Observe that, in certain situations, it is not feasible to provide a physical interpretation of the principal components, which are linear functions of the original $x_1, \ldots, x_p$. Nonetheless, they can at times be informative by pointing to the average of certain variables (for example, $u_2$ in the previous numerical example) or by eliciting contrasts between two sets of variables (for example, $u_3$ in the previous numerical example, which opposes $x_3$ to $x_1$ and $x_2$).


To illustrate how the eigenvalues of a correlation matrix are determined, we revisit Example 9.2.1 wherein the variances of the $x_j$'s or the diagonal elements of the covariance matrix are all equal to 3. Thus, in this case, the correlation matrix is $R = \frac{1}{3}V$, and the eigenvalues of $R$ will be $\frac{1}{3}$ times the eigenvalues of $V$. However, the normalized eigenvectors will remain identical to those of $V$, and therefore the principal components will not change. We now consider an example wherein the diagonal elements of the covariance matrix are different.


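Example 9.3.1. Let $X' = (x_1, x_2, x_3)$ where the $x_j$'s are real scalar random variables. Compute the principal components resulting from $R$, the correlation matrix of $X$, where

$$R = \begin{bmatrix} 1 & 0 & \frac{1}{\sqrt{6}} \\ 0 & 1 & -\frac{1}{\sqrt{6}} \\ \frac{1}{\sqrt{6}} & -\frac{1}{\sqrt{6}} & 1 \end{bmatrix}.$$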

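Solution 9.3.1. Let us compute the eigenvalues of $R$. Consider the equation

$$|R - \lambda I| = 0 \ \Rightarrow\ (1 - \lambda)\Big[(1 - \lambda)^2 - \frac{1}{6}\Big] - \frac{1}{6}(1 - \lambda) = 0 \ \Rightarrow\ (1 - \lambda)\Big[(1 - \lambda)^2 - \frac{1}{3}\Big] = 0.$$

Thus, the eigenvalues are $\lambda_1 = 1 + \frac{1}{\sqrt{3}}$, $\lambda_2 = 1$, $\lambda_3 = 1 - \frac{1}{\sqrt{3}}$. Let us compute an eigenvector corresponding to $\lambda_1 = 1 + \frac{1}{\sqrt{3}}$. Consider the system $(R - \lambda_1 I)X = O$. The first equation is

$$-\frac{1}{\sqrt{3}}x_1 + \frac{1}{\sqrt{6}}x_3 = 0 \ \Rightarrow\ x_1 = \frac{1}{\sqrt{2}}x_3,$$

the second equation being

$$-\frac{1}{\sqrt{3}}x_2 - \frac{1}{\sqrt{6}}x_3 = 0 \ \Rightarrow\ x_2 = -\frac{1}{\sqrt{2}}x_3.$$

Let us take $x_3 = \sqrt{2}$. Then, an eigenvector corresponding to $\lambda_1$ is

$$X_1 = \begin{bmatrix} 1 \\ -1 \\ \sqrt{2} \end{bmatrix}, \text{ and its normalized form is } A_1 = \frac{1}{2} \begin{bmatrix} 1 \\ -1 \\ \sqrt{2} \end{bmatrix}.$$

Let us verify that the third equation is also satisfied by $X_1$ or $A_1$. This is the case since $\frac{1}{\sqrt{6}} + \frac{1}{\sqrt{6}} - \frac{\sqrt{2}}{\sqrt{3}} = 0$, and the first principal component is indeed $u_1 = \frac{1}{2}[x_1 - x_2 + \sqrt{2}x_3]$. As well, the variance of $u_1$ is equal to $\lambda_1$:

$$\mathrm{Var}(u_1) = \frac{1}{4}[\mathrm{Var}(x_1) + \mathrm{Var}(x_2) + 2\,\mathrm{Var}(x_3) - 2\,\mathrm{Cov}(x_1, x_2) + 2\sqrt{2}\,\mathrm{Cov}(x_1, x_3) - 2\sqrt{2}\,\mathrm{Cov}(x_2, x_3)]$$

$$= \frac{1}{4}\Big[1 + 1 + 2 - 0 + \frac{2\sqrt{2}}{\sqrt{6}} + \frac{2\sqrt{2}}{\sqrt{6}}\Big] = 1 + \frac{1}{\sqrt{3}} = \lambda_1.$$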

Let us compute an eigenvector corresponding to the eigenvalue $\lambda_2 = 1$. Consider the linear system $(R - \lambda_2 I)X = O$. The first equation, $\frac{1}{\sqrt{6}}x_3 = 0$, gives $x_3 = 0$ and the second one also yields $x_3 = 0$; as for the third one, $\frac{x_1}{\sqrt{6}} - \frac{x_2}{\sqrt{6}} = 0 \Rightarrow x_1 = x_2$. Letting $x_1 = 1$, an eigenvector is

$$X_2 = \begin{bmatrix} 1 \\ 1 \\ 0 \end{bmatrix}, \text{ its normalized form being given by } A_2 = \frac{1}{\sqrt{2}} \begin{bmatrix} 1 \\ 1 \\ 0 \end{bmatrix}.$$

Thus, the second principal component is $u_2 = \frac{1}{\sqrt{2}}[x_1 + x_2]$. In the case of $\lambda_3$, we consider the system $(R - \lambda_3 I)X = O$ whose first equation is

$$\frac{1}{\sqrt{3}}x_1 + \frac{1}{\sqrt{6}}x_3 = 0 \ \Rightarrow\ x_1 = -\frac{1}{\sqrt{2}}x_3,$$

the second one being

$$\frac{x_2}{\sqrt{3}} - \frac{x_3}{\sqrt{6}} = 0 \ \Rightarrow\ x_2 = \frac{1}{\sqrt{2}}x_3.$$

Letting $x_3 = \sqrt{2}$, an eigenvector corresponding to $\lambda_3$ is

$$X_3 = \begin{bmatrix} -1 \\ 1 \\ \sqrt{2} \end{bmatrix}, \text{ its normalized form being } A_3 = \frac{1}{2} \begin{bmatrix} -1 \\ 1 \\ \sqrt{2} \end{bmatrix}.$$

For the matrix $[A_1, A_2, A_3]$ to be uniquely determined, we may multiply $A_3$ by $-1$ so that the first nonzero element is positive. Hence, the third principal component is $u_3 = \frac{1}{2}[x_1 - x_2 - \sqrt{2}x_3]$. As was shown in the case of $u_1$, it can also be verified that $\mathrm{Var}(u_2) = 1 = \lambda_2$, $\mathrm{Var}(u_3) = \lambda_3 = 1 - \frac{1}{\sqrt{3}}$, $\mathrm{Cov}(u_1, u_2) = 0$, $\mathrm{Cov}(u_1, u_3) = 0$ and $\mathrm{Cov}(u_2, u_3) = 0$. We note that

$$\frac{\lambda_1}{\lambda_1 + \lambda_2 + \lambda_3} = \frac{1 + 1/\sqrt{3}}{3} \approx 0.526,$$

$$\frac{\lambda_1 + \lambda_2}{\lambda_1 + \lambda_2 + \lambda_3} = \frac{2 + 1/\sqrt{3}}{3} \approx 0.859.$$

Thus, almost 53% of the total variation is accounted for by the first principal component and nearly 86% of the total variation is due to the first two principal components.


9.4. The Vector of Principal Components 


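Observe that the determinant of a matrix is the product of its eigenvalues and that its trace is the sum of its eigenvalues, that is, $|\Sigma| = \lambda_1 \cdots \lambda_p$ and $\mathrm{tr}(\Sigma) = \lambda_1 + \cdots + \lambda_p$. As previously pointed out, the determinant of a covariance matrix corresponds to Wilks' concept of generalized variance. Let us consider the vector of principal components. The principal components are $u_j = A_j'X$, with $\mathrm{Var}(u_j) = A_j'\Sigma A_j = \lambda_j$, $j = 1, \ldots, p$, and $\mathrm{Cov}(u_i, u_j) = 0$ for all $i \ne j$. Thus,

$$U = \begin{bmatrix} u_1 \\ \vdots \\ u_p \end{bmatrix} = \begin{bmatrix} A_1'X \\ \vdots \\ A_p'X \end{bmatrix} \ \Rightarrow\ \mathrm{Cov}(U) = \Lambda = \begin{bmatrix} \lambda_1 & 0 & \cdots & 0 \\ 0 & \lambda_2 & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & \lambda_p \end{bmatrix}. \tag{9.4.1}$$

The determinant of the covariance matrix is the product $\lambda_1 \cdots \lambda_p$ and its trace, the sum $\lambda_1 + \cdots + \lambda_p$. Hence the following result: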


Theorem 9.4.1. Let $X$ be a $p \times 1$ real vector whose associated covariance matrix is $\Sigma$. Let the principal components of $X$ be denoted by $u_j = A_j'X$ with $\mathrm{Var}(u_j) = A_j'\Sigma A_j = \lambda_j = j$-th largest eigenvalue of $\Sigma$, and $U' = (u_1, \ldots, u_p)$ with $\mathrm{Cov}(U) = \Sigma_u$. Then, $|\Sigma_u| = |\Sigma| =$ product of the eigenvalues, $\lambda_1 \cdots \lambda_p$, and $\mathrm{tr}(\Sigma_u) = \mathrm{tr}(\Sigma) =$ sum of the eigenvalues, $\lambda_1 + \cdots + \lambda_p$. Observe that the determinant as well as the eigenvalues and the trace are invariant with respect to orthonormal transformations or rotations of the coordinate axes.

Let $A = [A_1, A_2, \ldots, A_p]$. Then

$$\Sigma A = A\Lambda, \quad A'A = I, \quad A'\Sigma A = \Lambda. \tag{9.4.2}$$


Note that $U' = (u_1, \ldots, u_p)$ where $u_1, \ldots, u_p$ are linearly independent. We have not assumed any distribution for the $p \times 1$ vector $X$ so far. If the $p \times 1$ vector $X$ has a $p$-variate nonsingular Gaussian distribution, that is, $X \sim N_p(O, \Sigma)$, $\Sigma > O$, then

$$u_j \sim N_1(0, \lambda_j), \quad U \sim N_p(O, \Lambda). \tag{9.4.3}$$
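
The relations stated in Theorem 9.4.1 and (9.4.2) can be checked numerically. The following is a minimal sketch, assuming NumPy is available; the covariance matrix `sigma` and the function name `principal_component_covariance` are hypothetical.

```python
import numpy as np

def principal_component_covariance(sigma):
    """Return (lam, A, sigma_u) with sigma_u = A' sigma A = Cov(U), U = A'X."""
    lam, A = np.linalg.eigh(sigma)         # ascending eigenvalues
    lam, A = lam[::-1], A[:, ::-1]         # reorder: lambda_1 >= ... >= lambda_p
    sigma_u = A.T @ sigma @ A
    return lam, A, sigma_u

sigma = np.array([[5.0, 2.0, 0.5],
                  [2.0, 4.0, 1.0],
                  [0.5, 1.0, 3.0]])        # hypothetical covariance matrix
lam, A, sigma_u = principal_component_covariance(sigma)

print(np.round(sigma_u, 10))                          # diagonal matrix Lambda
print(np.linalg.det(sigma_u), np.linalg.det(sigma))   # equal determinants
print(np.trace(sigma_u), np.trace(sigma))             # equal traces
```

The signs of the columns of `A` are left as returned by the eigenvalue routine, since the quantities being compared do not depend on them.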


Since Principal Components involve variances or covariances, which are free of any location parameter, we may take the mean value vector to be a null vector without any loss of generality. Then, $E[u_j] = 0$ by assumption, $\mathrm{Var}(u_j) = A_j'\Sigma A_j = \lambda_j$, $j = 1, \ldots, p$, and $\mathrm{Cov}(U) = \Lambda = \mathrm{diag}(\lambda_1, \ldots, \lambda_p)$. Accordingly, $\frac{\lambda_1}{\lambda_1 + \cdots + \lambda_p}$ is the proportion of the total variance accounted for by the largest eigenvalue $\lambda_1$, where $\lambda_1$ is equal to the variance of the first principal component. Similarly, $\frac{\lambda_1 + \cdots + \lambda_r}{\lambda_1 + \cdots + \lambda_p}$ is the proportion of the total variance due to the first $r$ principal components, $r < p$.


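Example 9.4.1. Let $X \sim N_3(\mu, \Sigma)$, $\Sigma > O$, where

$$X = \begin{bmatrix} x_1 \\ x_2 \\ x_3 \end{bmatrix}, \quad \mu = \begin{bmatrix} 1 \\ 0 \\ -1 \end{bmatrix}, \quad \Sigma = \begin{bmatrix} 2 & 0 & 1 \\ 0 & 2 & -1 \\ 1 & -1 & 3 \end{bmatrix}.$$

Derive the densities of the principal components of $X$.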


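Solution 9.4.1. Let us determine the eigenvalues of $\Sigma$. Consider the equation

$$|\Sigma-\lambda I| = 0 \Rightarrow \begin{vmatrix} 2-\lambda & 0 & 1\\ 0 & 2-\lambda & -1\\ 1 & -1 & 3-\lambda\end{vmatrix} = 0$$
$$\Rightarrow (2-\lambda)[(2-\lambda)(3-\lambda)-1]-(2-\lambda) = 0$$
$$\Rightarrow (2-\lambda)[\lambda^2-5\lambda+4] = 0 \Rightarrow \lambda_1 = 4,\ \lambda_2 = 2,\ \lambda_3 = 1.$$

Let us compute an eigenvector corresponding to $\lambda_1 = 4$. Consider the linear system
$(\Sigma-\lambda_1 I)X = O$, whose first equation gives $-2x_1+x_3 = 0$ or $x_1 = \frac{1}{2}x_3$, the second equation,
$-2x_2-x_3 = 0$, yielding $x_2 = -\frac{1}{2}x_3$. Letting $x_3 = 2$, one solution is

$$X_1 = \begin{bmatrix} 1\\ -1\\ 2\end{bmatrix},\ \text{and its normalized form is}\ A_1 = \frac{1}{\sqrt{6}}\begin{bmatrix} 1\\ -1\\ 2\end{bmatrix}.$$

Thus, the first principal component is $u_1 = A_1'X = \frac{1}{\sqrt{6}}[x_1-x_2+2x_3]$ with
$E[u_1] = \frac{1}{\sqrt{6}}[1-0-2] = -\frac{1}{\sqrt{6}}$ and

$$\mathrm{Var}(u_1) = \frac{1}{6}[\mathrm{Var}(x_1)+\mathrm{Var}(x_2)+4\mathrm{Var}(x_3)-2\mathrm{Cov}(x_1,x_2)+4\mathrm{Cov}(x_1,x_3)-4\mathrm{Cov}(x_2,x_3)]$$
$$= \frac{1}{6}[2+2+4\times 3-2(0)+4(1)-4(-1)] = 4 = \lambda_1.$$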


Since $u_1$ is a linear function of normal variables, $u_1$ has the following Gaussian distribution:

$$u_1 \sim N_1\Big(-\frac{1}{\sqrt{6}},\ 4\Big).$$

Let us compute an eigenvector corresponding to $\lambda_2 = 2$. Consider the system
$(\Sigma-\lambda_2 I)X = O$, whose first and second equations give $x_3 = 0$, the third equation
$x_1-x_2+x_3 = 0$ yielding $x_1 = x_2$. Hence, one solution is

$$X_2 = \begin{bmatrix} 1\\ 1\\ 0\end{bmatrix},\ \text{its normalized form being}\ A_2 = \frac{1}{\sqrt{2}}\begin{bmatrix} 1\\ 1\\ 0\end{bmatrix},$$

and the second principal component is $u_2 = A_2'X = \frac{1}{\sqrt{2}}[x_1+x_2]$ with
$E[u_2] = \frac{1}{\sqrt{2}}[1+0] = \frac{1}{\sqrt{2}}$. Let us verify that the variance of $u_2$ is $\lambda_2 = 2$:

$$\mathrm{Var}(u_2) = \frac{1}{2}[\mathrm{Var}(x_1)+\mathrm{Var}(x_2)+2\mathrm{Cov}(x_1,x_2)] = \frac{1}{2}[2+2+2(0)] = 2 = \lambda_2.$$

Hence, $u_2$ has the following real univariate normal distribution:

$$u_2 \sim N_1\Big(\frac{1}{\sqrt{2}},\ 2\Big).$$
We finally construct an eigenvector associated with $\lambda_3 = 1$. Consider the linear system
$(\Sigma-\lambda_3 I)X = O$. In this case, the first equation is $x_1+x_3 = 0 \Rightarrow x_1 = -x_3$ and the
second one is $x_2-x_3 = 0$ or $x_2 = x_3$. Let $x_3 = 1$. One eigenvector is

$$X_3 = \begin{bmatrix} -1\\ 1\\ 1\end{bmatrix},\ \text{its normalized form being}\ A_3 = \frac{1}{\sqrt{3}}\begin{bmatrix} -1\\ 1\\ 1\end{bmatrix}.$$

In order to ensure the uniqueness of the matrix $[A_1, A_2, A_3]$, we may multiply $A_3$ by $-1$
to ensure that the first nonzero element is positive. Then, the third principal component is
$u_3 = \frac{1}{\sqrt{3}}[x_1-x_2-x_3]$ with $E[u_3] = \frac{1}{\sqrt{3}}[1-0+1] = \frac{2}{\sqrt{3}}$. The variance of $u_3$ is indeed
equal to $\lambda_3$ as

$$\mathrm{Var}(u_3) = \frac{1}{3}[\mathrm{Var}(x_1)+\mathrm{Var}(x_2)+\mathrm{Var}(x_3)-2\mathrm{Cov}(x_1,x_2)-2\mathrm{Cov}(x_1,x_3)+2\mathrm{Cov}(x_2,x_3)]$$
$$= \frac{1}{3}[2+2+3-2(0)-2(1)+2(-1)] = 1.$$

Thus,

$$u_3 \sim N_1\Big(\frac{2}{\sqrt{3}},\ 1\Big).$$
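It can easily be verified that $\mathrm{Cov}(u_1,u_2) = 0$, $\mathrm{Cov}(u_1,u_3) = 0$ and $\mathrm{Cov}(u_2,u_3) = 0$.
Accordingly, letting

$$U = \begin{bmatrix} u_1\\ u_2\\ u_3\end{bmatrix} = \begin{bmatrix} A_1'X\\ A_2'X\\ A_3'X\end{bmatrix},$$
$$\mathrm{Cov}(U) = \begin{bmatrix} A_1'\Sigma A_1 & A_1'\Sigma A_2 & A_1'\Sigma A_3\\ A_2'\Sigma A_1 & A_2'\Sigma A_2 & A_2'\Sigma A_3\\ A_3'\Sigma A_1 & A_3'\Sigma A_2 & A_3'\Sigma A_3\end{bmatrix} = \begin{bmatrix} \lambda_1 & 0 & 0\\ 0 & \lambda_2 & 0\\ 0 & 0 & \lambda_3\end{bmatrix}.$$

As well, $\Sigma A_j = \lambda_j A_j$, $j = 1, 2, 3$, that is,

$$\begin{bmatrix} 2 & 0 & 1\\ 0 & 2 & -1\\ 1 & -1 & 3\end{bmatrix}\begin{bmatrix} \frac{1}{\sqrt{6}} & \frac{1}{\sqrt{2}} & \frac{1}{\sqrt{3}}\\ -\frac{1}{\sqrt{6}} & \frac{1}{\sqrt{2}} & -\frac{1}{\sqrt{3}}\\ \frac{2}{\sqrt{6}} & 0 & -\frac{1}{\sqrt{3}}\end{bmatrix} = \begin{bmatrix} \frac{1}{\sqrt{6}} & \frac{1}{\sqrt{2}} & \frac{1}{\sqrt{3}}\\ -\frac{1}{\sqrt{6}} & \frac{1}{\sqrt{2}} & -\frac{1}{\sqrt{3}}\\ \frac{2}{\sqrt{6}} & 0 & -\frac{1}{\sqrt{3}}\end{bmatrix}\begin{bmatrix} 4 & 0 & 0\\ 0 & 2 & 0\\ 0 & 0 & 1\end{bmatrix}.$$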


This completes the computations. 
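
The arithmetic of Example 9.4.1 can be checked numerically. The following is a minimal sketch in plain Python, not part of the original solution; the helper names (`mat_vec`, `dot`, `component_summary`) are illustrative choices. It verifies that each normalized vector satisfies $\Sigma A_j = \lambda_j A_j$ and recomputes the mean and variance of each principal component.

```python
import math

# Covariance matrix and mean vector as read from Example 9.4.1.
SIGMA = [[2.0, 0.0, 1.0],
         [0.0, 2.0, -1.0],
         [1.0, -1.0, 3.0]]
MU = [1.0, 0.0, -1.0]

# Eigenvalues with the normalized eigenvectors A1, A2, A3 of the solution
# (A3 carries the sign change that makes its first nonzero element positive).
EIGENPAIRS = [
    (4.0, [1 / math.sqrt(6), -1 / math.sqrt(6), 2 / math.sqrt(6)]),
    (2.0, [1 / math.sqrt(2), 1 / math.sqrt(2), 0.0]),
    (1.0, [1 / math.sqrt(3), -1 / math.sqrt(3), -1 / math.sqrt(3)]),
]


def mat_vec(matrix, vector):
    """Matrix-vector product for nested lists."""
    return [sum(a * b for a, b in zip(row, vector)) for row in matrix]


def dot(u, v):
    """Inner product of two lists of numbers."""
    return sum(a * b for a, b in zip(u, v))


def component_summary(sigma, mu, eigenpairs):
    """Return (mean, variance, residual) for each u_j = A_j'X.

    The residual is max |Sigma A_j - lambda_j A_j|, which should vanish.
    """
    summary = []
    for lam, a in eigenpairs:
        sigma_a = mat_vec(sigma, a)
        residual = max(abs(x - lam * y) for x, y in zip(sigma_a, a))
        summary.append((dot(a, mu), dot(a, sigma_a), residual))
    return summary


SUMMARY = component_summary(SIGMA, MU, EIGENPAIRS)
for j, (mean, variance, residual) in enumerate(SUMMARY, start=1):
    print(f"u{j}: mean={mean:.4f}, variance={variance:.4f}, residual={residual:.1e}")
```

The printed variances should reproduce the eigenvalues and the means should agree with the location parameters of the three univariate normal densities.
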
9.4.1. Principal components viewed from differing perspectives 


Let $X$ be a $p\times 1$ real vector whose covariance matrix is $\mathrm{Cov}(X) = \Sigma > O$. Assume
that $E(X) = O$. Then, $X'\Sigma^{-1}X = c > 0$ is commonly referred to as an ellipsoid of
concentration, centered at the origin of the coordinate system, with $X'X$ being the square
of the distance between the origin and a point $X$ on the surface of this ellipsoid. A
principal axis of this ellipsoid is defined when this squared distance has a stationary point.
The stationary points are determined by optimizing $X'X$ subject to $X'\Sigma^{-1}X = c > 0$.
Consider

$$w = X'X-\lambda(X'\Sigma^{-1}X-c),\quad \frac{\partial w}{\partial X} = O \Rightarrow 2X-2\lambda\Sigma^{-1}X = O$$
$$\Rightarrow \Sigma^{-1}X = \frac{1}{\lambda}X \Rightarrow \Sigma X = \lambda X \tag{i}$$

where $\lambda$ is a Lagrangian multiplier. It is seen from (i) that $\lambda$ is an eigenvalue of $\Sigma$ and $X$
is the corresponding eigenvector. Letting $\lambda_1 > \lambda_2 > \cdots > \lambda_p > 0$ be the eigenvalues and
$A_1,\ldots,A_p$ be the corresponding eigenvectors, $A_1,\ldots,A_p$ give the principal axes of this
ellipsoid of concentration. It follows from (i) that

$$c = A_j'\Sigma^{-1}A_j = \frac{1}{\lambda_j}A_j'A_j \Rightarrow A_j'A_j = \lambda_j c.$$

Thus, the length of the $j$-th principal axis is $2\sqrt{A_j'A_j} = 2\sqrt{\lambda_j c}$.


As another approach, consider a plane passing through the origin. The equation of this
plane will be $\beta'X = 0$ where $\beta$ is a $p\times 1$ constant vector and $X$ is the $p\times 1$ vector of the
coordinate system. Without any loss of generality, let $\beta'\beta = 1$. Let the $p\times 1$ vector $Y$ be
a point in the Euclidean space. The distance between this point and the plane is then $|\beta'Y|$.
Letting $Y$ be a random point such that $E(Y) = O$ and $\mathrm{Cov}(Y) = \Sigma > O$, the expected
squared distance from this point to the plane is $E[\beta'Y]^2 = E[\beta'YY'\beta] = \beta'E(YY')\beta =
\beta'\Sigma\beta$. Accordingly, the two-dimensional planar manifold of closest fit to the point $Y$ is
that plane whose coefficient vector $\beta$ is such that $\beta'\Sigma\beta$ is minimized subject to $\beta'\beta = 1$.
This, once again, leads to the eigenvalue problem encountered in principal component
analysis.


9.5. Sample Principal Components 


When $X\sim N_p(O,\Sigma)$, $\Sigma > O$, the maximum likelihood estimator of $\Sigma$ is $\hat{\Sigma} = \frac{1}{n}S$
where $n > p$ is the sample size and $S$ is the sample sum of products matrix, as defined
for instance in Mathai and Haubold (2017b). In general, whether $\Sigma$ is nonsingular or
singular and $X$ is $p$-variate Gaussian or not, we may take $\hat{\Sigma}$ as an estimate of $\Sigma$. The
sample eigenvalues and eigenvectors are then available from $\hat{\Sigma}B = kB$ where $B\ne O$
is a $p\times 1$ eigenvector corresponding to the eigenvalue $k$ of $\hat{\Sigma} = \frac{1}{n}S$. For a non-null
$B$, $(\hat{\Sigma}-kI)B = O \Rightarrow \hat{\Sigma}-kI$ is singular or $|\hat{\Sigma}-kI| = 0$ and $k$ is a solution of

$$|\hat{\Sigma}-kI| = 0 \Rightarrow (\hat{\Sigma}-k_jI)B_j = O,\ j = 1,\ldots,p. \tag{9.5.1}$$

We may only consider the normalized eigenvectors $B_j$ such that $B_j'B_j = 1$, $j = 1,\ldots,p$.
If the eigenvalues of $\Sigma$ are distinct, it can be shown that the eigenvalues of $\hat{\Sigma}$ are also
distinct, $k_1 > k_2 > \cdots > k_p$ almost surely. Note that even though $B_j'B_j = 1$, $B_j$ is not
unique as one also has $(-B_j)'(-B_j) = 1$. Thus, we require that the first nonzero element
of $B_j$ be positive to ensure uniqueness. Since $\hat{\Sigma}$ is a real symmetric matrix, its eigenvalues
$k_1,\ldots,k_p$ are real, and so are the corresponding normalized eigenvectors $B_1,\ldots,B_p$. As
well, there exists a full set of orthonormal eigenvectors $B_1,\ldots,B_p$, $B_j'B_j = 1$, $B_i'B_j = 0$,
$i\ne j$, $j = 1,\ldots,p$, such that for $B = (B_1,\ldots,B_p)$,

$$B'\hat{\Sigma}B = K = \mathrm{diag}(k_1,\ldots,k_p),\quad \hat{\Sigma}B = BK\quad\text{and}\quad \hat{\Sigma} = k_1B_1B_1'+\cdots+k_pB_pB_p'. \tag{9.5.2}$$
Also, observe that $\frac{k_1}{k_1+\cdots+k_p}$ is the proportion of the total variation in the data which is
accounted for by the first principal component. Similarly, $\frac{k_1+\cdots+k_r}{k_1+\cdots+k_p}$ is the proportion of the
total variation due to the first $r$ principal components, $r < p$.


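Example 9.5.1. Let $X$ be a $3\times 1$ vector of real scalar random variables, $X' = [x_1, x_2, x_3]$,
with $E[X] = \mu$ and covariance matrix $\mathrm{Cov}(X) = \Sigma > O$ where both $\mu$ and $\Sigma$ are
unknown. Let the following observation vectors be a simple random sample of size 5 from
this population:

$$X_1 = \begin{bmatrix} 1\\ 1\\ 0\end{bmatrix},\ X_2 = \begin{bmatrix} -1\\ 1\\ 0\end{bmatrix},\ X_3 = \begin{bmatrix} 1\\ -1\\ -1\end{bmatrix},\ X_4 = \begin{bmatrix} -1\\ -1\\ -1\end{bmatrix},\ X_5 = \begin{bmatrix} 0\\ 0\\ 2\end{bmatrix}.$$

Compute the principal components of $X$ from an estimate of $\Sigma$ that is based on those
observations.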


Solution 9.5.1. An estimate of $\Sigma$ is $\hat{\Sigma} = \frac{1}{n}S$ where $n$ is the sample size and $S$ is the
sample sum of products matrix. In this case, $n = 5$. We first compute $S$. To this end, we
determine the sample average vector, the sample matrix, the deviation matrix and finally
$S$. The sample average is $\bar{X} = \frac{1}{n}[X_1+X_2+X_3+X_4+X_5]$ and the sample matrix is
$\mathbf{X} = [X_1, X_2, X_3, X_4, X_5]$. The matrix of sample means is $\bar{\mathbf{X}} = [\bar{X},\bar{X},\bar{X},\bar{X},\bar{X}]$. The
deviation matrix is $\mathbf{X}_d = \mathbf{X}-\bar{\mathbf{X}}$ and the sample sum of products matrix is $S = \mathbf{X}_d\mathbf{X}_d' =
[\mathbf{X}-\bar{\mathbf{X}}][\mathbf{X}-\bar{\mathbf{X}}]'$. Based on the given random sample, these quantities are


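$$\bar{X} = \frac{1}{5}[X_1+\cdots+X_5] = \begin{bmatrix} 0\\ 0\\ 0\end{bmatrix},\quad \mathbf{X} = [X_1, X_2, X_3, X_4, X_5] = \begin{bmatrix} 1 & -1 & 1 & -1 & 0\\ 1 & 1 & -1 & -1 & 0\\ 0 & 0 & -1 & -1 & 2\end{bmatrix},$$

$$\mathbf{X}_d = [X_1-\bar{X}, X_2-\bar{X}, X_3-\bar{X}, X_4-\bar{X}, X_5-\bar{X}] = \begin{bmatrix} 1 & -1 & 1 & -1 & 0\\ 1 & 1 & -1 & -1 & 0\\ 0 & 0 & -1 & -1 & 2\end{bmatrix},$$

$$S = \begin{bmatrix} 1 & -1 & 1 & -1 & 0\\ 1 & 1 & -1 & -1 & 0\\ 0 & 0 & -1 & -1 & 2\end{bmatrix}\begin{bmatrix} 1 & -1 & 1 & -1 & 0\\ 1 & 1 & -1 & -1 & 0\\ 0 & 0 & -1 & -1 & 2\end{bmatrix}' = \begin{bmatrix} 4 & 0 & 0\\ 0 & 4 & 2\\ 0 & 2 & 6\end{bmatrix}.$$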


An estimate of $\Sigma$ is $\hat{\Sigma} = \frac{1}{n}S = \frac{1}{5}S$. However, since the eigenvalues of $\hat{\Sigma}$ are those of
$S$ multiplied by $\frac{1}{5}$ and the normalized eigenvectors of $\hat{\Sigma}$ and $S$ will then be identical, we
will work with $S$. The eigenvalues of $S$ are available from the equation $|S-\lambda I| = 0$. That
is, $(4-\lambda)[(4-\lambda)(6-\lambda)-4] = 0 \Rightarrow (4-\lambda)[\lambda^2-10\lambda+20] = 0 \Rightarrow \lambda_1 = 5+\sqrt{5}$, $\lambda_2 =
4$, $\lambda_3 = 5-\sqrt{5}$. An eigenvector corresponding to $\lambda_1 = 5+\sqrt{5}$ can be determined from
the system $(S-\lambda_1 I)X = O$ wherein the first equation gives $x_1 = 0$ and the third one yields
$2x_2+(1-\sqrt{5})x_3 = 0$. Taking $x_3 = 2$, $x_2 = -(1-\sqrt{5})$, and it is easily verified that
these values also satisfy the second equation. Thus, an eigenvector, denoted by $Y_1$, is the
following:

$$Y_1 = \begin{bmatrix} 0\\ \sqrt{5}-1\\ 2\end{bmatrix}\ \text{with its normalized form being}\ A_1 = \frac{1}{\sqrt{10-2\sqrt{5}}}\begin{bmatrix} 0\\ \sqrt{5}-1\\ 2\end{bmatrix},$$

so that the first principal component is $u_1 = \frac{1}{\sqrt{10-2\sqrt{5}}}[(\sqrt{5}-1)x_2+2x_3]$. Let us verify
that the variance of $u_1$ equals $\lambda_1$:

$$\mathrm{Var}(u_1) = \frac{1}{10-2\sqrt{5}}[(\sqrt{5}-1)^2\mathrm{Var}(x_2)+4\mathrm{Var}(x_3)+4(\sqrt{5}-1)\mathrm{Cov}(x_2,x_3)]$$
$$= \frac{1}{10-2\sqrt{5}}[(6-2\sqrt{5})(4)+4(6)+4(\sqrt{5}-1)(2)] = \frac{20}{5-\sqrt{5}} = \frac{20(5+\sqrt{5})}{20}$$
$$= 5+\sqrt{5} = \lambda_1.$$


An eigenvector corresponding to $\lambda_2 = 4$ is available from $(S-\lambda_2 I)X = O$. The first
equation is identically zero and thus imposes no restriction on the variables. The second and
third equations yield $x_3 = 0$ and $x_2 = 0$. Taking $x_1 = 1$, an eigenvector corresponding to
$\lambda_2 = 4$, denoted by $Y_2$, is

$$Y_2 = \begin{bmatrix} 1\\ 0\\ 0\end{bmatrix}\ \text{which is already normalized so that}\ A_2 = \begin{bmatrix} 1\\ 0\\ 0\end{bmatrix}.$$

Thus, the second principal component is $u_2 = x_1$ and its variance is $\mathrm{Var}(x_1) = 4$. An
eigenvector corresponding to $\lambda_3 = 5-\sqrt{5}$ can be determined from the linear system
$(S-\lambda_3 I)X = O$ whose first equation yields $x_1 = 0$, the third one giving $2x_2+(1+
\sqrt{5})x_3 = 0$. Taking $x_3 = 2$, $x_2 = -(1+\sqrt{5})$, and it is readily verified that these values
satisfy the second equation as well. Then, an eigenvector, denoted by $Y_3$, is

$$Y_3 = \begin{bmatrix} 0\\ -(\sqrt{5}+1)\\ 2\end{bmatrix},\ \text{its normalized form being}\ A_3 = \frac{1}{\sqrt{10+2\sqrt{5}}}\begin{bmatrix} 0\\ -(\sqrt{5}+1)\\ 2\end{bmatrix}.$$

In order to select the matrix $[A_1, A_2, A_3]$ uniquely, we may multiply $A_3$ by $-1$ so
that the first nonzero element is positive. Thus, the third principal component is $u_3 =
\frac{1}{\sqrt{10+2\sqrt{5}}}[(1+\sqrt{5})x_2-2x_3]$ and, as the following calculations corroborate, its variance
is indeed equal to $\lambda_3$:

$$\mathrm{Var}(u_3) = \frac{1}{10+2\sqrt{5}}[(1+\sqrt{5})^2\mathrm{Var}(x_2)+4\mathrm{Var}(x_3)-4(1+\sqrt{5})\mathrm{Cov}(x_2,x_3)]$$
$$= \frac{1}{10+2\sqrt{5}}[4(6+2\sqrt{5})+4(6)-4(1+\sqrt{5})(2)] = \frac{20}{5+\sqrt{5}} = 5-\sqrt{5} = \lambda_3.$$


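Additionally,

$$\frac{\lambda_1}{\lambda_1+\lambda_2+\lambda_3} = \frac{5+\sqrt{5}}{14} \approx 0.52,$$

that is, approximately 52% of the total variance is due to $u_1$ or nearly 52% of the total
variation in the data is accounted for by the first principal component. Also observe that

$$S[A_1, A_2, A_3] = [A_1, A_2, A_3]D,\ D = \mathrm{diag}(5+\sqrt{5},\ 4,\ 5-\sqrt{5})\ \text{ or }\ S = [A_1, A_2, A_3]\,D\begin{bmatrix} A_1'\\ A_2'\\ A_3'\end{bmatrix}.$$

That is,

$$S = \begin{bmatrix} 4 & 0 & 0\\ 0 & 4 & 2\\ 0 & 2 & 6\end{bmatrix} = \begin{bmatrix} 0 & 1 & 0\\ \frac{\sqrt{5}-1}{\sqrt{10-2\sqrt{5}}} & 0 & \frac{1+\sqrt{5}}{\sqrt{10+2\sqrt{5}}}\\ \frac{2}{\sqrt{10-2\sqrt{5}}} & 0 & \frac{-2}{\sqrt{10+2\sqrt{5}}}\end{bmatrix}\begin{bmatrix} 5+\sqrt{5} & 0 & 0\\ 0 & 4 & 0\\ 0 & 0 & 5-\sqrt{5}\end{bmatrix}\begin{bmatrix} 0 & \frac{\sqrt{5}-1}{\sqrt{10-2\sqrt{5}}} & \frac{2}{\sqrt{10-2\sqrt{5}}}\\ 1 & 0 & 0\\ 0 & \frac{1+\sqrt{5}}{\sqrt{10+2\sqrt{5}}} & \frac{-2}{\sqrt{10+2\sqrt{5}}}\end{bmatrix},$$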

which completes the computations. 
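
As a cross-check of Example 9.5.1, the sketch below recomputes the sum of products matrix from the five observation vectors and evaluates the cumulative proportions of variation. It is a plain Python illustration under the assumption that the observations are as read from the example; the function names (`sum_of_products`, `det3`, `explained_proportions`) are hypothetical helpers, not notation from the text.

```python
import math

# Observation vectors X1, ..., X5 as read from Example 9.5.1.
SAMPLE = [[1, 1, 0], [-1, 1, 0], [1, -1, -1], [-1, -1, -1], [0, 0, 2]]


def sum_of_products(sample):
    """S = (X - Xbar)(X - Xbar)' for a list of observation vectors."""
    n, p = len(sample), len(sample[0])
    mean = [sum(x[i] for x in sample) / n for i in range(p)]
    dev = [[x[i] - mean[i] for i in range(p)] for x in sample]
    return [[sum(d[i] * d[j] for d in dev) for j in range(p)] for i in range(p)]


def det3(m):
    """Determinant of a 3 x 3 matrix by cofactor expansion along the first row."""
    return (m[0][0] * (m[1][1] * m[2][2] - m[1][2] * m[2][1])
            - m[0][1] * (m[1][0] * m[2][2] - m[1][2] * m[2][0])
            + m[0][2] * (m[1][0] * m[2][1] - m[1][1] * m[2][0]))


def explained_proportions(eigenvalues):
    """Cumulative share of the total variation for the first r components."""
    total = sum(eigenvalues)
    running, shares = 0.0, []
    for lam in sorted(eigenvalues, reverse=True):
        running += lam
        shares.append(running / total)
    return shares


S = sum_of_products(SAMPLE)
EIGENVALUES = [5 + math.sqrt(5), 4.0, 5 - math.sqrt(5)]
PROPORTIONS = explained_proportions(EIGENVALUES)
print(S)
print([round(share, 4) for share in PROPORTIONS])
```

Since the eigenvalues of $\hat{\Sigma}$ are those of $S$ divided by $n$, the proportions are the same whether they are computed from $S$ or from $\hat{\Sigma}$.
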
9.5.1. Estimation and evaluation of the principal components 


If $X\sim N_p(O,\Sigma)$, $\Sigma > O$, then the maximum likelihood estimator of $\Sigma$, denoted
by $\hat{\Sigma}$, is $\hat{\Sigma} = \frac{1}{n}S$ where $n$ is the sample size and $S$ is the sample sum of products matrix.
How can one evaluate the eigenvalues and eigenvectors in order to construct the principal
components? One method consists of solving the polynomial equation $|\Sigma-\lambda I| = 0$ for the
population eigenvalues $\lambda_1,\ldots,\lambda_p$, or $|\hat{\Sigma}-kI| = 0$ for obtaining the sample eigenvalues
$k_1\ge k_2\ge\cdots\ge k_p$. Direct evaluation of $k$ by solving the determinantal equation is not
difficult when $p$ is small. However, for large values of $p$, one has to resort to mathematical
software or some iterative process. We will illustrate such an iterative process for the
population values $\lambda_1 > \lambda_2 > \cdots > \lambda_p$ and the corresponding normalized eigenvectors
$A_1,\ldots,A_p$ (using our notations). Let us assume that the eigenvalues are distinct. Consider
the following equation for determining the eigenvalues and eigenvectors:

$$\Sigma A_j = \lambda_jA_j,\ j = 1,\ldots,p. \tag{9.5.3}$$


Take any initial $p$-vector $W_0$ that is not orthogonal to $A_1$, the eigenvector corresponding
to the largest eigenvalue $\lambda_1$. If $W_0$ is orthogonal to $A_1$, then, in the iterative process, $W_j$
will not reach $A_1$. Let $Y_0 = \frac{1}{\sqrt{W_0'W_0}}W_0$ be the normalized $W_0$. Consider the equation

$$\Sigma Y_{j-1} = W_j,\ \text{that is,}\ \Sigma Y_0 = W_1,\ \Sigma Y_1 = W_2,\ \ldots,\quad Y_j = \frac{1}{\sqrt{W_j'W_j}}W_j,\ j = 1, 2, \ldots. \tag{9.5.4}$$

Halt the iteration when $W_j$ approximately agrees with $W_{j-1}$, that is, $W_j$ converges to some
$W$ which will then be $\lambda_1A_1$ or $Y_j$ converges to $A_1$. At each stage, compute the quadratic
form $\delta_j = Y_j'\Sigma Y_j$ and make sure that $\delta_j$ is increasing. Suppose that $W_j$ converges to some
$W$ for certain values of $j$ or as $j\to\infty$. At that stage, the equation is $\Sigma Y = W$ where
$Y = \frac{1}{\sqrt{W'W}}W$, the normalized $W$. Then, the equation is $\Sigma Y = \sqrt{W'W}\,Y$. In other words,
$\sqrt{W'W} = \lambda_1$ and $Y = A_1$, which are the largest eigenvalue of $\Sigma$ and the corresponding
eigenvector $A_1$. That is,

$$\lim_{j\to\infty}\sqrt{W_j'W_j} = \lambda_1,\qquad \lim_{j\to\infty}\Big[\frac{1}{\sqrt{W_j'W_j}}W_j\Big] = \lim_{j\to\infty}Y_j = A_1.$$


The rate of convergence in (9.5.4) depends upon the ratio $\frac{\lambda_2}{\lambda_1}$. If $\lambda_2$ is close to $\lambda_1$, then the
convergence will be very slow. Hence, the larger the difference between $\lambda_1$ and $\lambda_2$, the
faster the convergence. It is thus indicated to raise $\Sigma$ to a suitable power $m$ and initiate the
iteration on this $\Sigma^m$ so that the difference between the resulting $\lambda_1$ and $\lambda_2$ be magnified
accordingly, $\lambda_j^m$, $j = 1,\ldots,p$, being the eigenvalues of $\Sigma^m$. If an eigenvalue is equal to
one, then $\Sigma$ must first be multiplied by a constant so that all the resulting eigenvalues of $\Sigma$
are well separated, which will not affect the eigenvectors. As well, observe that $\Sigma$ and $\Sigma^m$,
$m = 1, 2, \ldots$, share the same eigenvectors even though the eigenvalues are $\lambda_j$ and $\lambda_j^m$,
$j = 1,\ldots,p$, respectively; in other words, the normalized eigenvectors remain the same.
After obtaining the largest eigenvalue $\lambda_1$ and the corresponding normalized eigenvector
$A_1$, consider

$$\Sigma_2 = \Sigma-\lambda_1A_1A_1',\quad \Sigma = \lambda_1A_1A_1'+\lambda_2A_2A_2'+\cdots+\lambda_pA_pA_p',$$

where $A_j$ is the column eigenvector corresponding to $\lambda_j$ so that $A_jA_j'$ is a $p\times p$ matrix
for $j = 1, 2, \ldots, p$. Now, carry out the iteration on $\Sigma_2$, as was previously done on $\Sigma$.
This will produce $\lambda_2$ and the corresponding normalized eigenvector $A_2$. Note that $\lambda_2$ is the
largest eigenvalue of $\Sigma_2$. Next, consider

$$\Sigma_3 = \Sigma_2-\lambda_2A_2A_2'$$

and continue this process until all the required eigenvectors are obtained. Similarly, for
small $p$, the sample eigenvalues $k_1 > k_2 > \cdots > k_p$ are available by solving the equation
$|\hat{\Sigma}-kI| = 0$; otherwise, the sample eigenvalues and eigenvectors can be computed by
applying the iterative process described above with $\Sigma$ replaced by $\hat{\Sigma}$, $\lambda_j$ interchanged
with $k_j$ and $B_j$ substituted for the eigenvector $A_j$, $j = 1,\ldots,p$.
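
The iterative scheme and the successive deflation steps described above can be sketched as follows. This is a minimal plain Python illustration rather than production code: the function names, the stopping tolerance `tol` and the starting vector are assumptions made for the example, and the matrix is the covariance matrix of Example 9.4.1, whose eigenvalues 4, 2 and 1 are distinct.

```python
import math


def mat_vec(matrix, vector):
    """Matrix-vector product for nested lists."""
    return [sum(a * b for a, b in zip(row, vector)) for row in matrix]


def normalize(vector):
    """Return (vector / norm, norm) using the Euclidean norm."""
    norm = math.sqrt(sum(x * x for x in vector))
    return [x / norm for x in vector], norm


def largest_eigenpair(sigma, w0, tol=1e-12, max_iter=10_000):
    """Iterate W_j = Sigma Y_{j-1}, Y_j = W_j / sqrt(W_j'W_j) until Y_j settles."""
    y, _ = normalize(w0)
    lam = 0.0
    for _ in range(max_iter):
        w = mat_vec(sigma, y)
        y_next, lam = normalize(w)  # sqrt(W'W) estimates the eigenvalue
        if max(abs(a - b) for a, b in zip(y_next, y)) < tol:
            return lam, y_next
        y = y_next
    return lam, y


def deflate(sigma, lam, a):
    """Return Sigma - lam * A A' for a normalized eigenvector A."""
    p = len(a)
    return [[sigma[i][j] - lam * a[i] * a[j] for j in range(p)] for i in range(p)]


def fix_sign(a, eps=1e-9):
    """Make the first nonzero element positive, as required for uniqueness."""
    first = next(x for x in a if abs(x) > eps)
    return a if first > 0 else [-x for x in a]


def eigenpairs_by_deflation(sigma, w0):
    """All eigenpairs of a symmetric positive definite matrix, largest first."""
    pairs, current = [], [row[:] for row in sigma]
    for _ in range(len(sigma)):
        lam, a = largest_eigenpair(current, w0)
        pairs.append((lam, fix_sign(a)))
        current = deflate(current, lam, a)
    return pairs


SIGMA = [[2.0, 0.0, 1.0], [0.0, 2.0, -1.0], [1.0, -1.0, 3.0]]  # Example 9.4.1
PAIRS = eigenpairs_by_deflation(SIGMA, [1.0, 2.0, 3.0])
for lam, a in PAIRS:
    print(round(lam, 6), [round(x, 6) for x in a])
```

The starting vector $(1, 2, 3)'$ is chosen so that it is not orthogonal to any of the three eigenvectors; with a starting vector orthogonal to the sought eigenvector, the iteration would fail as noted in the text. The sketch does not raise the matrix to a power, so convergence is governed by the ratio of successive eigenvalues.
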
Example 9.5.2. Principal component analysis is especially called for when the number
of variables involved is large. For the sake of illustration, we will take $p = 2$. Consider the
following $2\times 2$ covariance matrix

$$\Sigma = \begin{bmatrix} 2 & -1\\ -1 & 1\end{bmatrix} = \mathrm{Cov}(X),\quad X = \begin{bmatrix} x_1\\ x_2\end{bmatrix}.$$

Construct the principal components associated with this matrix while working with the
symbolic representations of the eigenvalues $\lambda_1$ and $\lambda_2$ rather than their actual values.


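Solution 9.5.2. Let us first evaluate the eigenvalues of $\Sigma$ in the customary fashion. Consider

$$|\Sigma-\lambda I| = 0 \Rightarrow (2-\lambda)(1-\lambda)-1 = 0 \Rightarrow \lambda^2-3\lambda+1 = 0 \tag{i}$$
$$\Rightarrow \lambda_1 = \frac{1}{2}[3+\sqrt{5}],\quad \lambda_2 = \frac{1}{2}[3-\sqrt{5}].$$

Let us compute an eigenvector corresponding to the largest eigenvalue $\lambda_1$. Then,

$$\begin{bmatrix} 2-\lambda_1 & -1\\ -1 & 1-\lambda_1\end{bmatrix}\begin{bmatrix} x_1\\ x_2\end{bmatrix} = \begin{bmatrix} 0\\ 0\end{bmatrix} \Rightarrow$$
$$(2-\lambda_1)x_1-x_2 = 0$$
$$-x_1+(1-\lambda_1)x_2 = 0. \tag{ii}$$

Since the system is singular, we need only solve one of these equations. Letting $x_2 = 1$ in
(ii), $x_1 = (1-\lambda_1)$. For illustrative purposes, we will complete the remaining steps with the
general parameters $\lambda_1$ and $\lambda_2$ rather than their numerical values. Hence, one eigenvector is

$$C = \begin{bmatrix} 1-\lambda_1\\ 1\end{bmatrix},\quad \|C\|^2 = 1+(1-\lambda_1)^2 \Rightarrow A_1 = \frac{1}{\|C\|}C = \frac{1}{\sqrt{1+(1-\lambda_1)^2}}\begin{bmatrix} 1-\lambda_1\\ 1\end{bmatrix}.$$

Thus, the principal components are the following:

$$u_1 = \frac{1}{\|C\|}C'X = \frac{1}{\sqrt{1+(1-\lambda_1)^2}}\{(1-\lambda_1)x_1+x_2\},$$
$$u_2 = \frac{1}{\sqrt{1+(1-\lambda_2)^2}}\{(1-\lambda_2)x_1+x_2\}.$$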


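Let us verify that $\mathrm{Var}(u_1) = \lambda_1$ and $\mathrm{Var}(u_2) = \lambda_2$. Note that

$$\|C\|^2 = 1+(1-\lambda_1)^2 = \lambda_1^2-2\lambda_1+2 = (\lambda_1^2-3\lambda_1+1)+\lambda_1+1 = \lambda_1+1, \tag{iii}$$

given the characteristic equation (i). However,

$$\mathrm{Var}(C'X) = \mathrm{Var}((1-\lambda_1)x_1+x_2)$$
$$= (1-\lambda_1)^2\mathrm{Var}(x_1)+\mathrm{Var}(x_2)+2(1-\lambda_1)\mathrm{Cov}(x_1,x_2)$$
$$= 2(1-\lambda_1)^2+1-2(1-\lambda_1)$$
$$= 2\lambda_1^2-2\lambda_1+1 = 2(\lambda_1^2-3\lambda_1+1)+4\lambda_1-1 = 4\lambda_1-1, \tag{iv}$$

given (i), the characteristic (or starting) equation. We now have $\mathrm{Var}(C'X) = \mathrm{Var}((1-
\lambda_1)x_1+x_2) = 4\lambda_1-1$ from (iv) and $\|C\|^2 = \lambda_1+1$ from (iii). Hence, we must have
$\mathrm{Var}(C'X) = \|C\|^2\lambda_1 = [1+(1-\lambda_1)^2]\lambda_1 = (\lambda_1+1)\lambda_1$ from (iii). Since

$$\lambda_1(\lambda_1+1) = \lambda_1^2+\lambda_1 = (\lambda_1^2-3\lambda_1+1)+4\lambda_1-1 = 4\lambda_1-1$$

agrees with (iv), the result is verified for $u_1$. Moreover, on replacing $\lambda_1$ by $\lambda_2$, we have
$\mathrm{Var}(u_2) = \lambda_2$.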
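
The symbolic identities of Example 9.5.2 can also be confirmed with the numerical values of the eigenvalues. The short Python sketch below is only a check of the algebra; the function names are illustrative and do not come from the text.

```python
import math

SIGMA = [[2.0, -1.0], [-1.0, 1.0]]  # covariance matrix of Example 9.5.2
LAM1 = (3 + math.sqrt(5)) / 2
LAM2 = (3 - math.sqrt(5)) / 2


def squared_norm(lam):
    """||C||^2 for C = (1 - lam, 1)'."""
    return (1.0 - lam) ** 2 + 1.0


def quadratic_form(lam, sigma=SIGMA):
    """Var(C'X) = C' Sigma C for C = (1 - lam, 1)'."""
    c = [1.0 - lam, 1.0]
    return sum(c[i] * sigma[i][j] * c[j] for i in range(2) for j in range(2))


def component_variance(lam):
    """Variance of the normalized component u = C'X / ||C||."""
    return quadratic_form(lam) / squared_norm(lam)


for lam in (LAM1, LAM2):
    print(round(lam, 6), round(component_variance(lam), 6))
```

Each printed pair should show the eigenvalue next to the variance of the corresponding principal component.
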
9.5.2. L1 and L2 norm principal components 


For any $p\times 1$ vector $Y$, $Y' = (y_1,\ldots,y_p)$, where $y_1,\ldots,y_p$ are real quantities, the
L2 and L1 norms are respectively defined as follows:

$$\|Y\|_2 = (y_1^2+\cdots+y_p^2)^{\frac{1}{2}} = [Y'Y]^{\frac{1}{2}} \Rightarrow \|Y\|_2^2 = Y'Y \tag{9.5.5}$$

$$\|Y\|_1 = |y_1|+|y_2|+\cdots+|y_p| = \sum_{j=1}^p|y_j|. \tag{9.5.6}$$

In Sect. 9.5, we set up the principal component analysis on the basis of the sample sum of
products matrix, assuming a $p$-variate real population, not necessarily Gaussian, having
$\mu$ as its mean value vector and $\Sigma > O$ as its covariance matrix. Then, we considered
$X_j$, $j = 1,\ldots,n$, iid vector random variables, that is, a simple random sample of size
$n$ from this population such that $E[X_j] = \mu$ and $\mathrm{Cov}(X_j) = \Sigma > O$, $j = 1,\ldots,n$.
We denoted the sample matrix by $\mathbf{X} = [X_1,\ldots,X_n]$, the sample average by $\bar{X} = \frac{1}{n}[X_1+
\cdots+X_n]$, the matrix of sample means by $\bar{\mathbf{X}} = [\bar{X},\ldots,\bar{X}]$ and the sample sum of products
matrix by $S = [\mathbf{X}-\bar{\mathbf{X}}][\mathbf{X}-\bar{\mathbf{X}}]'$. If the population mean value vector is the null vector,
that is, $\mu = O$, then we can take $S = \mathbf{X}\mathbf{X}'$. For convenience, we will consider this case.


For determining the sample principal components, we then maximized

$$A'\mathbf{X}\mathbf{X}'A\ \text{ subject to }\ A'A = 1$$

where $A$ is an arbitrary constant vector that will result in the coefficient vector of the
principal components. This can also be stated in terms of maximizing the square of an L2
norm subject to $A'A = 1$, that is,

$$\max_{A'A=1}\|A'\mathbf{X}\|_2^2. \tag{9.5.7}$$

When carrying out statistical analyses, it turns out that the L2 norm is more sensitive to
outliers than the L1 norm. If we utilize the L1 norm, the problem corresponding to (9.5.7)
is

$$\max_{A'A=1}\|A'\mathbf{X}\|_1. \tag{9.5.8}$$

Observe that when $\mu = O$, the sum of products matrix can be expressed as

$$S = \mathbf{X}\mathbf{X}' = \sum_{j=1}^nX_jX_j'$$

where $X_j$ is the $j$-th column of $\mathbf{X}$ or $j$-th sample vector. Then, the initial optimization
problem can be restated as follows:

$$\max_{A'A=1}\|A'\mathbf{X}\|_2^2 = \max_{A'A=1}\sum_{j=1}^nA'X_jX_j'A = \max_{A'A=1}\sum_{j=1}^n\|A'X_j\|_2^2, \tag{9.5.9}$$

and the corresponding L1 norm optimization problem can be formulated as follows:

$$\max_{A'A=1}\|A'\mathbf{X}\|_1 = \max_{A'A=1}\sum_{j=1}^n\|A'X_j\|_1. \tag{9.5.10}$$

We have obtained exact analytical solutions for the coefficient vector $A$ in (9.5.9); however,
this is not possible when attempting to optimize (9.5.10) subject to $A'A = 1$. Thankfully,
iterative procedures are available in this case.
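
One such iterative procedure is a fixed-point scheme in the spirit of the L1 norm algorithm discussed in Kwak (2008): at each step the signs of the projections $A'X_j$ are recorded and $A$ is replaced by the normalized signed sum of the sample vectors. The Python sketch below is a simplified illustration under stated assumptions, not a reproduction of any published implementation: the data set, the starting vector and the function names are made up for the example, projections that are exactly zero are simply given a positive sign, and only a local maximum of (9.5.10) is guaranteed.

```python
import math


def project(a, x):
    """Scalar projection A'X_j."""
    return sum(ai * xi for ai, xi in zip(a, x))


def l1_objective(a, sample):
    """Sum over j of |A'X_j|, the criterion in (9.5.10)."""
    return sum(abs(project(a, x)) for x in sample)


def l2_objective(a, sample):
    """Sum over j of (A'X_j)^2, the criterion in (9.5.9)."""
    return sum(project(a, x) ** 2 for x in sample)


def l1_direction(sample, a0, max_iter=1000, tol=1e-12):
    """Fixed-point ascent for max sum_j |A'X_j| subject to A'A = 1."""
    norm = math.sqrt(sum(v * v for v in a0))
    a = [v / norm for v in a0]
    for _ in range(max_iter):
        # Sign of each projection; zero projections are treated as positive.
        signs = [1.0 if project(a, x) >= 0 else -1.0 for x in sample]
        w = [sum(s * x[i] for s, x in zip(signs, sample)) for i in range(len(a))]
        norm = math.sqrt(sum(v * v for v in w))
        if norm == 0.0:
            break  # degenerate signed sum; keep the current direction
        a_next = [v / norm for v in w]
        if max(abs(u - v) for u, v in zip(a_next, a)) < tol:
            return a_next
        a = a_next
    return a


# Hypothetical two-dimensional sample, one inner list per observation vector.
SAMPLE = [[2.0, 1.0], [-2.0, -1.0], [1.0, 0.5], [-1.0, -0.5], [0.0, 3.0]]
A0 = [1.0, 0.0]
A_L1 = l1_direction(SAMPLE, A0)
print(A_L1, l1_objective(A_L1, SAMPLE), l2_objective(A_L1, SAMPLE))
```

Each update cannot decrease the L1 criterion: with $s_j$ denoting the recorded signs and $W = \sum_j s_jX_j$, the new unit vector $W/\|W\|$ gives a criterion value of at least $\|W\|$, which in turn is at least $A'W$, the value at the current $A$.
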


Let us consider a generalization of the basic principal component analysis. Let $W$ be a
$p\times m$, $m < p$, matrix of full rank $m$. Then, the general problem in L2 norm optimization
is the following:

$$\max_{W'W=I}\mathrm{tr}(W'SW) = \max_{W'W=I}\sum_{j=1}^n\|W'X_j\|_2^2 \tag{9.5.11}$$

where $I$ is the $m\times m$ identity matrix, the corresponding L1 norm optimization problem
being

$$\max_{W'W=I}\sum_{j=1}^n\|W'X_j\|_1. \tag{9.5.12}$$

In Sect. 9.4.1, we have considered a dual problem of minimization for constructing
principal components. The dual problems of minimization corresponding to (9.5.11) and
(9.5.12) can be stated as follows:

$$\min_{W'W=I}\sum_{j=1}^n\|X_j-WW'X_j\|_2^2 \tag{9.5.13}$$

with respect to the L2 norm, and

$$\min_{W'W=I}\sum_{j=1}^n\|X_j-WW'X_j\|_1 \tag{9.5.14}$$

with respect to the L1 norm. The form appearing in (9.5.13) suggests that the construction
of principal components by making use of the L2 norm is also connected to general model
building, Factor Analysis and related topics. The general mathematical problem pertaining
to the basic structure in (9.5.13) is referred to as low-rank matrix factorization. Readers
interested in such statistical or mathematical problems may refer to the survey article, Shi
et al. (2017). L1 norm optimization problems are for instance discussed in Kwak (2008)
and Nie and Huang (2016).


9.6. Distributional Aspects of Eigenvalues and Eigenvectors 


Let us examine the distributions of the variances of the sample principal components
and the coefficient vectors in the sample principal components. Let $A$ be a $p\times p$
orthonormal matrix whose columns are denoted by $A_1,\ldots,A_p$, so that $A_j'A_j = 1$, $A_i'A_j =
0$, $i\ne j$, or $AA' = I$, $A'A = I$. Let $u_j = A_j'X$ be the sample principal components
for $j = 1,\ldots,p$. Let the $p\times 1$ vector $X$ have a nonsingular normal density with the null
vector as its mean value vector, that is, $X\sim N_p(O,\Sigma)$, $\Sigma > O$. We can assume that
the Gaussian distribution has a null mean value vector without any loss of generality since
we are dealing with variances and covariances, which are free of any location parameter.
Consider a simple random sample of size $n$ from this normal population. Letting $S$ denote
the sample sum of products matrix, the maximum likelihood estimate of $\Sigma$ is $\hat{\Sigma} = \frac{1}{n}S$.
Given that $S$ has a Wishart distribution having $n-1$ degrees of freedom, the density of $\hat{\Sigma}$,
denoted by $f(\hat{\Sigma})$, is given by

$$f(\hat{\Sigma}) = \frac{n^{\frac{mp}{2}}}{2^{\frac{mp}{2}}|\Sigma|^{\frac{m}{2}}\Gamma_p(\frac{m}{2})}\,|\hat{\Sigma}|^{\frac{m}{2}-\frac{p+1}{2}}\,e^{-\frac{n}{2}\mathrm{tr}(\Sigma^{-1}\hat{\Sigma})}$$

where $\Sigma$ is the population covariance matrix and $m = n-1$, $n$ being the sample size.
Let $k_1,\ldots,k_p$ be the eigenvalues of $\hat{\Sigma}$; it was shown that $k_1 > k_2 > \cdots > k_p > 0$ are
actually the variances of the sample principal components. Letting $B_j$, $j = 1,\ldots,p$,
denote the coefficient vectors of the sample principal components and $B = (B_1,\ldots,B_p)$,
it was established that $B'\hat{\Sigma}B = \mathrm{diag}(k_1,\ldots,k_p) = D$. The first nonzero component of
$B_j$ is required to be positive for $j = 1,\ldots,p$, so that $B$ be uniquely determined. Then,
the joint density of $B$ and $D$ is available by transforming $\hat{\Sigma}$ in terms of $B$ and $D$. Let the
joint density of $B$ and $D$ and the marginal densities of $D$ and $B$ be respectively denoted
by $g(D,B)$, $g_1(D)$ and $g_2(B)$. In light of the procedures and results presented in Chap. 8
or in Mathai (1997), we have the following joint density:

$$g(D,B)\,dD\wedge dB = \frac{n^{\frac{mp}{2}}}{2^{\frac{mp}{2}}|\Sigma|^{\frac{m}{2}}\Gamma_p(\frac{m}{2})}\Big\{\prod_{j=1}^pk_j\Big\}^{\frac{m}{2}-\frac{p+1}{2}}\Big\{\prod_{i<j}(k_i-k_j)\Big\}\,e^{-\frac{n}{2}\mathrm{tr}((B'\Sigma^{-1}B)D)}\,h(B)\,dD \tag{9.6.1}$$

where $h(B)$ is the differential element corresponding to $B$, which is given in Chap. 8 and
in Mathai (1997). The marginal densities of $D$ and $B$ are not explicitly available in a
convenient form for a general $\Sigma$. In that case, they can only be expressed in terms of
hypergeometric functions of matrix argument and zonal polynomials. See, for example,
Mathai et al. (1995) for a discussion of zonal polynomials. However, when $\Sigma = I$, the
joint and marginal densities are available in closed forms. They are

$$g(D,B)\,dD\wedge dB = \frac{n^{\frac{mp}{2}}}{2^{\frac{mp}{2}}\Gamma_p(\frac{m}{2})}\Big[\prod_{j=1}^pk_j\Big]^{\frac{m}{2}-\frac{p+1}{2}}\Big[\prod_{i<j}(k_i-k_j)\Big]e^{-\frac{n}{2}\sum_{j=1}^pk_j}\,h(B)\,dD, \tag{9.6.2}$$

$$g_1(D) = \frac{n^{\frac{mp}{2}}\pi^{\frac{p^2}{2}}}{2^{\frac{mp}{2}}\Gamma_p(\frac{m}{2})\Gamma_p(\frac{p}{2})}\Big[\prod_{j=1}^pk_j\Big]^{\frac{m}{2}-\frac{p+1}{2}}e^{-\frac{n}{2}\sum_{j=1}^pk_j}\Big[\prod_{i<j}(k_i-k_j)\Big], \tag{9.6.3}$$

$$g_2(B)\,dB = \frac{\Gamma_p(\frac{p}{2})}{\pi^{\frac{p^2}{2}}}\,h(B),\quad BB' = I = B'B,\quad k_1 > k_2 > \cdots > k_p > 0. \tag{9.6.4}$$

Given the representation of the joint density $g(D,B)$ appearing in (9.6.2), it is readily seen
that $D$ and $B$ are independently distributed. Similar results are obtainable for $\Sigma = \sigma^2I$
where $\sigma^2 > 0$ is a real scalar quantity and $I$ is the identity matrix.


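Example 9.6.1. Show that the function $g_1(D)$ given in (9.6.3) is a density for $p =
2$, $n = 6$, $m = 5$, and derive the density of $k_1$ for this special case.

Solution 9.6.1. We have $\frac{p+1}{2} = \frac{3}{2}$, $\frac{m}{2}-\frac{p+1}{2} = 1$. As $g_1(D)$ is manifestly nonnegative, it
suffices to show that the total integral equals 1. When $p = 2$ and $n = 6$, the constant part
of (9.6.3) is the following:

$$\frac{n^{\frac{mp}{2}}\pi^{\frac{p^2}{2}}}{2^{\frac{mp}{2}}\Gamma_p(\frac{m}{2})\Gamma_p(\frac{p}{2})} = \frac{6^5\pi^2}{2^5\,\Gamma_2(\frac{5}{2})\Gamma_2(1)} = \frac{3^5\pi^2}{[\sqrt{\pi}\,\Gamma(\frac{5}{2})\Gamma(2)][\sqrt{\pi}\,\Gamma(1)\Gamma(\frac{1}{2})]} = \frac{3^5\pi^2}{(\frac{3}{4}\pi)(\pi)} = 4(3^4). \tag{i}$$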

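As for the integral part, we have

$$\int_{k_1=0}^{\infty}\int_{k_2=0}^{k_1}k_1k_2(k_1-k_2)\,e^{-3(k_1+k_2)}\,dk_1\wedge dk_2$$
$$= \int_{k_1=0}^{\infty}k_1^2e^{-3k_1}\Big[\int_{k_2=0}^{k_1}k_2e^{-3k_2}dk_2\Big]dk_1-\int_{k_1=0}^{\infty}k_1e^{-3k_1}\Big[\int_{k_2=0}^{k_1}k_2^2e^{-3k_2}dk_2\Big]dk_1$$
$$= \int_{0}^{\infty}k_1^2e^{-3k_1}\Big[\frac{1}{3^2}-\Big(\frac{k_1}{3}+\frac{1}{3^2}\Big)e^{-3k_1}\Big]dk_1-\int_{0}^{\infty}k_1e^{-3k_1}\Big[\frac{2}{3^3}-\Big(\frac{k_1^2}{3}+\frac{2k_1}{3^2}+\frac{2}{3^3}\Big)e^{-3k_1}\Big]dk_1$$
$$= \int_{0}^{\infty}\Big[\Big(\frac{k_1^2}{3^2}-\frac{2k_1}{3^3}\Big)e^{-3k_1}+\Big(\frac{k_1^2}{3^2}+\frac{2k_1}{3^3}\Big)e^{-6k_1}\Big]dk_1 \tag{ii}$$
$$= \frac{1}{3^2}\Gamma(3)3^{-3}-\frac{2}{3^3}\Gamma(2)3^{-2}+\frac{1}{3^2}\Gamma(3)6^{-3}+\frac{2}{3^3}\Gamma(2)6^{-2}$$
$$= \frac{2}{3^5}-\frac{2}{3^5}+\frac{1}{4(3^5)}+\frac{2}{4(3^5)} = \frac{1}{4(3^4)}. \tag{iii}$$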

The product of (i) and (iii) being equal to 1, this establishes that $g_1(D)$ is indeed a density
function for $p = 2$ and $n = 6$. On integrating out $k_2$, the resulting integrand appearing in
(ii) yields the marginal density of $k_1$. Denoting the marginal density of $k_1$ by $g_{11}(k_1)$, after
some simplifications, we have

$$g_{11}(k_1) = 4(3^4)\Big[\Big(\frac{k_1^2}{3^2}-\frac{2k_1}{3^3}\Big)e^{-3k_1}+\Big(\frac{k_1^2}{3^2}+\frac{2k_1}{3^3}\Big)e^{-6k_1}\Big],\ 0 < k_1 < \infty,$$

and zero elsewhere. Similarly, given $g_1(D)$, the marginal density of $k_2$ is obtained by
integrating out $k_1$ from $k_2$ to $\infty$. This completes the computations.
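
The normalization of the marginal density of $k_1$ can be checked numerically. The sketch below assumes that, for $p = 2$ and $n = 6$, the marginal density is $4(3^4)$ times the integrand obtained after integrating out $k_2$, namely $(k_1^2/3^2-2k_1/3^3)e^{-3k_1}+(k_1^2/3^2+2k_1/3^3)e^{-6k_1}$; the composite Simpson rule, the truncation point 30 and the function names are choices made for this illustration only.

```python
import math


def g11(k1):
    """Marginal density of the largest eigenvalue k1 for p = 2, n = 6."""
    const = 4 * 3 ** 4
    return const * ((k1 ** 2 / 9 - 2 * k1 / 27) * math.exp(-3 * k1)
                    + (k1 ** 2 / 9 + 2 * k1 / 27) * math.exp(-6 * k1))


def simpson(func, lower, upper, intervals=20_000):
    """Composite Simpson rule with an even number of intervals."""
    step = (upper - lower) / intervals
    total = func(lower) + func(upper)
    for i in range(1, intervals):
        total += (4 if i % 2 else 2) * func(lower + i * step)
    return total * step / 3


# The integrand is negligible beyond k1 = 30, so the tail is truncated there.
TOTAL_MASS = simpson(g11, 0.0, 30.0)
print(TOTAL_MASS)
```

A value of the total mass close to one supports both the constant $4(3^4)$ and the evaluation of the double integral.
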


9.6.1. The distributions of the largest and smallest eigenvalues 


One can determine the distribution of the $j$-th largest eigenvalue $k_j$ and thereby, the
distribution of the largest one, $k_1$, as well as that of the smallest one, $k_p$. Actually, many
authors have been investigating the problem of deriving the distributions of the eigenvalues
of a central Wishart matrix and matrix-variate type-1 beta and type-2 beta distributions.
Some of the test statistics discussed in Chap. 6 can also be treated as eigenvalue problems
involving real and complex type-1 beta matrices. Let $k_1 > k_2 > \cdots > k_p > 0$ be the
eigenvalues of the real sampling distribution of a Wishart matrix generated from a
$p$-variate real Gaussian sample. Then, as given in (9.6.3), the joint density of $k_1,\ldots,k_p$,
denoted by $g_1(D)$, is the following:

$$g_1(D)\,dD = \frac{n^{\frac{mp}{2}}\pi^{\frac{p^2}{2}}}{2^{\frac{mp}{2}}\Gamma_p(\frac{m}{2})\Gamma_p(\frac{p}{2})}\Big[\prod_{j=1}^pk_j\Big]^{\frac{m}{2}-\frac{p+1}{2}}e^{-\frac{n}{2}\sum_{j=1}^pk_j}\Big[\prod_{i<j}(k_i-k_j)\Big]dD \tag{9.6.5}$$

where $m = n-1$ is the number of degrees of freedom, $n$ being the sample size. More
generally, we may consider a $p\times p$ real matrix-variate gamma distribution having $\alpha$ as
its shape parameter and $aI$, $a > 0$, as its scale parameter matrix, $I$ denoting the identity
matrix. Noting that the eigenvalues of $aS$ are equal to the eigenvalues of $S$ multiplied
by the scalar quantity $a$, we may take $a = 1$ without any loss of generality. Let $g(S)dS$
denote the density of the resulting gamma matrix and $g(D)$ be $g(S)$ expressed in terms
of $\lambda_1 > \lambda_2 > \cdots > \lambda_p > 0$, the eigenvalues of $S$. Then, given the Jacobian of the
transformation $S\to D$ specified in (9.6.7), the joint density of $\lambda_1,\ldots,\lambda_p$, denoted by
$g_1(D)$, is obtained as

$$g(D)\,dS = \frac{\pi^{\frac{p^2}{2}}}{\Gamma_p(\frac{p}{2})\Gamma_p(\alpha)}\Big[\prod_{j=1}^p\lambda_j\Big]^{\alpha-\frac{p+1}{2}}e^{-(\lambda_1+\cdots+\lambda_p)}\Big[\prod_{i<j}(\lambda_i-\lambda_j)\Big]dD = g_1(D)\,dD \tag{9.6.6}$$

where $dD = d\lambda_1\wedge\ldots\wedge d\lambda_p$. The corresponding $p\times p$ complex matrix-variate gamma
distribution will have real eigenvalues, also denoted by $\lambda_1 > \cdots > \lambda_p > 0$, the matrix
$\tilde{S}$ being Hermitian positive definite in this case. Then, in light of the relationship (9.6a.2)
between the differential elements of $\tilde{S}$ and $D$, the joint density of $\lambda_1,\ldots,\lambda_p$, denoted by
$\tilde{g}_1(D)$, is given by

$$\tilde{g}(D)\,d\tilde{S} = \frac{\pi^{p(p-1)}}{\tilde{\Gamma}_p(p)\tilde{\Gamma}_p(\alpha)}\Big[\prod_{j=1}^p\lambda_j\Big]^{\alpha-p}e^{-(\lambda_1+\cdots+\lambda_p)}\Big[\prod_{i<j}(\lambda_i-\lambda_j)^2\Big]dD = \tilde{g}_1(D)\,dD \tag{9.6a.1}$$

where $\tilde{g}(D)$ is $\tilde{g}(\tilde{S})$ expressed in terms of $D$, $\tilde{g}(\tilde{S})$ denoting the density function of a
matrix-variate gamma random variable whose shape parameter and scale parameter matrix
are $\alpha$ and $I$, respectively.


As explained for instance in Mathai (1997), when $S$ and $\tilde{S}$ are the $p\times p$ gamma
matrices in the real and complex domains, the integration over the Stiefel manifold yields

$$dS = \frac{\pi^{\frac{p^2}{2}}}{\Gamma_p(\frac{p}{2})}\Big\{\prod_{i<j}(\lambda_i-\lambda_j)\Big\}dD \tag{9.6.7}$$

and

$$d\tilde{S} = \frac{\pi^{p(p-1)}}{\tilde{\Gamma}_p(p)}\Big\{\prod_{i<j}(\lambda_i-\lambda_j)^2\Big\}dD, \tag{9.6a.2}$$

respectively. When endeavoring to derive the marginal density of $\lambda_j$ for any fixed $j$, the
difficulty arises from the factor $\prod_{i<j}(\lambda_i-\lambda_j)$. So, let us first attempt to simplify this
factor.


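9.6.2. Simplification of the factor $\prod_{i<j}(\lambda_i-\lambda_j)$

It is well known that one can write $\prod_{i<j}(\lambda_i-\lambda_j)$ as a Vandermonde determinant
which, incidentally, has been utilized in connection with nonlinear transformations in
Mathai (1997). That is,

$$\prod_{i<j}(\lambda_i-\lambda_j) = \begin{vmatrix} \lambda_1^{p-1} & \lambda_1^{p-2} & \cdots & \lambda_1 & 1\\ \lambda_2^{p-1} & \lambda_2^{p-2} & \cdots & \lambda_2 & 1\\ \vdots & \vdots & \ddots & \vdots & \vdots\\ \lambda_p^{p-1} & \lambda_p^{p-2} & \cdots & \lambda_p & 1\end{vmatrix} = |A| = |(a_{ij})| \tag{9.6.8}$$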

where the $(i,j)$-th element $a_{ij} = \lambda_i^{p-j}$ for all $i$ and $j$. We could consider a cofactor
expansion of the determinant, $|A|$, consisting of expanding it in terms of the elements and
their cofactors along any row. In this case, it would be advantageous to do so along the
$i$-th row as the cofactors would then be free of $\lambda_i$ and the coefficients of the cofactors
would only be powers of $\lambda_i$. However, we would then lose the symmetry with respect to
the elements of the cofactors in this instance. Since symmetry is required for the procedure
to be discussed, we will consider the general expansion of a determinant, that is,

$$|A| = \sum_K(-1)^{\rho_K}a_{1k_1}a_{2k_2}\cdots a_{pk_p} = \sum_K(-1)^{\rho_K}\lambda_1^{p-k_1}\lambda_2^{p-k_2}\cdots\lambda_p^{p-k_p} \tag{9.6.9}$$

where $K = (k_1,\ldots,k_p)$ and $(k_1,\ldots,k_p)$ is a given permutation of the numbers
$(1, 2, \ldots, p)$. Defining $\rho_K$ as the number of transpositions needed to bring $(k_1,\ldots,k_p)$
into the natural order $(1, 2, \ldots, p)$, $(-1)^{\rho_K}$ will correspond to a $-$ sign for the
corresponding term if $\rho_K$ is odd, the sign being otherwise positive. For example, for $p = 3$, the
possible permutations are $(1,2,3)$, $(1,3,2)$, $(2,3,1)$, $(2,1,3)$, $(3,1,2)$, $(3,2,1)$, so that
there are $3! = 6$ terms. For the sequence $(1,2,3)$, $k_1 = 1$, $k_2 = 2$ and $k_3 = 3$; for the
sequence $(1,3,2)$, $k_1 = 1$, $k_2 = 3$ and $k_3 = 2$, and so on, $\sum_K$ representing the sum of
all such terms multiplied by $(-1)^{\rho_K}$. For $(1,2,3)$, $\rho_K = 0$ corresponding to a plus sign;
for $(1,3,2)$, $\rho_K = 1$ corresponding to a minus sign, and so on. Other types of expansions
of $|A|$ could also be utilized. As it turns out, the general expansion given in (9.6.9) is the
most convenient one for deriving the marginal densities.
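
The permutation expansion of the Vandermonde determinant can be illustrated with a few lines of Python. In this sketch, which is only an illustration with hypothetical function names, the sign of each term is obtained from the parity of the number of inversions of the permutation, which coincides with the parity of the number of transpositions needed to restore the natural order.

```python
from itertools import permutations


def parity(perm):
    """Parity (0 or 1) of the number of inversions of a permutation."""
    inversions = sum(1 for i in range(len(perm))
                     for j in range(i + 1, len(perm)) if perm[i] > perm[j])
    return inversions % 2


def vandermonde_expansion(lams):
    """Sum over K of (-1)^rho_K * lam_1^(p-k_1) * ... * lam_p^(p-k_p)."""
    p = len(lams)
    total = 0.0
    for perm in permutations(range(1, p + 1)):
        term = 1.0
        for lam, k in zip(lams, perm):
            term *= lam ** (p - k)
        total += -term if parity(perm) else term
    return total


def product_of_differences(lams):
    """Product over i < j of (lam_i - lam_j)."""
    result = 1.0
    for i in range(len(lams)):
        for j in range(i + 1, len(lams)):
            result *= lams[i] - lams[j]
    return result


LAMBDAS = [5.0, 3.0, 2.0]  # any values with lam_1 > lam_2 > lam_3 > 0 will do
print(vandermonde_expansion(LAMBDAS), product_of_differences(LAMBDAS))
```

For $p = 3$ the sum runs over the six permutations listed above, and both functions should return the same value.
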


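In the complex case,

$$\prod_{i<j}(\lambda_i-\lambda_j)^2 = |A|^2 = |AA'| = |A'A|$$

where

$$A' = [\alpha_1,\alpha_2,\ldots,\alpha_p],\quad \alpha_j' = (\lambda_j^{p-1},\lambda_j^{p-2},\ldots,\lambda_j,1),\ j = 1,\ldots,p. \tag{i}$$

Observe that $\alpha_j$ only contains $\lambda_j$ and that $A'A = \alpha_1\alpha_1'+\cdots+\alpha_p\alpha_p'$, so that for instance
the $(i,j)$-th element in $\alpha_1\alpha_1'$ is $\lambda_1^{2p-(i+j)}$. Accordingly, the $(i,j)$-th element of $\alpha_1\alpha_1'+
\cdots+\alpha_p\alpha_p'$ is $\sum_{r=1}^p\lambda_r^{2p-(i+j)}$. Thus, letting $B = A'A = (b_{ij})$, $b_{ij} = \sum_{r=1}^p\lambda_r^{2p-(i+j)}$.
Now, consider the expansion (9.6.9) of $|B|$, that is,

$$\prod_{i<j}(\lambda_i-\lambda_j)^2 = |B| = |A'A| = \sum_K(-1)^{\rho_K}b_{1k_1}b_{2k_2}\cdots b_{pk_p} \tag{ii}$$

where $K = (k_1,\ldots,k_p)$ and $(k_1,\ldots,k_p)$ is a permutation of the sequence $(1, 2, \ldots, p)$.
Note that

$$b_{1k_1} = \lambda_1^{2p-(1+k_1)}+\lambda_2^{2p-(1+k_1)}+\cdots+\lambda_p^{2p-(1+k_1)}$$
$$b_{2k_2} = \lambda_1^{2p-(2+k_2)}+\lambda_2^{2p-(2+k_2)}+\cdots+\lambda_p^{2p-(2+k_2)}$$
$$\vdots$$
$$b_{pk_p} = \lambda_1^{2p-(p+k_p)}+\lambda_2^{2p-(p+k_p)}+\cdots+\lambda_p^{2p-(p+k_p)}. \tag{iii}$$

Let us write

$$b_{1k_1}b_{2k_2}\cdots b_{pk_p} = \sum_{r_1,\ldots,r_p}\lambda_1^{r_1}\cdots\lambda_p^{r_p}. \tag{iv}$$

Then,

$$\prod_{i<j}(\lambda_i-\lambda_j)^2 = \sum_K(-1)^{\rho_K}\sum_{r_1,\ldots,r_p}\lambda_1^{r_1}\cdots\lambda_p^{r_p} \tag{9.6a.3}$$

where the $r_j$'s, $j = 1,\ldots,p$, are nonnegative integers. We may now express the joint
density of the eigenvalues in a systematic way.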
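
As an illustration of the complex-domain expansion, the case $p = 2$ can be worked out by hand. The following lines are a small check added for clarity, assuming $b_{ij} = \sum_{r=1}^p\lambda_r^{2p-(i+j)}$ as defined above:

```latex
B = \begin{bmatrix} \lambda_1^2+\lambda_2^2 & \lambda_1+\lambda_2\\
                    \lambda_1+\lambda_2 & 2 \end{bmatrix},
\qquad
|B| = b_{11}b_{22}-b_{12}b_{21}
    = 2(\lambda_1^2+\lambda_2^2)-(\lambda_1+\lambda_2)^2
    = (\lambda_1-\lambda_2)^2 .
% K = (1,2), \rho_K = 0:
%   b_{11}b_{22} = (\lambda_1^2+\lambda_2^2)(\lambda_1^0+\lambda_2^0),
%   monomials with (r_1,r_2) = (2,0),(2,0),(0,2),(0,2).
% K = (2,1), \rho_K = 1:
%   b_{12}b_{21} = (\lambda_1+\lambda_2)^2,
%   monomials with (r_1,r_2) = (2,0),(1,1),(1,1),(0,2).
```

Thus, each permutation $K$ contributes $p^p = 4$ monomials $\lambda_1^{r_1}\lambda_2^{r_2}$ with nonnegative integer exponents, and the signed sum collapses to $(\lambda_1-\lambda_2)^2$.
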
9.6.3. The distributions of the eigenvalues 


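The joint density of $\lambda_1,\ldots,\lambda_p$ as specified in (9.6.6) and (9.6a.1) can be expressed as
follows. In the real case,

$$f(D)\,dD = \frac{\pi^{\frac{p^2}{2}}}{\Gamma_p(\frac{p}{2})\Gamma_p(\alpha)}\Big(\prod_{j=1}^p\lambda_j\Big)^{\alpha-\frac{p+1}{2}}e^{-(\lambda_1+\cdots+\lambda_p)}\Big(\sum_K(-1)^{\rho_K}\lambda_1^{p-k_1}\cdots\lambda_p^{p-k_p}\Big)dD$$
$$= \frac{\pi^{\frac{p^2}{2}}}{\Gamma_p(\frac{p}{2})\Gamma_p(\alpha)}\sum_K(-1)^{\rho_K}\big(\lambda_1^{m_1}\cdots\lambda_p^{m_p}\big)\,e^{-(\lambda_1+\cdots+\lambda_p)}\,dD \tag{9.6.10}$$

with

$$m_j = \alpha-\frac{p+1}{2}+p-k_j.$$

In the complex case, the joint density is

$$\tilde{f}(D)\,dD = \frac{\pi^{p(p-1)}}{\tilde{\Gamma}_p(p)\tilde{\Gamma}_p(\alpha)}\sum_K(-1)^{\rho_K}\sum_{r_1,\ldots,r_p}\big[\lambda_1^{\alpha-p+r_1}\cdots\lambda_p^{\alpha-p+r_p}\big]\,e^{-(\lambda_1+\cdots+\lambda_p)}\,dD$$
$$= \frac{\pi^{p(p-1)}}{\tilde{\Gamma}_p(p)\tilde{\Gamma}_p(\alpha)}\sum_K(-1)^{\rho_K}\sum_{r_1,\ldots,r_p}\big(\lambda_1^{m_1}\cdots\lambda_p^{m_p}\big)\,e^{-(\lambda_1+\cdots+\lambda_p)}\,dD \tag{9.6a.4}$$

with $m_j = \alpha-p+r_j$ and $r_j$ as defined in (9.6a.3). For convenience, we will use the
same symbol $m_j$ for both the real and complex cases; however, in the real case, $m_j =
\alpha-\frac{p+1}{2}+p-k_j$ and, in the complex case, $m_j = \alpha-p+r_j$.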


The distributions of the largest, smallest and $j$-th largest eigenvalues were considered
by many authors. Earlier works mostly dealt with eigenvalue problems associated
with testing hypotheses on the parameters of one or more real Gaussian populations. In
such situations, a one-to-one function of the likelihood ratio statistics could be explored in
terms of the eigenvalues of a real type-1 beta distributed matrix-variate random variable.
In a series of papers, Pillai constructed the distributions of the seven largest eigenvalues
in the type-1 beta distribution and produced percentage points as well; the reader may
refer for example to Pillai (1964) and the references therein. In a series of papers including
Khatri (1964), Khatri addressed the distributions of eigenvalues in the real and complex
domains. In a series of papers, Krishnaiah and his co-authors dealt with various
distributional aspects of eigenvalues, see for instance Krishnaiah et al. (1973). Clemm et al. (1973)
computed upper percentage points of the distribution of the eigenvalues of the Wishart
matrix. James (1964) considered the eigenvalue problem of different types of matrix-variate
random variables and determined their distributions in terms of functions of matrix
arguments and zonal polynomials. In a series of papers, Davis dealt with the distributions
of eigenvalues by creating and solving systems of differential equations, see for example
Davis (1972). Edelman (1991) discussed the distributions and moments of the smallest
eigenvalue of Wishart type matrices. Johnstone (2001) examined the distribution of the
largest eigenvalue in Principal Component Analysis. Recently, Chiani (2014) and James
and Lee (2021) discussed the distributions of the eigenvalues of Wishart matrices. The
methods employed in these papers lead to representations of the distributions of
eigenvalues in terms of Pfaffians of skew symmetric matrices, incomplete gamma functions,
multiple integrals, functions of matrix argument and zonal polynomials, ratios of
determinants, solutions of differential equations, and so on. None of those methods yields tractable
forms for the distribution or density functions of eigenvalues.


In the next subsections, we provide, in explicit forms, the exact distribution of the
j-th largest eigenvalue of a general real or complex matrix-variate gamma type matrix,
either as finite sums when a certain quantity is a positive integer or as a product of infinite
series in the general non-integer case. These include, for instance, the distributions of the
largest and smallest eigenvalues as well as the joint distributions of several of the largest or
smallest eigenvalues, and readily apply to the real and complex Wishart distributions.
9.6.4. Density of the smallest eigenvalue $\lambda_p$ in the real matrix-variate gamma case


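We will initially examine the situation where $m_j = \alpha - \frac{p+1}{2} + p - k_j$ is an integer, so
that $m_j$ in the real matrix-variate gamma case is a positive integer. We will integrate out
$\lambda_1,\ldots,\lambda_{p-1}$ to obtain the marginal density of $\lambda_p$. Since $m_j$ is a positive integer, we can
integrate by parts. For instance,

\[
\int_{\lambda_1=\lambda_2}^{\infty}\lambda_1^{m_1}e^{-\lambda_1}\,d\lambda_1 = \sum_{\mu_1=0}^{m_1}\frac{m_1!}{(m_1-\mu_1)!}\,\lambda_2^{m_1-\mu_1}e^{-\lambda_2} \tag{i}
\]

and integrating $\lambda_1,\ldots,\lambda_{j-1}$ gives the following:

\[
\begin{aligned}
&\int_{\lambda_1>\cdots>\lambda_{j-1}>\lambda_j}\lambda_1^{m_1}\cdots\lambda_{j-1}^{m_{j-1}}\,e^{-(\lambda_1+\cdots+\lambda_{j-1})}\,d\lambda_1\wedge\ldots\wedge d\lambda_{j-1}\\
&\quad= \sum_{\mu_1=0}^{m_1}\frac{m_1!}{(m_1-\mu_1)!}\sum_{\mu_2=0}^{m_1-\mu_1+m_2}\frac{(m_1-\mu_1+m_2)!}{2^{\mu_2+1}\,(m_1-\mu_1+m_2-\mu_2)!}\cdots\\
&\qquad\times\sum_{\mu_{j-1}=0}^{m_1-\mu_1+\cdots+m_{j-1}}\frac{(m_1-\mu_1+\cdots+m_{j-1})!}{(j-1)^{\mu_{j-1}+1}\,(m_1-\mu_1+\cdots+m_{j-1}-\mu_{j-1})!}\\
&\qquad\times\lambda_j^{m_1-\mu_1+\cdots+m_{j-1}-\mu_{j-1}}\,e^{-(j-1)\lambda_j}\\
&\quad\equiv \phi_{j-1}(\lambda_j).
\end{aligned} \tag{ii}
\]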
Hence, the following result: 


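Theorem 9.6.1. When $m_j = \alpha - \frac{p+1}{2} + p - k_j$ is a positive integer, where $m_j$ is defined
in (9.6.10), the marginal density of the smallest eigenvalue $\lambda_p$ of the $p \times p$ real gamma
distributed matrix with parameters $(\alpha, I)$, denoted by $f_{1p}(\lambda_p)$, is the following:

\[
\begin{aligned}
f_{1p}(\lambda_p)\,d\lambda_p &= c_K\,\phi_{p-1}(\lambda_p)\,\lambda_p^{m_p}e^{-\lambda_p}\,d\lambda_p\\
&= c_K\sum_{\mu_1=0}^{m_1}\frac{m_1!}{(m_1-\mu_1)!}\sum_{\mu_2=0}^{m_1-\mu_1+m_2}\frac{(m_1-\mu_1+m_2)!}{2^{\mu_2+1}\,(m_1-\mu_1+m_2-\mu_2)!}\cdots\\
&\quad\times\sum_{\mu_{p-1}=0}^{m_1-\mu_1+\cdots+m_{p-1}}\frac{(m_1-\mu_1+\cdots+m_{p-1})!}{(p-1)^{\mu_{p-1}+1}\,(m_1-\mu_1+\cdots+m_{p-1}-\mu_{p-1})!}\\
&\quad\times\lambda_p^{m_1-\mu_1+\cdots+m_{p-1}-\mu_{p-1}+m_p}\,e^{-p\lambda_p}\,d\lambda_p,
\end{aligned} \tag{9.6.11}
\]

for $0 < \lambda_p < \infty$, where

\[
c_K = \frac{\pi^{p^2/2}}{\Gamma_p(\frac{p}{2})\,\Gamma_p(\alpha)}\sum_{K}(-1)^{\rho_K},
\]

$\phi_{j-1}(\lambda_j)$ is specified in (ii) and $K = (k_1,\ldots,k_p)$ is defined in (9.6.9).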
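As a quick check of (9.6.11), the following worked example evaluates the formula for $p = 2$ and $\alpha = 5/2$. These parameter values are assumptions made for illustration only; the computation also uses the standard expression $\Gamma_2(a) = \pi^{1/2}\Gamma(a)\Gamma(a-\frac{1}{2})$ for the real matrix-variate gamma function when $p = 2$.

```latex
% Illustrative check of (9.6.11) with p = 2 and \alpha = 5/2 (assumed values).
% Here m_j = \alpha - 3/2 + 2 - k_j = 3 - k_j and \phi_1 is given by (i).
\begin{align*}
K=(1,2),\ (-1)^{\rho_K}=+1:&\quad m_1=2,\ m_2=1,\quad
  \phi_1(\lambda_2)\,\lambda_2^{m_2}e^{-\lambda_2}
  =(\lambda_2^{3}+2\lambda_2^{2}+2\lambda_2)\,e^{-2\lambda_2},\\
K=(2,1),\ (-1)^{\rho_K}=-1:&\quad m_1=1,\ m_2=2,\quad
  \phi_1(\lambda_2)\,\lambda_2^{m_2}e^{-\lambda_2}
  =(\lambda_2^{3}+\lambda_2^{2})\,e^{-2\lambda_2},\\
\frac{\pi^{p^2/2}}{\Gamma_p(\frac{p}{2})\,\Gamma_p(\alpha)}
  &=\frac{\pi^{2}}{\pi\cdot\frac{3\pi}{4}}=\frac{4}{3},\\
f_{1p}(\lambda_2)&=\tfrac{4}{3}\,(\lambda_2^{2}+2\lambda_2)\,e^{-2\lambda_2},
  \qquad 0<\lambda_2<\infty,\\
\int_0^{\infty}f_{1p}(\lambda_2)\,d\lambda_2
  &=\tfrac{4}{3}\Big(\tfrac{2!}{2^{3}}+\tfrac{2}{2^{2}}\Big)=1.
\end{align*}
```

The two permutations contribute with opposite signs, the cubic terms cancel, and the resulting function is nonnegative and integrates to one, as a density should.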


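In the complex $p \times p$ matrix-variate gamma case, $r_j$ is as defined in (9.6a.3) and the
expression for $\phi_{j-1}(\lambda_j)$ given in (ii) remains the same with the exception that $m_j$ in the
complex case is $m_j = \alpha - p + r_j$. Then, in the complex domain, the density of $\lambda_p$, denoted
by $\tilde f_{1p}(\lambda_p)$, is the following:


Theorem 9.6a.1. When $m_j = \alpha - p + r_j$ is a positive integer, where $r_j$ is defined
in (9.6a.3), the density of the smallest eigenvalue of the complex matrix-variate gamma
distribution is the following:

\[
\begin{aligned}
\tilde f_{1p}(\lambda_p)\,d\lambda_p &= \tilde c_K\,\phi_{p-1}(\lambda_p)\,\lambda_p^{m_p}e^{-\lambda_p}\,d\lambda_p\\
&= \tilde c_K\sum_{\mu_1=0}^{m_1}\frac{m_1!}{(m_1-\mu_1)!}\sum_{\mu_2=0}^{m_1-\mu_1+m_2}\frac{(m_1-\mu_1+m_2)!}{2^{\mu_2+1}\,(m_1-\mu_1+m_2-\mu_2)!}\cdots\\
&\quad\times\sum_{\mu_{p-1}=0}^{m_1-\mu_1+\cdots+m_{p-1}}\frac{(m_1-\mu_1+\cdots+m_{p-1})!}{(p-1)^{\mu_{p-1}+1}\,(m_1-\mu_1+\cdots+m_{p-1}-\mu_{p-1})!}\\
&\quad\times\lambda_p^{m_1-\mu_1+\cdots+m_{p-1}-\mu_{p-1}+m_p}\,e^{-p\lambda_p}\,d\lambda_p,
\end{aligned} \tag{9.6a.5}
\]

for $0 < \lambda_p < \infty$, where

\[
\tilde c_K = \frac{\pi^{p(p-1)}}{\tilde\Gamma_p(p)\,\tilde\Gamma_p(\alpha)}\sum_{K}(-1)^{\rho_K}\sum_{r_1,\ldots,r_p}.
\]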


Note 9.6.1. In the complex Wishart case, $\alpha = m$ where $m$ is the number of degrees of
freedom, which is a positive integer. Hence, Theorem 9.6a.1 gives the final result in the
general case for that distribution. One can obtain the joint density of the $p - j + 1$ smallest
eigenvalues from $\phi_{j-1}(\lambda_j)$ as defined in (ii), both in the real and complex cases. If the
scale parameter matrix of the gamma distribution is of the form $aI$ where $a > 0$ is a
real scalar and $I$ is the identity matrix, then the distributions of the eigenvalues can also
be obtained from the proposed procedure since for any square matrix $B$, the eigenvalues
of $aB$ are the $a\nu_j$'s where the $\nu_j$'s are the eigenvalues of $B$. In the case of real Wishart
distributions originating from a sample of size $n$ from a $p$-variate Gaussian population
whose covariance matrix is the identity matrix, the $\lambda_j$'s are multiplied by the constant
$a = \frac{n}{2}$. If the Wishart matrix is not an estimator obtained from the sample values, they
should only be multiplied by $a = \frac{1}{2}$ in the real case, no multiplicative constant being
necessary in the complex Wishart case.


9.6.5. Density of the largest eigenvalue $\lambda_1$ in the real matrix-variate gamma case


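Consider the case of $m_j = \alpha - \frac{p+1}{2} + p - k_j$ being an integer first. Then, in the real
case, $m_j$ is as defined in (9.6.10). One has to integrate out $\lambda_p,\ldots,\lambda_2$ in order to obtain
the marginal density of $\lambda_1$. As the initial step, consider the integration of $\lambda_p$, that is,

\[
\begin{aligned}
\text{Step 1 integral}:\ \int_{\lambda_p=0}^{\lambda_{p-1}}\lambda_p^{m_p}e^{-\lambda_p}\,d\lambda_p
&= \big[-\lambda_p^{m_p}e^{-\lambda_p}\big]_0^{\lambda_{p-1}} + \cdots + \big[-m_p!\,e^{-\lambda_p}\big]_0^{\lambda_{p-1}}\\
&= m_p! - \sum_{\nu_p=0}^{m_p}\frac{m_p!}{(m_p-\nu_p)!}\,\lambda_{p-1}^{m_p-\nu_p}e^{-\lambda_{p-1}}.
\end{aligned} \tag{i}
\]

Now, multiplying each term by $\lambda_{p-1}^{m_{p-1}}e^{-\lambda_{p-1}}$ and integrating by parts, we have the second
step integral:

\[
\begin{aligned}
\text{Step 2 integral}
&= m_p!\int_{\lambda_{p-1}=0}^{\lambda_{p-2}}\lambda_{p-1}^{m_{p-1}}e^{-\lambda_{p-1}}\,d\lambda_{p-1}
 - \sum_{\nu_p=0}^{m_p}\frac{m_p!}{(m_p-\nu_p)!}\int_{\lambda_{p-1}=0}^{\lambda_{p-2}}\lambda_{p-1}^{m_p-\nu_p+m_{p-1}}e^{-2\lambda_{p-1}}\,d\lambda_{p-1}\\
&= m_p!\,m_{p-1}! - m_p!\sum_{\nu_{p-1}=0}^{m_{p-1}}\frac{m_{p-1}!}{(m_{p-1}-\nu_{p-1})!}\,\lambda_{p-2}^{m_{p-1}-\nu_{p-1}}e^{-\lambda_{p-2}}\\
&\quad - \sum_{\nu_p=0}^{m_p}\frac{m_p!}{(m_p-\nu_p)!}\,\frac{(m_p-\nu_p+m_{p-1})!}{2^{m_p-\nu_p+m_{p-1}+1}}\\
&\quad + \sum_{\nu_p=0}^{m_p}\frac{m_p!}{(m_p-\nu_p)!}\sum_{\nu_{p-1}=0}^{m_p-\nu_p+m_{p-1}}\frac{(m_p-\nu_p+m_{p-1})!}{2^{\nu_{p-1}+1}\,(m_p-\nu_p+m_{p-1}-\nu_{p-1})!}\,\lambda_{p-2}^{m_p-\nu_p+m_{p-1}-\nu_{p-1}}e^{-2\lambda_{p-2}}.
\end{aligned} \tag{ii}
\]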


At the j-th step of integration, there will be $2^j$ terms of which $2^j/2 = 2^{j-1}$ will be positive
and $2^{j-1}$ will be negative. All the terms at the j-th step can be generated by $2^j$ sequences
of zeros and ones. Each sequence (each row) consists of j positions with a zero or one
occupying each position. Depending on whether the number of ones in a sequence is odd
or even, the corresponding term will start with a minus or a plus sign, respectively. The
following are the sequences, where the last column indicates the sign of the term:

Step 1:  0 +        Step 2:  0 0 +        Step 3:  0 0 0 +    1 0 0 -
         1 -                 0 1 -                 0 0 1 -    1 0 1 +
                             1 0 -                 0 1 0 -    1 1 0 +
                             1 1 +                 0 1 1 +    1 1 1 -

Step 4:  0 0 0 0 +    0 1 0 0 -    1 0 0 0 -    1 1 0 0 +
         0 0 0 1 -    0 1 0 1 +    1 0 0 1 +    1 1 0 1 -
         0 0 1 0 -    0 1 1 0 +    1 0 1 0 +    1 1 1 0 -
         0 0 1 1 +    0 1 1 1 -    1 0 1 1 -    1 1 1 1 +

All the terms at the j-th step can be written down by using the following rules: 


(1): If the first entry in a sequence (row) is zero, then the corresponding factor in the term
is $m_p!$;

(2): If the first entry in a sequence is 1, then the corresponding factor in the term is
$\sum_{\nu_p=0}^{m_p}\frac{m_p!}{(m_p-\nu_p)!}$ or this sum multiplied by $\lambda_{p-1}^{m_p-\nu_p}e^{-\lambda_{p-1}}$ if this 1 is the last entry in the
sequence;

(3): If the r-th and (r − 1)-th entries in the sequence both equal zero, then the corresponding
factor in the term is $m_{p-r+1}!$;

(4): If the r-th entry in the sequence is zero and the (r − 1)-th entry is 1, then the
corresponding factor in the term is
$\frac{(n_{r-1}+m_{p-r+1})!}{(\eta_{r-1}+1)^{n_{r-1}+m_{p-r+1}+1}}$
where $n_{r-1}$ is the argument of the denominator factorial and $(\eta_{r-1})^{\nu_{p-r+2}+1}$ is the factor
in the denominator corresponding to the (r − 1)-th entry, with $\eta_{r-1} = 1$ when no such power appears;

(5): If the r-th entry in the sequence is 1 and the (r − 1)-th entry is zero, then the
corresponding factor in the term is
$\sum_{\nu_{p-r+1}=0}^{m_{p-r+1}}\frac{m_{p-r+1}!}{(m_{p-r+1}-\nu_{p-r+1})!}$
or this sum multiplied by $\lambda_{p-r}^{m_{p-r+1}-\nu_{p-r+1}}e^{-\lambda_{p-r}}$ if this 1 is the last entry in the sequence;

(6): If the r-th and (r − 1)-th entries in the sequence are both equal to 1, then the
corresponding factor in the term is
$\sum_{\nu_{p-r+1}=0}^{n_{r-1}+m_{p-r+1}}\frac{(n_{r-1}+m_{p-r+1})!}{(\eta_{r-1}+1)^{\nu_{p-r+1}+1}\,(n_{r-1}+m_{p-r+1}-\nu_{p-r+1})!}$
or this sum multiplied by $\lambda_{p-r}^{n_{r-1}+m_{p-r+1}-\nu_{p-r+1}}e^{-(\eta_{r-1}+1)\lambda_{p-r}}$ if this 1 is the last entry in the sequence,
where $n_{r-1}$ and $\eta_{r-1}$ are defined in rule (4).

By applying the above rules, let us write down
the terms at step 3, that is, j = 3. The sequences are then the following:
Step 3 sequences:

0 0 0 +    1 0 0 -
0 0 1 -    1 0 1 +
0 1 0 -    1 1 0 +
0 1 1 +    1 1 1 -                                                    (iii)


The corresponding terms in the order of the sequences are the following:
Step 3 integral 


M p—2 m ! 
ES y | p-2: M p—2—Vp-2 —Ap.3 
= mpy!my-i!mp-»! — mp! mp_1! Gis ee p-3 ene 
Vp—2=0 р— р—27' 
mp-| 
i тр]! (mp-| — vp-1 + mp—2)! 
— Ир: 25 (m EN ! 2m p-|—V —1+тр—2+1 
lo Ut p-1 — Vp-1)! TUE i 
Vp—-1=0 
mp-| ! m p—1—Vp—1+M p—2 ! 
(m p-1)! (m p-1 — Vp-1 a5 m p—2)! 
+ т! у ( у! 2vp-2tl ! 
Mp—1 — 05-1): Рт, Mp—1 — Vp— M p—2 — 0-2): 
vp-1=0 ` P : pai vp—2=0 ( p= р-1 + Mp2 5 2) 
m —v +m —v 
х А P р—1 p-2 gis 2X p-3 
p- 
Mp 
тр! (Mp — vp тр)! | 
— (Mp — vp)! 2mMp—Vptmp-ı Д0 
Vp=0 
Mp ! ! mp ! 
+ у` ть. (mp — Vp + тр-1)! 26 Mp-2: Mp-2—Vp-2 es 
My — Vp)! 2 p—Yptmp-itl ть—2— 5:5) P73 
vc p p) "i p-2 p 2) 
mp Mp—Vp+Mp-1 


(My — v5)! (My — vy + ть — vp 1)! 
р р р р р р р 


Vp-1=0 


(mp — Vp t Mp-1 — Vp-1 + Mp—2)! 


3M p—Vp tM p—-1—Vp-1+M p_2 
Mp m p—Vpd-m p-1 
(m, — vy)! 2p-VFl(m — vy тр — Vp_1)! 
vp=0 P P vp-1=0 P p P P 


Im p—Vp tM p_1—Vp—1+M p_2 
di c. Р 4 (Mp — Vp + Mp_1 — Vp-1 + тр-2)! 


—241 
Jem 3"»-2* (mp — vp + mp1 — Vp-1 + mp-2 — vp-2)! 
p-2— 


x uni M 1—Vp-1+Mp-2—Vp 28 ЗАр—3. (iv) 
The terms in (iv) are to be multiplied by $\lambda_{p-3}^{m_{p-3}}e^{-\lambda_{p-3}}$ to obtain the final result if we are
stopping, that is, if p = 4. The terms in (iv) can be verified by multiplying the step 2
integral by $\lambda_{p-2}^{m_{p-2}}e^{-\lambda_{p-2}}$ and then integrating (ii), term by term. Denoting the sum of the
terms at the j-th step by $\psi_j(\lambda_{p-j})$, the density of the largest eigenvalue $\lambda_1$, denoted by
$f_{11}(\lambda_1)$, is the following:


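Theorem 9.6.2. When $m_j = \alpha - \frac{p+1}{2} + p - k_j$ is a positive integer and $\psi_j(\lambda_{p-j})$ is as
defined in the preceding paragraph, where $k_j$ is given in (9.6.10), the density of $\lambda_1$ in the
real matrix-variate gamma case, denoted by $f_{11}(\lambda_1)$, is the following:

\[
f_{11}(\lambda_1)\,d\lambda_1 = \frac{\pi^{p^2/2}}{\Gamma_p(\frac{p}{2})\,\Gamma_p(\alpha)}\sum_{K}(-1)^{\rho_K}\,\psi_{p-1}(\lambda_1)\,\lambda_1^{m_1}e^{-\lambda_1}\,d\lambda_1,\quad 0 < \lambda_1 < \infty, \tag{9.6.12}
\]

where it is assumed that $m_j$ is a positive integer.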
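A numerical sanity check of (9.6.12) can be carried out for $p = 2$, where only the step 1 integral (i) is needed. The sketch below makes several assumptions for illustration: it takes $\alpha = 5/2$, uses the standard expression $\Gamma_2(a) = \pi^{1/2}\Gamma(a)\Gamma(a-\frac{1}{2})$, and relies on a basic composite Simpson rule; all function names are hypothetical. It verifies that the resulting function integrates to one.

```python
import math


def gamma2(a):
    """Real matrix-variate gamma function for p = 2: sqrt(pi) * Gamma(a) * Gamma(a - 1/2)."""
    return math.sqrt(math.pi) * math.gamma(a) * math.gamma(a - 0.5)


def psi1(lam, m2):
    """Step 1 integral (i) with p = 2: integral of x**m2 * exp(-x) over (0, lam), m2 an integer."""
    finite_sum = sum(
        math.factorial(m2) // math.factorial(m2 - v) * lam ** (m2 - v)
        for v in range(m2 + 1)
    )
    return math.factorial(m2) - finite_sum * math.exp(-lam)


def density_largest(lam, alpha):
    """Density (9.6.12) of the largest eigenvalue for p = 2 (alpha - 1/2 a positive integer)."""
    p = 2
    const = math.pi ** (p * p / 2) / (gamma2(p / 2) * gamma2(alpha))
    total = 0.0
    # The two permutations K of (1, 2) together with their signs (-1)^{rho_K}.
    for k, sign in (((1, 2), 1), ((2, 1), -1)):
        m1, m2 = (round(alpha - (p + 1) / 2 + p - kj) for kj in k)
        total += sign * psi1(lam, m2) * lam ** m1 * math.exp(-lam)
    return const * total


def simpson(f, a, b, n=20000):
    """Composite Simpson rule with an even number n of subintervals."""
    h = (b - a) / n
    s = f(a) + f(b)
    for i in range(1, n):
        s += (4 if i % 2 else 2) * f(a + i * h)
    return s * h / 3


# The upper limit 60 truncates a tail of order exp(-60), which is negligible here.
total_mass = simpson(lambda x: density_largest(x, 2.5), 0.0, 60.0)
print(abs(total_mass - 1.0) < 1e-6)
```

For these parameter values, the sum over $K$ simplifies by hand to $f_{11}(\lambda_1) = \frac{4}{3}\big[(\lambda_1^2-2\lambda_1)e^{-\lambda_1} + (\lambda_1^2+2\lambda_1)e^{-2\lambda_1}\big]$, which provides an independent closed form to compare against.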


In the corresponding complex case, the procedure is parallel and the expression for
$\psi_j(\lambda_{p-j})$ remains the same, except that $m_j$ will then be equal to $\alpha - p + r_j$, where $r_j$
is defined in (9.6a.3). Assuming that $m_j$ is a positive integer and letting the density in the
complex case be denoted by $\tilde f_{11}(\lambda_1)$, we have the following result:


Theorem 9.6a.2. Letting $m_j = \alpha - p + r_j$ be a positive integer and $\psi_j(\lambda_{p-j})$ have the
same representation as in the real case except that $m_j = \alpha - p + r_j$, in the complex case,
the density of $\lambda_1$, denoted by $\tilde f_{11}(\lambda_1)$, is the following:

\[
\tilde f_{11}(\lambda_1)\,d\lambda_1 = \frac{\pi^{p(p-1)}}{\tilde\Gamma_p(p)\,\tilde\Gamma_p(\alpha)}\sum_{K}(-1)^{\rho_K}\sum_{r_1,\ldots,r_p}\psi_{p-1}(\lambda_1)\,\lambda_1^{m_1}e^{-\lambda_1}\,d\lambda_1,\quad 0 < \lambda_1 < \infty. \tag{9.6a.6}
\]


Note 9.6.2. One can also compute the density of the j-th eigenvalue $\lambda_j$ from
Theorems 9.6.1 and 9.6.2. For obtaining the density of $\lambda_j$, one has to integrate out $\lambda_1,\ldots,\lambda_{j-1}$
and $\lambda_p, \lambda_{p-1},\ldots,\lambda_{j+1}$, the resulting expressions being available from the (j − 1)-th
step when integrating $\lambda_1,\ldots,\lambda_{j-1}$ and from the (p − j)-th step when integrating
$\lambda_p, \lambda_{p-1},\ldots,\lambda_{j+1}$.


9.6.6. Density of the largest eigenvalue $\lambda_1$ in the general real case


By general case, it is meant that $m_j = \alpha - \frac{p+1}{2} + p - k_j$ is not a positive integer. In the
real Wishart case, $m_j$ will then be a half-integer; however, in the general gamma case, $\alpha$ can
be any real number greater than $\frac{p-1}{2}$. In this general case, we will expand the exponential
part and then integrate term by term. That is,

\[
\begin{aligned}
\text{Step 1 integral}:\ \int_{\lambda_p=0}^{\lambda_{p-1}}\lambda_p^{m_p}e^{-\lambda_p}\,d\lambda_p
&= \sum_{\nu_p=0}^{\infty}\frac{(-1)^{\nu_p}}{\nu_p!}\int_{\lambda_p=0}^{\lambda_{p-1}}\lambda_p^{m_p+\nu_p}\,d\lambda_p\\
&= \sum_{\nu_p=0}^{\infty}\frac{(-1)^{\nu_p}}{\nu_p!}\,\frac{1}{m_p+\nu_p+1}\,\lambda_{p-1}^{m_p+\nu_p+1}.
\end{aligned} \tag{i}
\]
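The series representation of the step 1 integral can be checked numerically. The sketch below is an illustration under stated assumptions: it takes $m_p = 1/2$ and an upper limit of 2, truncates the series at 80 terms, and compares the partial sum with the closed form $\int_0^a x^{1/2}e^{-x}dx = \frac{\sqrt{\pi}}{2}\,\mathrm{erf}(\sqrt{a}) - \sqrt{a}\,e^{-a}$, which follows from integrating by parts. The function names are hypothetical.

```python
import math


def series_step1(a, m, terms=80):
    """Partial sum of sum_{v >= 0} (-1)^v / v! * a^(m+v+1) / (m+v+1)."""
    return sum(
        (-1) ** v / math.factorial(v) * a ** (m + v + 1) / (m + v + 1)
        for v in range(terms)
    )


def closed_form_half(a):
    """Integral of x^(1/2) * exp(-x) over (0, a), written with the error function."""
    return 0.5 * math.sqrt(math.pi) * math.erf(math.sqrt(a)) - math.sqrt(a) * math.exp(-a)


upper = 2.0  # illustrative upper limit of integration
difference = abs(series_step1(upper, 0.5) - closed_form_half(upper))
print(difference < 1e-10)
```

The series is alternating with factorially decaying terms, so a modest number of terms suffices for moderate upper limits; for large upper limits, cancellation between big terms would make a direct summation in floating point less accurate.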


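Continuing this process, we have

\[
\begin{aligned}
\text{Step } j \text{ integral} &= \sum_{\nu_p=0}^{\infty}\frac{(-1)^{\nu_p}}{\nu_p!}\,\frac{1}{m_p+\nu_p+1}\sum_{\nu_{p-1}=0}^{\infty}\frac{(-1)^{\nu_{p-1}}}{\nu_{p-1}!}\,\frac{1}{m_p+\nu_p+m_{p-1}+\nu_{p-1}+2}\cdots\\
&\quad\times\sum_{\nu_{p-j+1}=0}^{\infty}\frac{(-1)^{\nu_{p-j+1}}}{\nu_{p-j+1}!}\,\frac{\lambda_{p-j}^{m_p+\nu_p+\cdots+m_{p-j+1}+\nu_{p-j+1}+j}}{m_p+\nu_p+\cdots+m_{p-j+1}+\nu_{p-j+1}+j}\\
&\equiv \Delta_j(\lambda_{p-j}).
\end{aligned} \tag{ii}
\]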


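Then, in the general real case, the density of $\lambda_1$, denoted by $f_{21}(\lambda_1)$, is the following:


Theorem 9.6.3. When $m_j = \alpha - \frac{p+1}{2} + p - k_j$ is not a positive integer, where $k_j$ is as
specified in (9.6.10), the density of the largest eigenvalue $\lambda_1$ in the general real
matrix-variate gamma case, denoted by $f_{21}(\lambda_1)$, is given by

\[
f_{21}(\lambda_1)\,d\lambda_1 = \frac{\pi^{p^2/2}}{\Gamma_p(\frac{p}{2})\,\Gamma_p(\alpha)}\sum_{K}(-1)^{\rho_K}\,\Delta_{p-1}(\lambda_1)\,\lambda_1^{m_1}e^{-\lambda_1}\,d\lambda_1,\quad 0 < \lambda_1 < \infty, \tag{9.6.13}
\]

where $\Delta_j(\lambda_{p-j})$ is defined in (ii).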


The corresponding density of $\lambda_1$ in the general situation of the complex matrix-variate
gamma distribution is given in the next theorem. Observe that in the complex Wishart case,
$m_j$ is an integer and hence there is no general case to consider.


Theorem 9.6a.3. When $m_j = \alpha - p + r_j$ is not a positive integer, where $r_j$ is as defined
in (9.6a.3), the density of $\lambda_1$ in the complex case, denoted by $\tilde f_{21}(\lambda_1)$, is given by

\[
\tilde f_{21}(\lambda_1)\,d\lambda_1 = \frac{\pi^{p(p-1)}}{\tilde\Gamma_p(p)\,\tilde\Gamma_p(\alpha)}\sum_{K}(-1)^{\rho_K}\sum_{r_1,\ldots,r_p}\Delta_{p-1}(\lambda_1)\,\lambda_1^{m_1}e^{-\lambda_1}\,d\lambda_1,\quad 0 < \lambda_1 < \infty, \tag{9.6a.7}
\]

where $\Delta_j(\lambda_{p-j})$ has the representation specified in (ii) except that $m_j = \alpha - p + r_j$.
9.6.7. Density of the smallest eigenvalue $\lambda_p$ in the general real case


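Once again, 'general case' is understood to mean that $m_j = \alpha - \frac{p+1}{2} + p - k_j$ is
not a positive integer, where $k_j$ is defined in (9.6.10). For the real Wishart distribution,
'general case' corresponds to $m_j$ being a half-integer. In order to determine the density
of the smallest eigenvalue, we will integrate out $\lambda_1,\ldots,\lambda_{p-1}$. We initially evaluate the
following integral:

\[
\begin{aligned}
\text{Step 1 integral}:\ \int_{\lambda_1=\lambda_2}^{\infty}\lambda_1^{m_1}e^{-\lambda_1}\,d\lambda_1
&= \Gamma(m_1+1) - \int_{\lambda_1=0}^{\lambda_2}\lambda_1^{m_1}e^{-\lambda_1}\,d\lambda_1\\
&= \Gamma(m_1+1) - \sum_{\mu_1=0}^{\infty}\frac{(-1)^{\mu_1}}{\mu_1!}\,\frac{1}{m_1+\mu_1+1}\,\lambda_2^{m_1+\mu_1+1}.
\end{aligned} \tag{i}
\]

The second step consists of integrating out $\lambda_2$ from the expression obtained in (i)
multiplied by $\lambda_2^{m_2}e^{-\lambda_2}$:

\[
\begin{aligned}
\text{Step 2 integral}
&= \Gamma(m_1+1)\int_{\lambda_2=\lambda_3}^{\infty}\lambda_2^{m_2}e^{-\lambda_2}\,d\lambda_2
 - \sum_{\mu_1=0}^{\infty}\frac{(-1)^{\mu_1}}{\mu_1!}\,\frac{1}{m_1+\mu_1+1}\int_{\lambda_2=\lambda_3}^{\infty}\lambda_2^{m_1+\mu_1+m_2+1}e^{-\lambda_2}\,d\lambda_2\\
&= \Gamma(m_1+1)\Gamma(m_2+1) - \Gamma(m_1+1)\sum_{\mu_2=0}^{\infty}\frac{(-1)^{\mu_2}}{\mu_2!}\,\frac{1}{m_2+\mu_2+1}\,\lambda_3^{m_2+\mu_2+1}\\
&\quad - \sum_{\mu_1=0}^{\infty}\frac{(-1)^{\mu_1}}{\mu_1!}\,\frac{1}{m_1+\mu_1+1}\,\Gamma(m_1+\mu_1+m_2+2)\\
&\quad + \sum_{\mu_1=0}^{\infty}\frac{(-1)^{\mu_1}}{\mu_1!}\,\frac{1}{m_1+\mu_1+1}\sum_{\mu_2=0}^{\infty}\frac{(-1)^{\mu_2}}{\mu_2!}\,\frac{\lambda_3^{m_1+\mu_1+m_2+\mu_2+2}}{m_1+\mu_1+m_2+\mu_2+2}.
\end{aligned} \tag{ii}
\]

A pattern is now seen to emerge. At step j, there will be $2^j$ terms, of which $2^{j-1}$ will start
with a plus sign and $2^{j-1}$ will start with a minus sign. All the terms at the j-th step are
available from the $2^j$ sequences of zeros and ones provided in Sect. 9.6.5. The terms can
be written down by utilizing the following rules: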


(1): If the sequence starts with a zero, then the corresponding factor in the term is
$\Gamma(m_1+1)$;

(2): If the sequence starts with a 1, then the corresponding factor in the term is
$\sum_{\mu_1=0}^{\infty}\frac{(-1)^{\mu_1}}{\mu_1!}\frac{1}{m_1+\mu_1+1}$ or this series multiplied by $\lambda_2^{m_1+\mu_1+1}$ if this 1 is the last entry in the
sequence;

(3): If the r-th entry in the sequence is a zero and the (r − 1)-th entry in the sequence is
also zero, then the corresponding factor in the term is $\Gamma(m_r+1)$;

(4): If the r-th entry in the sequence is a zero and the (r − 1)-th entry in the sequence
is a 1, then the corresponding factor in the term is $\Gamma(n_{r-1}+m_r+1)$ where $n_{r-1}$ is the
denominator factor in the (r − 1)-th factor excluding the factorial;

(5): If the r-th entry in the sequence is 1 and the (r − 1)-th entry is zero, then the
corresponding factor in the term is $\sum_{\mu_r=0}^{\infty}\frac{(-1)^{\mu_r}}{\mu_r!}\frac{1}{m_r+\mu_r+1}$ or this series multiplied by $\lambda_{r+1}^{m_r+\mu_r+1}$
if this 1 happens to be the last entry in the sequence;

(6): If the r-th entry in the sequence is 1 and the (r − 1)-th entry is also 1, then the
corresponding factor in the term is $\sum_{\mu_r=0}^{\infty}\frac{(-1)^{\mu_r}}{\mu_r!}\frac{1}{n_{r-1}+m_r+\mu_r+1}$ or this series multiplied by
$\lambda_{r+1}^{n_{r-1}+m_r+\mu_r+1}$ if this 1 is the last entry in the sequence, where $n_{r-1}$ is the factor appearing
in the denominator of the (r − 1)-th factor excluding the factorial.


These rules enable one to write down all the terms at any step. For example, for j = 3,
that is, at the third step, the terms are available from the following step 3 sequences:

0 0 0 +    1 0 0 -
0 0 1 -    1 0 1 +
0 1 0 -    1 1 0 +
0 1 1 +    1 1 1 -


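The terms corresponding to the sequences, in order, are the following:

\[
\begin{aligned}
&\text{Step 3 integral}\\
&= \Gamma(m_1+1)\Gamma(m_2+1)\Gamma(m_3+1)
 - \Gamma(m_1+1)\Gamma(m_2+1)\sum_{\mu_3=0}^{\infty}\frac{(-1)^{\mu_3}}{\mu_3!}\,\frac{\lambda_4^{m_3+\mu_3+1}}{m_3+\mu_3+1}\\
&\quad - \Gamma(m_1+1)\sum_{\mu_2=0}^{\infty}\frac{(-1)^{\mu_2}}{\mu_2!\,(m_2+\mu_2+1)}\,\Gamma(m_2+\mu_2+m_3+2)\\
&\quad + \Gamma(m_1+1)\sum_{\mu_2=0}^{\infty}\frac{(-1)^{\mu_2}}{\mu_2!\,(m_2+\mu_2+1)}\sum_{\mu_3=0}^{\infty}\frac{(-1)^{\mu_3}}{\mu_3!}\,\frac{\lambda_4^{m_2+\mu_2+m_3+\mu_3+2}}{m_2+\mu_2+m_3+\mu_3+2}\\
&\quad - \sum_{\mu_1=0}^{\infty}\frac{(-1)^{\mu_1}}{\mu_1!\,(m_1+\mu_1+1)}\,\Gamma(m_1+\mu_1+m_2+2)\,\Gamma(m_3+1)\\
&\quad + \sum_{\mu_1=0}^{\infty}\frac{(-1)^{\mu_1}}{\mu_1!\,(m_1+\mu_1+1)}\,\Gamma(m_1+\mu_1+m_2+2)\sum_{\mu_3=0}^{\infty}\frac{(-1)^{\mu_3}}{\mu_3!}\,\frac{\lambda_4^{m_3+\mu_3+1}}{m_3+\mu_3+1}\\
&\quad + \sum_{\mu_1=0}^{\infty}\frac{(-1)^{\mu_1}}{\mu_1!\,(m_1+\mu_1+1)}\sum_{\mu_2=0}^{\infty}\frac{(-1)^{\mu_2}}{\mu_2!\,(m_1+\mu_1+m_2+\mu_2+2)}\,\Gamma(m_1+\mu_1+m_2+\mu_2+m_3+3)\\
&\quad - \sum_{\mu_1=0}^{\infty}\frac{(-1)^{\mu_1}}{\mu_1!\,(m_1+\mu_1+1)}\sum_{\mu_2=0}^{\infty}\frac{(-1)^{\mu_2}}{\mu_2!\,(m_1+\mu_1+m_2+\mu_2+2)}\\
&\qquad\times\sum_{\mu_3=0}^{\infty}\frac{(-1)^{\mu_3}}{\mu_3!}\,\frac{\lambda_4^{m_1+\mu_1+m_2+\mu_2+m_3+\mu_3+3}}{m_1+\mu_1+m_2+\mu_2+m_3+\mu_3+3}.
\end{aligned} \tag{iii}
\]

Then, the step j integral, denoted by $w_j(\lambda_{j+1})$, will be the following:

\[
\begin{aligned}
w_j(\lambda_{j+1}) &= \Gamma(m_1+1)\cdots\Gamma(m_j+1) - \cdots + (-1)^j\sum_{\mu_1=0}^{\infty}\frac{(-1)^{\mu_1}}{\mu_1!\,(m_1+\mu_1+1)}\cdots\\
&\quad\times\sum_{\mu_j=0}^{\infty}\frac{(-1)^{\mu_j}}{\mu_j!}\,\frac{\lambda_{j+1}^{m_1+\mu_1+\cdots+m_j+\mu_j+j}}{m_1+\mu_1+\cdots+m_j+\mu_j+j}.
\end{aligned} \tag{iv}
\]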

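Theorem 9.6.4. The density of $\lambda_p$ for the general real matrix-variate gamma
distribution, denoted by $f_{2p}(\lambda_p)$, is the following:

\[
f_{2p}(\lambda_p)\,d\lambda_p = \frac{\pi^{p^2/2}}{\Gamma_p(\frac{p}{2})\,\Gamma_p(\alpha)}\sum_{K}(-1)^{\rho_K}\,w_{p-1}(\lambda_p)\,\lambda_p^{m_p}e^{-\lambda_p}\,d\lambda_p,\quad 0 < \lambda_p < \infty, \tag{9.6.14}
\]

where $w_j(\lambda_{j+1})$ is defined in (iv).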


The corresponding distribution of $\lambda_p$ for a general complex matrix-variate gamma
distribution, denoted by $\tilde f_{2p}(\lambda_p)$, is the following:


Theorem 9.6a.4. In the general complex case, in which instance $m_j = \alpha - p + r_j$ is not
a positive integer, $r_j$ being as defined in (9.6a.3), the density of the smallest eigenvalue $\lambda_p$,
denoted by $\tilde f_{2p}(\lambda_p)$, is given by

\[
\tilde f_{2p}(\lambda_p)\,d\lambda_p = \frac{\pi^{p(p-1)}}{\tilde\Gamma_p(p)\,\tilde\Gamma_p(\alpha)}\sum_{K}(-1)^{\rho_K}\sum_{r_1,\ldots,r_p}w_{p-1}(\lambda_p)\,\lambda_p^{m_p}e^{-\lambda_p}\,d\lambda_p,\quad 0 < \lambda_p < \infty, \tag{9.6a.8}
\]

where $w_j(\lambda_{j+1})$ has the representation given in (iv) above for the real case, except that
$m_j = \alpha - p + r_j$.
Note 9.6.3. In the complex Wishart case, $\alpha = m$ where $m > p - 1$ is the number of
degrees of freedom, which is a positive integer. Hence, in this instance, one would simply
apply Theorem 9.6a.1. It should also be observed that one can integrate out $\lambda_1,\ldots,\lambda_{j-1}$
by using the procedure described in Theorem 9.6.4 and integrate out $\lambda_p,\ldots,\lambda_{j+1}$ by
employing the procedure provided in Theorem 9.6.3, and thus derive the density of $\lambda_j$
or the joint density of any set of successive $\lambda_j$'s. In a similar manner, one can obtain the
density of $\lambda_j$ or the joint density of any set of successive $\lambda_j$'s in the complex domain by
making use of the procedures outlined in Theorems 9.6a.3 and 9.6a.4.
Chapter 10


Canonical Correlation Analysis


10.1. Introduction


We will keep utilizing the same notations in this chapter. More specifically, lowercase
letters $x, y, \ldots$ will denote real scalar variables, whether mathematical or random.
Capital letters $X, Y, \ldots$ will be used to denote real matrix-variate mathematical or random
variables, whether square or rectangular matrices are involved. A tilde will be placed above
letters such as $\tilde x, \tilde y, \tilde X, \tilde Y$ to denote variables in the complex domain. Constant matrices
will for instance be denoted by $A, B, C$. A tilde will not be used on constant matrices
unless the point is to be stressed that the matrix is in the complex domain. The determinant
of a square matrix $A$ will be denoted by $|A|$ or $\det(A)$ and, in the complex case, the absolute
value or modulus of the determinant of $A$ will be denoted as $|\det(A)|$. When matrices are
square, their order will be taken as $p \times p$, unless specified otherwise. When $A$ is a full
rank matrix in the complex domain, then $AA^*$ is Hermitian positive definite where an
asterisk designates the complex conjugate transpose of a matrix. Additionally, $dX$ will
indicate the wedge product of all the distinct differentials of the elements of the matrix
$X$. Letting the $p \times q$ matrix $X = (x_{ij})$ where the $x_{ij}$'s are distinct real scalar variables,
$dX = \wedge_{i=1}^{p}\wedge_{j=1}^{q}\,dx_{ij}$. For the complex matrix $\tilde X = X_1 + iX_2$, $i = \sqrt{(-1)}$, where $X_1$
and $X_2$ are real, $d\tilde X = dX_1 \wedge dX_2$.

The necessary theory for the study of Canonical Correlation Analysis has already been
introduced in Chap. 1, including the problem of optimizing a real bilinear form subject
to two quadratic form constraints. This topic happens to be connected to the prediction
problem. In regression analysis, the objective consists of seeking the best prediction
function of a real scalar variable $y$ based on a collection of preassigned real scalar variables
$x_1,\ldots,x_k$. It was previously determined that the regression of $y$ on $x_1,\ldots,x_k$, or the best
predictor of $y$ at preassigned values of $x_1,\ldots,x_k$, is the conditional expectation of $y$ at
the specified values of $x_1,\ldots,x_k$, that is, $E[y|x_1,\ldots,x_k]$ where $E$ denotes the expected
value. In this case, best is understood to mean 'in the minimum mean square' sense. Now,
consider the following generalization of this problem. Suppose that we wish to determine
the best prediction function for a set of real scalar variables $y_1,\ldots,y_q$, on the basis of
a collection of real scalar variables $x_1,\ldots,x_p$, where $p$ need not be equal to $q$. Since
individual variables are available from linear functions of those variables, we will convert
the problem into one of predicting a linear function of $y_1,\ldots,y_q$ from an arbitrary linear
function of $x_1,\ldots,x_p$, and vice versa if we are interested in determining the association
between two sets of variables. Let the linear functions be $u = \alpha_1x_1 + \cdots + \alpha_px_p = \alpha'X$
with $\alpha' = (\alpha_1,\ldots,\alpha_p)$ and $X' = (x_1,\ldots,x_p)$ and $v = \beta_1y_1 + \cdots + \beta_qy_q = \beta'Y$
with $\beta' = (\beta_1,\ldots,\beta_q)$ and $Y' = (y_1,\ldots,y_q)$, where the coefficient vectors $\alpha$ and $\beta$ are
arbitrary. Let us provide an interpretation of best predictor in the case of two linear
functions. As a criterion, we may make use of the maximum joint scatter, that is, the joint
variation in $u$ and $v$ as measured by the covariance between $u$ and $v$ or, equivalently, the
maximum scale-free covariance, namely, the correlation between $u$ and $v$, and optimize
this joint variation. Given the properties of linear functions of real scalar variables, we
obtain the variances of linear functions and covariance between linear functions as follows:
$\mathrm{Var}(u) = \alpha'\Sigma_{11}\alpha$, $\mathrm{Var}(v) = \beta'\Sigma_{22}\beta$, $\mathrm{Cov}(u,v) = \alpha'\Sigma_{12}\beta = \beta'\Sigma_{21}\alpha$, $\Sigma_{21} = \Sigma_{12}'$,
where $\Sigma_{11} > O$ and $\Sigma_{22} > O$ are the variance-covariance matrices of $X$ and $Y$,
respectively, and $\Sigma_{12} = \Sigma_{21}'$ accounts for the covariance between $X$ and $Y$. Letting the
augmented vector $Z = \begin{bmatrix} X \\ Y \end{bmatrix}$ and its associated covariance matrix be $\Sigma$, we have

\[
\Sigma = \begin{bmatrix} \mathrm{Cov}(X) & \mathrm{Cov}(X,Y) \\ \mathrm{Cov}(Y,X) & \mathrm{Cov}(Y) \end{bmatrix} = \begin{bmatrix} \Sigma_{11} & \Sigma_{12} \\ \Sigma_{21} & \Sigma_{22} \end{bmatrix}.
\]

Our aim is to maximize $\alpha'\Sigma_{12}\beta = \beta'\Sigma_{21}\alpha$. When the coefficient vectors $\alpha$ and $\beta$ are
unrestricted, the optimization of $\alpha'\Sigma_{12}\beta$ proves meaningless since the quantity $\alpha'\Sigma_{12}\beta$
can vary from $-\infty$ to $\infty$. Consequently, we impose the constraints $\alpha'\Sigma_{11}\alpha = 1$ and
$\beta'\Sigma_{22}\beta = 1$ on the coefficient vectors $\alpha$ and $\beta$. Accordingly, the mathematical problem
consists of optimizing $\alpha'\Sigma_{12}\beta$ subject to $\alpha'\Sigma_{11}\alpha = 1$ and $\beta'\Sigma_{22}\beta = 1$.


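Letting

\[
w = \alpha'\Sigma_{12}\beta - \frac{\rho_1}{2}(\alpha'\Sigma_{11}\alpha - 1) - \frac{\rho_2}{2}(\beta'\Sigma_{22}\beta - 1) \tag{i}
\]

where $\frac{1}{2}\rho_1$ and $\frac{1}{2}\rho_2$ are the Lagrangian multipliers, we differentiate $w$ with respect to $\alpha$ and
$\beta$ and equate the resulting functions to null vectors. When differentiating with respect to
$\beta$, we may utilize the equivalent form $\beta'\Sigma_{21}\alpha = \alpha'\Sigma_{12}\beta$. We then obtain the following
equations:

\[
\frac{\partial}{\partial\alpha}w = O \ \Rightarrow\ \Sigma_{12}\beta - \rho_1\Sigma_{11}\alpha = O \tag{ii}
\]
\[
\frac{\partial}{\partial\beta}w = O \ \Rightarrow\ \Sigma_{21}\alpha - \rho_2\Sigma_{22}\beta = O. \tag{iii}
\]

On pre-multiplying (ii) by $\alpha'$ and (iii) by $\beta'$, and using the fact that $\alpha'\Sigma_{11}\alpha = 1$ and
$\beta'\Sigma_{22}\beta = 1$, one has $\rho_1 = \rho_2 = \rho$ and $\alpha'\Sigma_{12}\beta = \rho$. Thus,

\[
\begin{bmatrix} -\rho\Sigma_{11} & \Sigma_{12} \\ \Sigma_{21} & -\rho\Sigma_{22} \end{bmatrix}\begin{bmatrix} \alpha \\ \beta \end{bmatrix} = \begin{bmatrix} O \\ O \end{bmatrix} \tag{10.1.1}
\]

and

\[
\mathrm{Cov}(\alpha'X, \beta'Y) = \alpha'\Sigma_{12}\beta = \beta'\Sigma_{21}\alpha = \rho. \tag{10.1.2}
\]

Hence, the maximum value of $\mathrm{Cov}(\alpha'X, \beta'Y)$ yields the largest $\rho$. It follows from (ii) that
$\alpha = \frac{1}{\rho}\Sigma_{11}^{-1}\Sigma_{12}\beta$ which, once substituted in (iii), yields

\[
[\Sigma_{21}\Sigma_{11}^{-1}\Sigma_{12} - \rho^2\Sigma_{22}]\beta = O \ \Rightarrow\ [\Sigma_{22}^{-1}\Sigma_{21}\Sigma_{11}^{-1}\Sigma_{12} - \rho^2 I]\beta = O.
\]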

This entails that $\rho^2 = \lambda$, an eigenvalue of $B = \Sigma_{22}^{-1}\Sigma_{21}\Sigma_{11}^{-1}\Sigma_{12}$ or its symmetrized
form $\Sigma_{22}^{-\frac{1}{2}}\Sigma_{21}\Sigma_{11}^{-1}\Sigma_{12}\Sigma_{22}^{-\frac{1}{2}}$, and that $\beta$ is a corresponding eigenvector. Similarly, by
obtaining a representation of $\beta$ from (iii), substituting it in (ii) and proceeding as above, it
is seen that $\rho^2 = \lambda$ is an eigenvalue of $A = \Sigma_{11}^{-1}\Sigma_{12}\Sigma_{22}^{-1}\Sigma_{21}$ or its symmetrized form
$\Sigma_{11}^{-\frac{1}{2}}\Sigma_{12}\Sigma_{22}^{-1}\Sigma_{21}\Sigma_{11}^{-\frac{1}{2}}$, and that $\alpha$ is a corresponding eigenvector. Hence manifestly, all
the nonzero eigenvalues of $A$ coincide with those of $B$. If $p \le q$ and $\Sigma_{12}$ is of full rank
$p$, then $A > O$ (real positive definite) and $B \ge O$ (real positive semi-definite), whereas
if $q \le p$ and $\Sigma_{21}$ is of full rank $q$, then $A \ge O$ (real positive semi-definite) and $B > O$
(real positive definite). If $p = q$ and $\Sigma_{12}$ is of full rank $p$, then $A$ and $B$ are both positive
definite. If $p \le q$ and $\Sigma_{12}$ is of full rank $p$, then one should start with $A$ and compute all
the $p$ nonzero eigenvalues of $A$ since $A$ will be of lower order; on the other hand, if $q \le p$
and $\Sigma_{21}$ is of full rank $q$, then one ought to begin with $B$ and determine all the nonzero
eigenvalues of $B$. Thus, one can obtain the common nonzero eigenvalues of $A$ and $B$ or
their symmetrized forms by making use of one of these sets of steps. Let us denote the
largest value of these common eigenvalues $\lambda = \rho^2$ by $\lambda_{(1)}$ and the corresponding
eigenvectors with respect to $A$ and $B$, by $\alpha_{(1)}$ and $\beta_{(1)}$, where the eigenvectors are normalized
via the constraints $\alpha_{(1)}'\Sigma_{11}\alpha_{(1)} = 1$ and $\beta_{(1)}'\Sigma_{22}\beta_{(1)} = 1$. Then, $(u_1, v_1) = (\alpha_{(1)}'X, \beta_{(1)}'Y)$
is the first pair of canonical variables in the sense that $u_1$ is the best predictor of $v_1$ and
$v_1$ is the best predictor of $u_1$. Similarly, letting $\rho_{(i)}^2 = \lambda_{(i)}$ be the i-th largest common
eigenvalue of $A$ and $B$ and the corresponding eigenvectors such that $\alpha_{(i)}'\Sigma_{11}\alpha_{(i)} = 1$ and
$\beta_{(i)}'\Sigma_{22}\beta_{(i)} = 1$, be denoted by $\alpha_{(i)}$ and $\beta_{(i)}$, the i-th largest correlation between $u = \alpha'X$
and $v = \beta'Y$ will be equal to $\alpha_{(i)}'\Sigma_{12}\beta_{(i)} = \rho_{(i)} = \sqrt{\lambda_{(i)}}$, $i = 1,\ldots,p$, $p$ denoting the
common number of nonzero eigenvalues of $A$ and $B$, and occur when $u = u_i = \alpha_{(i)}'X$
and $v = v_i = \beta_{(i)}'Y$, $u_i$ and $v_i$ being the i-th pair of canonical variables. Clearly, $\mathrm{Var}(u_i)$
and $\mathrm{Var}(v_i)$, $i = 1,\ldots,p$, are both equal to one. Once again, best is taken to mean 'in the
minimum mean square' sense. Hence, the following results:


Theorem 10.1.1. Letting $\Sigma$, $A$, $B$, $\rho$, $\alpha_{(i)}$, $\beta_{(i)}$, $u_i$ and $v_i$ be as previously defined,

\[
\max_{\alpha'\Sigma_{11}\alpha=1,\ \beta'\Sigma_{22}\beta=1}[\alpha'\Sigma_{12}\beta] = \alpha_{(1)}'\Sigma_{12}\beta_{(1)} = \rho_{(1)} \tag{10.1.3}
\]

where $\rho_{(1)}$ is the largest $\rho$ or the largest canonical correlation, that is, the largest
correlation between the linear functions $u = \alpha'X$ and $v = \beta'Y$, which is equal
to the correlation between the first pair of canonical variables, $u_1$ and $v_1$, with $\rho_{(1)}^2 = \lambda_{(1)}$, the common largest eigenvalue of
$A$ and $B$. Similarly, we have

\[
\min_{\alpha'\Sigma_{11}\alpha=1,\ \beta'\Sigma_{22}\beta=1}[\alpha'\Sigma_{12}\beta] = \alpha_{(p)}'\Sigma_{12}\beta_{(p)} = \rho_{(p)} \tag{10.1.4}
\]

where $\rho_{(p)}$, which is the smallest nonzero value of $\rho$ with $\rho_{(p)}^2 = \lambda_{(p)}$, the common smallest
nonzero eigenvalue of $A$ and $B$, represents the smallest canonical correlation between $u$
and $v$ or the correlation between $u_p$ and $v_p$.


This maximum correlation between the linear functions $\alpha'X$ and $\beta'Y$ or the correlation
between the best predictors $u_1$ and $v_1$ or the maximum value of $\rho$ is called the first
canonical correlation between the sets $X$ and $Y$ in the sense that the correlation between $u_1$ and $v_1$
attains its maximum value. When $p = 1$ or $q = 1$, the canonical correlation becomes the
multiple correlation, and when $p = 1$ and $q = 1$, it is simply the correlation between two
real scalar random variables. The matrix of the nonzero eigenvalues of $A$ and $B$, denoted
by $\Lambda$, is $\Lambda = \mathrm{diag}(\lambda_{(1)},\ldots,\lambda_{(p)})$ when $p \le q$ and $\Sigma_{12}$ is of full rank $p$; otherwise, $p$ is
replaced by $q$ in $\Lambda$.
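The eigenvalue characterization above can be illustrated with a small numerical sketch for $p = q = 2$. The covariance blocks below are hypothetical values chosen for illustration, and the helper functions are minimal $2 \times 2$ routines written for this example rather than a general implementation. The canonical correlations are obtained as the square roots of the eigenvalues of $A = \Sigma_{11}^{-1}\Sigma_{12}\Sigma_{22}^{-1}\Sigma_{21}$, and the same values result from $B = \Sigma_{22}^{-1}\Sigma_{21}\Sigma_{11}^{-1}\Sigma_{12}$.

```python
import math


def mat_mul(a, b):
    """Product of two matrices stored as lists of rows."""
    return [
        [sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
        for i in range(len(a))
    ]


def inv2(m):
    """Inverse of a nonsingular 2 x 2 matrix."""
    (a, b), (c, d) = m
    det = a * d - b * c
    return [[d / det, -b / det], [-c / det, a / det]]


def transpose(m):
    return [list(row) for row in zip(*m)]


def eigenvalues2(m):
    """Eigenvalues of a 2 x 2 matrix whose eigenvalues are real, largest first."""
    (a, b), (c, d) = m
    trace, det = a + d, a * d - b * c
    disc = math.sqrt(max(trace * trace - 4 * det, 0.0))
    return [(trace + disc) / 2, (trace - disc) / 2]


def canonical_correlations(s11, s12, s22):
    """Square roots of the eigenvalues of A = S11^{-1} S12 S22^{-1} S21 (p = q = 2)."""
    s21 = transpose(s12)
    a_mat = mat_mul(mat_mul(inv2(s11), s12), mat_mul(inv2(s22), s21))
    return [math.sqrt(max(lam, 0.0)) for lam in eigenvalues2(a_mat)]


# Hypothetical covariance blocks, for illustration only.
S11 = [[2.0, 0.5], [0.5, 1.0]]
S22 = [[1.0, 0.2], [0.2, 1.5]]
S12 = [[0.4, 0.1], [0.2, 0.3]]

rho = canonical_correlations(S11, S12, S22)
# Interchanging the roles of X and Y gives the eigenvalues of B instead of A.
rho_from_b = canonical_correlations(S22, transpose(S12), S11)
print(rho, rho_from_b)
```

Calling the same function with the roles of $X$ and $Y$ interchanged computes the eigenvalues of $B$, so that the agreement of the two results illustrates that $A$ and $B$ share their nonzero eigenvalues.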


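It should be noted that, for instance, the canonical variable $\beta'Y$ such that $\beta$ satisfies
the constraint $\beta'\Sigma_{22}\beta = 1$ is identical to $b'\Sigma_{22}^{-\frac{1}{2}}Y$ such that $b'b = 1$ since $\beta'\Sigma_{22}\beta =
b'\Sigma_{22}^{-\frac{1}{2}}\Sigma_{22}\Sigma_{22}^{-\frac{1}{2}}b = b'b$ with $b = \Sigma_{22}^{\frac{1}{2}}\beta$. Accordingly, letting the $\lambda_{(i)}$'s as well as $A$ and $B$ be as
previously defined, our definition of a canonical variable, that is, $u_i = \alpha_{(i)}'X$ and $v_i = \beta_{(i)}'Y$,
coincides with the customary one, that is, $u_i^* = a_i'\Sigma_{11}^{-\frac{1}{2}}X$ where $a_i$ is the eigenvector
with respect to the symmetrized form of $A$ which is associated with $\lambda_{(i)}$ and normalized by requiring that $a_i'a_i = 1$,
and $v_i^* = b_i'\Sigma_{22}^{-\frac{1}{2}}Y$ where $b_i$ is an eigenvector with respect to the symmetrized form of $B$ corresponding to $\lambda_{(i)}$
and such that $b_i'b_i = 1$. It can be readily proved that the canonical variables $u_1^*,\ldots,u_p^*$ (or
equivalently the $u_i$'s) are uncorrelated, as $\mathrm{Cov}(u_i^*, u_j^*) = a_i'\Sigma_{11}^{-\frac{1}{2}}\Sigma_{11}\Sigma_{11}^{-\frac{1}{2}}a_j = a_i'a_j = 0$ for
$i \ne j$ since the normed eigenvectors $a_i$ are orthogonal to one another. It can be similarly
established that the $v_i^*$'s or, equivalently, the $v_i$'s are uncorrelated. Clearly, $\mathrm{Cov}(u_i^*, u_i^*) =
\mathrm{Cov}(u_i, u_i) = 1$ and $\mathrm{Cov}(v_i^*, v_i^*) = \mathrm{Cov}(v_i, v_i) = 1$. We now demonstrate that, for $i \ne k$,
the canonical variables $u_i$ and $v_k$ are uncorrelated. First, consider the equation $Aa = \lambda a$,
that is,

\[
\Sigma_{11}^{-\frac{1}{2}}\Sigma_{12}\Sigma_{22}^{-1}\Sigma_{21}\Sigma_{11}^{-\frac{1}{2}}\,a = \lambda a.
\]

On pre-multiplying both sides by $\Sigma_{22}^{-\frac{1}{2}}\Sigma_{21}\Sigma_{11}^{-\frac{1}{2}}$, we obtain $Bb = \lambda b$ where $b =
\Sigma_{22}^{-\frac{1}{2}}\Sigma_{21}\Sigma_{11}^{-\frac{1}{2}}a$. Thus, if $\lambda$ and $a$ constitute an eigenvalue-eigenvector pair for $A$, then
$\lambda$ and $b$ must also form an eigenvalue-eigenvector pair for $B$, and vice versa with
$a = \Sigma_{11}^{-\frac{1}{2}}\Sigma_{12}\Sigma_{22}^{-\frac{1}{2}}b$. By definition, $\mathrm{Cov}(u_i^*, v_k^*) = a_i'\Sigma_{11}^{-\frac{1}{2}}\Sigma_{12}\Sigma_{22}^{-\frac{1}{2}}b_k$ where the vector
$b_k = \theta\,\Sigma_{22}^{-\frac{1}{2}}\Sigma_{21}\Sigma_{11}^{-\frac{1}{2}}a_k$, $\theta$ being a positive constant such that the Euclidean norm of
$b_k$ is one. Note that since $b_k'b_k = \theta^2a_k'Aa_k = \theta^2\lambda_{(k)}a_k'a_k = 1$, $\theta$ must be equal to
$1/\sqrt{\lambda_{(k)}}$. Thus, $b_k = \Sigma_{22}^{-\frac{1}{2}}\Sigma_{21}\Sigma_{11}^{-\frac{1}{2}}a_k/\sqrt{\lambda_{(k)}}$ with $\Sigma_{11}^{-\frac{1}{2}}a_k = \alpha_{(k)}$ and $b_k = \Sigma_{22}^{\frac{1}{2}}\beta_{(k)}$,
which is equivalent to (iii) with $\alpha = \alpha_{(k)}$, $\beta = \beta_{(k)}$ and $\rho = \rho_{(k)}$, that is, $\beta_{(k)} =
\Sigma_{22}^{-1}\Sigma_{21}\alpha_{(k)}/\rho_{(k)}$. Then, $\mathrm{Cov}(u_i^*, v_k^*) = a_i'\Sigma_{11}^{-\frac{1}{2}}\Sigma_{12}\Sigma_{22}^{-1}\Sigma_{21}\Sigma_{11}^{-\frac{1}{2}}a_k/\sqrt{\lambda_{(k)}}$ = the $(i,k)$th
element of $\mathrm{diag}(\lambda_{(1)},\ldots,\lambda_{(p)})/\sqrt{\lambda_{(k)}}$, which is equal to 0 whenever $i \ne k$. As expected,
$\mathrm{Cov}(u_i^*, v_i^*) = \sqrt{\lambda_{(i)}} = \rho_{(i)}$, and $\mathrm{Cov}(u_i, v_k) = \alpha_{(i)}'\Sigma_{12}\beta_{(k)} = \alpha_{(i)}'\Sigma_{12}\Sigma_{22}^{-1}\Sigma_{21}\alpha_{(k)}/\rho_{(k)}
= a_i'\Sigma_{11}^{-\frac{1}{2}}\Sigma_{12}\Sigma_{22}^{-1}\Sigma_{21}\Sigma_{11}^{-\frac{1}{2}}a_k/\sqrt{\lambda_{(k)}} = \mathrm{Cov}(u_i^*, v_k^*)$ for $i, k = 1,\ldots,p$, assuming that
$p \le q$ and $\Sigma_{12}$ is of full rank; if $p \ge q$ and $\Sigma_{12}$ is of full rank, $A$ and $B$ will then share $q$
nonzero eigenvalues.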


10.1.1. An invariance property 


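An interesting property of canonical correlations is now pointed out. Consider the
following nonsingular transformations of $X$ and $Y$: Let $X_1 = A_1X$ and $Y_1 = B_1Y$ where
$A_1$ is a $p \times p$ nonsingular constant matrix and $B_1$ is a $q \times q$ constant nonsingular matrix
so that $|A_1| \ne 0$ and $|B_1| \ne 0$. Now, consider the linear functions $\alpha'X_1 = \alpha'A_1X$ and
$\beta'Y_1 = \beta'B_1Y$ whose variances and covariance are as follows:

\[
\begin{aligned}
\mathrm{Var}(\alpha'X_1) &= \mathrm{Var}(\alpha'A_1X) = \alpha'A_1\Sigma_{11}A_1'\alpha,\quad \mathrm{Var}(\beta'Y_1) = \mathrm{Var}(\beta'B_1Y) = \beta'B_1\Sigma_{22}B_1'\beta,\\
\mathrm{Cov}(\alpha'X_1, \beta'Y_1) &= \alpha'A_1\Sigma_{12}B_1'\beta = \beta'B_1\Sigma_{21}A_1'\alpha.
\end{aligned}
\]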
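On imposing the conditions $\mathrm{Var}(\alpha'X_1) = 1$ and $\mathrm{Var}(\beta'Y_1) = 1$, and maximizing
$\mathrm{Cov}(\alpha'X_1, \beta'Y_1)$ by means of the previously used procedure, we arrive at the equations

\[
A_1\Sigma_{12}B_1'\beta - \rho_1A_1\Sigma_{11}A_1'\alpha = O \tag{iv}
\]
\[
-\rho_2B_1\Sigma_{22}B_1'\beta + B_1\Sigma_{21}A_1'\alpha = O. \tag{v}
\]

On pre-multiplying (iv) by $\alpha'$ and (v) by $\beta'$, one has $\rho_1 = \rho_2 = \rho$, say. Equations (iv) and
(v) can then be re-expressed as

\[
\begin{bmatrix} -\rho A_1\Sigma_{11}A_1' & A_1\Sigma_{12}B_1' \\ B_1\Sigma_{21}A_1' & -\rho B_1\Sigma_{22}B_1' \end{bmatrix}\begin{bmatrix} \alpha \\ \beta \end{bmatrix} = \begin{bmatrix} O \\ O \end{bmatrix}
\ \Rightarrow\
\begin{bmatrix} A_1 & O \\ O & B_1 \end{bmatrix}\begin{bmatrix} -\rho\Sigma_{11} & \Sigma_{12} \\ \Sigma_{21} & -\rho\Sigma_{22} \end{bmatrix}\begin{bmatrix} A_1' & O \\ O & B_1' \end{bmatrix}\begin{bmatrix} \alpha \\ \beta \end{bmatrix} = \begin{bmatrix} O \\ O \end{bmatrix}.
\]

Taking the determinant of the coefficient matrix and equating it to zero to determine the
roots, we have

\[
\left|\begin{bmatrix} A_1 & O \\ O & B_1 \end{bmatrix}\begin{bmatrix} -\rho\Sigma_{11} & \Sigma_{12} \\ \Sigma_{21} & -\rho\Sigma_{22} \end{bmatrix}\begin{bmatrix} A_1' & O \\ O & B_1' \end{bmatrix}\right| = 0 \tag{10.1.5}
\]
\[
\Rightarrow\ \begin{vmatrix} -\rho\Sigma_{11} & \Sigma_{12} \\ \Sigma_{21} & -\rho\Sigma_{22} \end{vmatrix} = 0. \tag{10.1.6}
\]

As can be seen from (10.1.6), (10.1.1) and (10.1.5) have the same roots $\rho$, which means
that the canonical correlation $\rho$ is invariant under nonsingular linear transformations.
Observe that when $\Sigma_{12}$ is of full rank $p$ and $p \le q$, $\rho_{(1)},\ldots,\rho_{(p)}$, corresponding to the
nonzero roots of (10.1.1) or (10.1.6), encompass all the canonical correlations, so that,
in that case, we have a matrix of canonical correlations. Hence, the following result: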


Theorem 10.1.2. Let X, a p x 1 vector of real scalar random variables xi, ..., xy, and 
Y, aq x 1 vector of real scalar random variables yi, ..., уд, have a joint distribution. 
Then, the canonical correlations between X and Y are invariant under nonsingular linear 
transformations, that is, the canonical correlations between X and Y are the same as those 
between A,X апа B,Y where |A,| # 0 and |B,| Æ О. 
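
The invariance stated in Theorem 10.1.2 can be checked numerically. The following is a minimal sketch, assuming NumPy is available; the covariance matrix and the matrices `A1` and `B1` are hypothetical values chosen only for illustration, and `canonical_correlations` is a helper name introduced here, not a library function.

```python
import numpy as np

def canonical_correlations(S11, S12, S22):
    """Square roots of the eigenvalues of S11^{-1} S12 S22^{-1} S21, largest first."""
    A = np.linalg.solve(S11, S12) @ np.linalg.solve(S22, S12.T)
    lam = np.sort(np.linalg.eigvals(A).real)[::-1]
    return np.sqrt(np.clip(lam, 0.0, None))

# Hypothetical positive definite covariance matrix with p = 2, q = 3.
rng = np.random.default_rng(0)
p, q = 2, 3
G = rng.standard_normal((p + q, 12))
Sigma = G @ G.T
S11, S12, S22 = Sigma[:p, :p], Sigma[:p, p:], Sigma[p:, p:]

# Arbitrary nonsingular A1 (p x p) and B1 (q x q); det(A1) = 6, det(B1) = 4.
A1 = np.array([[2.0, 1.0], [0.0, 3.0]])
B1 = np.array([[1.0, 2.0, 0.0], [0.0, 1.0, 1.0], [1.0, 0.0, 2.0]])

rho = canonical_correlations(S11, S12, S22)
# Covariance blocks of (A1 X, B1 Y): A1 S11 A1', A1 S12 B1', B1 S22 B1'.
rho_transformed = canonical_correlations(
    A1 @ S11 @ A1.T, A1 @ S12 @ B1.T, B1 @ S22 @ B1.T)
print(rho, rho_transformed)
```

The two printed vectors should agree up to rounding, since the transformed matrix $A$ is similar to the original one.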


10.2. Pairs of Canonical Variables 


As previously explained, $\lambda_{(1)}$, which denotes the largest eigenvalue of the matrix $A = \Sigma_{11}^{-1}\Sigma_{12}\Sigma_{22}^{-1}\Sigma_{21}$ or its symmetrized form $\Sigma_{11}^{-1/2}\Sigma_{12}\Sigma_{22}^{-1}\Sigma_{21}\Sigma_{11}^{-1/2}$, as well as the largest eigenvalue of $B = \Sigma_{22}^{-1}\Sigma_{21}\Sigma_{11}^{-1}\Sigma_{12}$ or $\Sigma_{22}^{-1/2}\Sigma_{21}\Sigma_{11}^{-1}\Sigma_{12}\Sigma_{22}^{-1/2}$, also turns out to be equal to $\rho_{(1)}^2$, the square of the largest root of equation (10.1.1). Having evaluated $\lambda_{(1)}$, we compute the corresponding eigenvectors $\alpha_{(1)}$ and $\beta_{(1)}$ and normalize them via the constraints $\alpha_{(1)}'\Sigma_{11}\alpha_{(1)} = 1$ and $\beta_{(1)}'\Sigma_{22}\beta_{(1)} = 1$, which produces the first pair of canonical variables: $(u_1, v_1) = (\alpha_{(1)}'X, \beta_{(1)}'Y)$. We then take the second largest nonzero eigenvalue of $A$ or $B$, denote it by $\lambda_{(2)}$, compute the corresponding eigenvectors $\alpha_{(2)}$ and $\beta_{(2)}$ and normalize them so that $\alpha_{(2)}'\Sigma_{11}\alpha_{(2)} = 1$ and $\beta_{(2)}'\Sigma_{22}\beta_{(2)} = 1$, which yields the second pair of canonical variables: $(u_2, v_2) = (\alpha_{(2)}'X, \beta_{(2)}'Y)$. Continuing this process with all of the $p$ nonzero eigenvalues if $p \le q$ and $\Sigma_{12}$ is of full rank $p$, or with all of the $q$ nonzero eigenvalues if $q \le p$ and $\Sigma_{21}$ is of full rank $q$, will produce a complete set of pairs of canonical variables: $(u_i, v_i) = (\alpha_{(i)}'X, \beta_{(i)}'Y)$, $i = 1,\ldots,p$ or $q$.
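
The construction just described can be sketched in a few lines. This is an illustrative sketch only, assuming NumPy, a positive definite covariance matrix and a full-rank $\Sigma_{12}$; the function name `canonical_pairs` and the test matrix are hypothetical. The vectors $\beta_{(k)}$ are obtained from $\alpha_{(k)}$ through the relation $\beta_{(k)} = \Sigma_{22}^{-1}\Sigma_{21}\alpha_{(k)}/\rho_{(k)}$ rather than from a separate eigenvalue problem for $B$.

```python
import numpy as np

def canonical_pairs(S11, S12, S22):
    """Return (rho, alphas, betas): canonical correlations, largest first, and
    coefficient vectors stored as columns, scaled so that
    alpha' S11 alpha = 1 and beta' S22 beta = 1."""
    S21 = S12.T
    A = np.linalg.solve(S11, S12) @ np.linalg.solve(S22, S21)
    lam, vecs = np.linalg.eig(A)
    # Keep the min(p, q) largest eigenvalues (assumed nonzero here).
    order = np.argsort(lam.real)[::-1][: min(S12.shape)]
    lam, vecs = lam.real[order], vecs.real[:, order]
    rho = np.sqrt(lam)
    # Rescale each eigenvector so that alpha' S11 alpha = 1.
    alphas = vecs / np.sqrt(np.sum(vecs * (S11 @ vecs), axis=0))
    # beta_(k) = S22^{-1} S21 alpha_(k) / rho_(k)
    betas = np.linalg.solve(S22, S21 @ alphas) / rho
    return rho, alphas, betas

# Hypothetical covariance matrix with p = 2 and q = 3.
rng = np.random.default_rng(2)
G = rng.standard_normal((5, 15))
Sigma = G @ G.T
S11, S12, S22 = Sigma[:2, :2], Sigma[:2, 2:], Sigma[2:, 2:]
rho, alphas, betas = canonical_pairs(S11, S12, S22)
print(rho)
print(alphas.T @ S12 @ betas)   # approximately diag(rho)
```

With this scaling, the matrix of covariances between the $u_i$ and the $v_k$ is diagonal with the canonical correlations on the diagonal.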


Since the symmetrized forms of $A$ and $B$ are symmetric and nonnegative definite, all of their eigenvalues will be nonnegative and all nonzero eigenvalues will be positive. As is explained in Chapter 1 and Mathai and Haubold (2017a), all the eigenvalues of real symmetric matrices are real and for such matrices, there exists a full set of orthogonal eigenvectors whether some of the roots are repeated or not. Hence, $\alpha_{(j)}'X$ will be uncorrelated with all the linear functions $\alpha_{(r)}'X$, $r = 1, 2, \ldots, j-1$, and $\beta_{(j)}'Y$ will be uncorrelated with $\beta_{(r)}'Y$, $r = 1, 2, \ldots, j-1$. When constructing the second pair of canonical variables, we may impose the condition that the second linear functions $\alpha'X$ and $\beta'Y$ must be uncorrelated with the first pair $\alpha_{(1)}'X$ and $\beta_{(1)}'Y$, respectively, by taking two more Lagrangian multipliers, adding the conditions $\mathrm{Cov}(\alpha'X, \alpha_{(1)}'X) = \alpha'\Sigma_{11}\alpha_{(1)} = 0$ and $\beta'\Sigma_{22}\beta_{(1)} = 0$ to the optimizing function $w$ and carrying out the optimization. We will then realize that these additional conditions are redundant and that the original optimizing equations are recovered, as was observed in the case of Principal Components. Similarly, we could incorporate the conditions $\alpha'\Sigma_{11}\alpha_{(r)} = 0$, $r = 1,\ldots,j-1$, when constructing $\alpha_{(j)}$ and similar conditions when constructing $\beta_{(j)}$. However, these uncorrelatedness conditions will become redundant in the optimization procedure. Note that $\lambda_{(1)} = \rho_{(1)}^2$ is the square of the first canonical correlation. Thus, the first canonical correlation is denoted by $\rho_{(1)}$. Similarly, $\lambda_{(r)} = \rho_{(r)}^2$ is the square of the $r$-th canonical correlation, $r = 1,\ldots,p$, when $p \le q$ and $\Sigma_{12}$ is of full rank $p$. That is, $\rho_{(1)},\ldots,\rho_{(p)}$, the $p$ nonzero roots of (10.1.1) when $p \le q$ and $\Sigma_{12}$ is of full rank $p$, are canonical correlations, $\rho_{(r)}$ being called the $r$-th canonical correlation, which is the $r$-th largest root of the determinantal equation (10.1.1). If $p \le q$ and $\Sigma_{12}$ is not of full rank $p$, then there will be fewer nonzero canonical correlations.
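Example 10.2.1. Let $Z = \begin{bmatrix}X\\ Y\end{bmatrix}$ be a $5 \times 1$ real vector random variable where $X$ is $3 \times 1$ and $Y$ is $2 \times 1$. Let the covariance matrix of $Z$ be $\Sigma$ where

$$\Sigma = \begin{bmatrix}3&1&0&1&0\\ 1&2&0&0&1\\ 0&0&3&1&1\\ 1&0&1&2&1\\ 0&1&1&1&2\end{bmatrix} = \begin{bmatrix}\Sigma_{11}&\Sigma_{12}\\ \Sigma_{21}&\Sigma_{22}\end{bmatrix},\quad \Sigma_{11} = \begin{bmatrix}3&1&0\\ 1&2&0\\ 0&0&3\end{bmatrix},$$

$$\Sigma_{12} = \begin{bmatrix}1&0\\ 0&1\\ 1&1\end{bmatrix},\quad \Sigma_{22} = \begin{bmatrix}2&1\\ 1&2\end{bmatrix},\quad \Sigma_{21} = \begin{bmatrix}1&0&1\\ 0&1&1\end{bmatrix}.$$

Construct the pairs of canonical variables.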


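Solution 10.2.1. We need the following quantities:

$$\Sigma_{11}^{-1} = \frac{1}{15}\begin{bmatrix}6&-3&0\\ -3&9&0\\ 0&0&5\end{bmatrix},\quad \Sigma_{22}^{-1} = \frac{1}{3}\begin{bmatrix}2&-1\\ -1&2\end{bmatrix},$$

$$\Sigma_{22}^{-1}\Sigma_{21} = \frac{1}{3}\begin{bmatrix}2&-1\\ -1&2\end{bmatrix}\begin{bmatrix}1&0&1\\ 0&1&1\end{bmatrix} = \frac{1}{3}\begin{bmatrix}2&-1&1\\ -1&2&1\end{bmatrix},$$

$$\Sigma_{11}^{-1}\Sigma_{12} = \frac{1}{15}\begin{bmatrix}6&-3&0\\ -3&9&0\\ 0&0&5\end{bmatrix}\begin{bmatrix}1&0\\ 0&1\\ 1&1\end{bmatrix} = \frac{1}{15}\begin{bmatrix}6&-3\\ -3&9\\ 5&5\end{bmatrix},$$

$$A = \Sigma_{11}^{-1}\Sigma_{12}\Sigma_{22}^{-1}\Sigma_{21},$$

$$B = \Sigma_{22}^{-1}\Sigma_{21}\Sigma_{11}^{-1}\Sigma_{12} = \frac{1}{45}\begin{bmatrix}2&-1&1\\ -1&2&1\end{bmatrix}\begin{bmatrix}6&-3\\ -3&9\\ 5&5\end{bmatrix} = \frac{1}{45}\begin{bmatrix}20&-10\\ -7&26\end{bmatrix}.$$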
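Let us compute the eigenvalues of $B$ since it is $2 \times 2$ whereas $A$ is $3 \times 3$. The characteristic equation of $45B$ is $(20-\lambda)(26-\lambda) - 70 = 0 \Rightarrow \lambda^2 - 46\lambda + 450 = 0$. The roots are $\lambda_1 = 23 + \sqrt{79}$, $\lambda_2 = 23 - \sqrt{79}$. Hence, the eigenvalues of $B$ are $\rho_j^2 = \lambda_j/45$, that is, $\rho_1 = \sqrt{(23+\sqrt{79})/45}$, $\rho_2 = \sqrt{(23-\sqrt{79})/45}$. We have denoted the second set of real scalar random variables by $Y$, $Y' = [y_1, y_2]$. An eigenvector corresponding to $\rho_1^2$ is available from $(B - \rho_1^2I)Y = O$. Since the right-hand side is null, we may omit the denominator. The first equation is then $(-3-\sqrt{79})y_1 - 10y_2 = 0$. Taking $y_1 = 1$, $y_2 = -\frac{1}{10}(3+\sqrt{79})$. It is easily verified that these values will also satisfy the second equation in $(B - \rho_1^2I)Y = O$. An eigenvector, denoted by $\beta_1$, is the following:

$$\beta_1 = \begin{bmatrix}1\\ -\frac{1}{10}(3+\sqrt{79})\end{bmatrix}.$$

We normalize $\beta_1$ through $\beta_1'\Sigma_{22}\beta_1 = 1$. To this end, consider

$$\beta_1'\Sigma_{22}\beta_1 = \Big[1,\ -\tfrac{1}{10}(3+\sqrt{79})\Big]\begin{bmatrix}2&1\\ 1&2\end{bmatrix}\begin{bmatrix}1\\ -\frac{1}{10}(3+\sqrt{79})\end{bmatrix} = \frac{1}{25}(79 - 2\sqrt{79}).$$

Hence a normalized $\beta_1$, denoted by $\beta_{(1)}$, and the corresponding canonical variable $v_1$ are the following:

$$\beta_{(1)} = \frac{5}{\sqrt{79-2\sqrt{79}}}\begin{bmatrix}1\\ -\frac{1}{10}(3+\sqrt{79})\end{bmatrix},\quad v_1 = \frac{5}{\sqrt{79-2\sqrt{79}}}\Big[y_1 - \tfrac{1}{10}(3+\sqrt{79})\,y_2\Big].$$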

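The second eigenvalue of $B$ is $\rho_2^2 = \frac{1}{45}(23-\sqrt{79})$. An eigenvector corresponding to $\rho_2^2$ is available from the equation $(B - \rho_2^2I)Y = O$. The second equation gives $-7y_1 + (3+\sqrt{79})y_2 = 0$. Taking $y_2 = 1$, $y_1 = \frac{1}{7}(3+\sqrt{79})$. Hence, an eigenvector corresponding to $\rho_2^2$, denoted by $\beta_2$, is the following:

$$\beta_2 = \begin{bmatrix}\frac{1}{7}(3+\sqrt{79})\\ 1\end{bmatrix}.$$

We normalize this vector through the constraint $\beta_2'\Sigma_{22}\beta_2 = 1$. Consider

$$\beta_2'\Sigma_{22}\beta_2 = \Big[\tfrac{1}{7}(3+\sqrt{79}),\ 1\Big]\begin{bmatrix}2&1\\ 1&2\end{bmatrix}\begin{bmatrix}\frac{1}{7}(3+\sqrt{79})\\ 1\end{bmatrix} = \frac{1}{49}(316 + 26\sqrt{79}).$$

Hence, the normalized eigenvector, denoted by $\beta_{(2)}$, and the corresponding canonical variable $v_2$ are

$$\beta_{(2)} = \frac{7}{\sqrt{316+26\sqrt{79}}}\begin{bmatrix}\frac{1}{7}(3+\sqrt{79})\\ 1\end{bmatrix},\quad v_2 = \frac{7}{\sqrt{316+26\sqrt{79}}}\Big[\tfrac{1}{7}(3+\sqrt{79})\,y_1 + y_2\Big].$$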
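We will obtain the eigenvectors resulting from $(A - \rho_1^2I)X = O$ from the eigenvector $\beta_1$ instead of solving the equation relating to $A$, as the presence of the term $3+\sqrt{79}$ can make the computations tedious. From equation (ii) of Sect. 10.1, we have

$$\alpha_1 = \frac{1}{\rho_1}\Sigma_{11}^{-1}\Sigma_{12}\beta_1 = \frac{\sqrt{45}}{\sqrt{23+\sqrt{79}}}\,\frac{1}{15}\begin{bmatrix}6&-3\\ -3&9\\ 5&5\end{bmatrix}\begin{bmatrix}1\\ -\frac{1}{10}(3+\sqrt{79})\end{bmatrix} = \frac{\sqrt{45}}{15\sqrt{23+\sqrt{79}}}\begin{bmatrix}6+\frac{3}{10}(3+\sqrt{79})\\ -3-\frac{9}{10}(3+\sqrt{79})\\ 5-\frac{1}{2}(3+\sqrt{79})\end{bmatrix}.$$

Let us normalize this vector by requiring that $\alpha'\Sigma_{11}\alpha = 1$. Since $\alpha_1'\Sigma_{11}\alpha_1 = \frac{1}{\rho_1^2}\beta_1'\Sigma_{21}\Sigma_{11}^{-1}\Sigma_{12}\beta_1 = \gamma_1$, say, we have

$$\gamma_1 = \frac{45}{23+\sqrt{79}}\Big[1,\ -\tfrac{1}{10}(3+\sqrt{79})\Big]\begin{bmatrix}1&0&1\\ 0&1&1\end{bmatrix}\frac{1}{15}\begin{bmatrix}6&-3\\ -3&9\\ 5&5\end{bmatrix}\begin{bmatrix}1\\ -\frac{1}{10}(3+\sqrt{79})\end{bmatrix}$$
$$= \frac{45}{15(23+\sqrt{79})}\Big[1,\ -\tfrac{1}{10}(3+\sqrt{79})\Big]\begin{bmatrix}11&2\\ 2&14\end{bmatrix}\begin{bmatrix}1\\ -\frac{1}{10}(3+\sqrt{79})\end{bmatrix} = \frac{45}{15(23+\sqrt{79})(25)}\big[553 + 11\sqrt{79}\big].$$

Hence, the normalized $\alpha_1$, denoted by $\alpha_{(1)}$, is the following:

$$\alpha_{(1)} = \frac{5}{\sqrt{15(553+11\sqrt{79})}}\begin{bmatrix}6+\frac{3}{10}(3+\sqrt{79})\\ -3-\frac{9}{10}(3+\sqrt{79})\\ 5-\frac{1}{2}(3+\sqrt{79})\end{bmatrix}$$

so that the corresponding canonical variable is

$$u_1 = \frac{5}{\sqrt{15(553+11\sqrt{79})}}\Big\{\big[6+\tfrac{3}{10}(3+\sqrt{79})\big]x_1 - \big[3+\tfrac{9}{10}(3+\sqrt{79})\big]x_2 + \big[5-\tfrac{1}{2}(3+\sqrt{79})\big]x_3\Big\}.$$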
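Now, from the formula $\alpha_2 = \frac{1}{\rho_2}\Sigma_{11}^{-1}\Sigma_{12}\beta_2$, we have

$$\alpha_2 = \frac{\sqrt{45}}{\sqrt{23-\sqrt{79}}}\,\frac{1}{15}\begin{bmatrix}6&-3\\ -3&9\\ 5&5\end{bmatrix}\begin{bmatrix}\frac{1}{7}(3+\sqrt{79})\\ 1\end{bmatrix} = \frac{\sqrt{45}}{15\sqrt{23-\sqrt{79}}}\begin{bmatrix}\frac{6}{7}(3+\sqrt{79})-3\\ -\frac{3}{7}(3+\sqrt{79})+9\\ \frac{5}{7}(3+\sqrt{79})+5\end{bmatrix}.$$

Let us normalize this vector via the constraint $\alpha_2'\Sigma_{11}\alpha_2 = 1$. Since

$$\alpha_2'\Sigma_{11}\alpha_2 = \frac{1}{\rho_2^2}\beta_2'\Sigma_{21}\Sigma_{11}^{-1}\Sigma_{12}\beta_2 = \gamma_2,$$

say, we have

$$\gamma_2 = \frac{45}{15(23-\sqrt{79})}\Big[\tfrac{1}{7}(3+\sqrt{79}),\ 1\Big]\begin{bmatrix}11&2\\ 2&14\end{bmatrix}\begin{bmatrix}\frac{1}{7}(3+\sqrt{79})\\ 1\end{bmatrix} = \frac{45}{15(23-\sqrt{79})(49)}\big[1738 + 94\sqrt{79}\big],$$

and the normalized vector $\alpha_2$, denoted by $\alpha_{(2)}$, is

$$\alpha_{(2)} = \frac{7}{\sqrt{15(1738+94\sqrt{79})}}\begin{bmatrix}\frac{6}{7}(3+\sqrt{79})-3\\ -\frac{3}{7}(3+\sqrt{79})+9\\ \frac{5}{7}(3+\sqrt{79})+5\end{bmatrix},$$

so that the second canonical variable is

$$u_2 = \frac{7}{\sqrt{15(1738+94\sqrt{79})}}\Big\{\big[\tfrac{6}{7}(3+\sqrt{79})-3\big]x_1 + \big[-\tfrac{3}{7}(3+\sqrt{79})+9\big]x_2 + \big[\tfrac{5}{7}(3+\sqrt{79})+5\big]x_3\Big\}.$$

Hence, the canonical pairs are $(u_1, v_1)$, $(u_2, v_2)$ where $u_j$ is the best predictor of $v_j$ and vice versa for $j = 1, 2$. The pair of canonical variables $(u_2, v_2)$ has the second largest canonical correlation. It is easy to verify that $\mathrm{Cov}(u_1, u_2) = 0$ and $\mathrm{Cov}(v_1, v_2) = 0$.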
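
The closed-form vectors of Example 10.2.1 can be verified numerically. The sketch below, which assumes NumPy, simply plugs the expressions for $\alpha_{(1)}, \alpha_{(2)}, \beta_{(1)}, \beta_{(2)}$ into the covariance blocks of the example; the variable names are ad hoc.

```python
import numpy as np

# Covariance blocks of Example 10.2.1.
S11 = np.array([[3.0, 1.0, 0.0], [1.0, 2.0, 0.0], [0.0, 0.0, 3.0]])
S12 = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
S22 = np.array([[2.0, 1.0], [1.0, 2.0]])
r = np.sqrt(79.0)

# Normalized eigenvectors as obtained in the solution.
b1 = 5 / np.sqrt(79 - 2 * r) * np.array([1.0, -(3 + r) / 10])
b2 = 7 / np.sqrt(316 + 26 * r) * np.array([(3 + r) / 7, 1.0])
a1 = 5 / np.sqrt(15 * (553 + 11 * r)) * np.array(
    [6 + 0.3 * (3 + r), -3 - 0.9 * (3 + r), 5 - 0.5 * (3 + r)])
a2 = 7 / np.sqrt(15 * (1738 + 94 * r)) * np.array(
    [6 / 7 * (3 + r) - 3, -3 / 7 * (3 + r) + 9, 5 / 7 * (3 + r) + 5])

alphas = np.column_stack([a1, a2])
betas = np.column_stack([b1, b2])

var_u = alphas.T @ S11 @ alphas    # covariance matrix of (u1, u2)
var_v = betas.T @ S22 @ betas      # covariance matrix of (v1, v2)
cov_uv = alphas.T @ S12 @ betas    # covariances between the u's and the v's
rho = np.sqrt((23 + np.array([r, -r])) / 45)

print(np.round(var_u, 10))
print(np.round(var_v, 10))
print(np.round(cov_uv, 10), rho)
```

Both covariance matrices should print as the identity, and the cross-covariance matrix as $\mathrm{diag}(\rho_1, \rho_2)$, which confirms the unit variances, the zero covariances $\mathrm{Cov}(u_1,u_2)$ and $\mathrm{Cov}(v_1,v_2)$, and the canonical correlations.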
10.3. Estimation of the Canonical Correlations and Canonical Variables


Consider a simple random sample of size $n$ from a population designated by the $(p+q) \times 1$ real vector $Z = \begin{bmatrix}X\\ Y\end{bmatrix}$. Let the (corrected) sample sum of products matrix be denoted by

$$S = \begin{bmatrix}S_{11}&S_{12}\\ S_{21}&S_{22}\end{bmatrix},\quad S_{11} \text{ is } p \times p,\ S_{22} \text{ is } q \times q,$$

where $S_{11}$ is the sample sum of products matrix corresponding to the sample from the subvector $X$, whose $(i,j)$th element is of the form $\sum_{k=1}^{n}(x_{ik}-\bar{x}_i)(x_{jk}-\bar{x}_j)$ with the matrix $(x_{ik})$ denoting a sample of size $n$ from $X$, $S_{22}$ is the sample sum of products matrix corresponding to the subvector $Y$ and $\frac{1}{n}S_{12}$ is the sample covariance between $X$ and $Y$. Thus, denoting the estimates by hats, the estimates of $\Sigma_{11}$, $\Sigma_{22}$ and $\Sigma_{12}$ are $\hat\Sigma_{11} = \frac{1}{n}S_{11}$, $\hat\Sigma_{22} = \frac{1}{n}S_{22}$ and $\hat\Sigma_{12} = \frac{1}{n}S_{12}$, respectively. These will also be the maximum likelihood estimates if we assume normality, that is, if

$$Z = \begin{bmatrix}X\\ Y\end{bmatrix} \sim N_{p+q}(\mu, \Sigma),\ \Sigma > O,\quad \Sigma = \begin{bmatrix}\Sigma_{11}&\Sigma_{12}\\ \Sigma_{21}&\Sigma_{22}\end{bmatrix} \qquad (10.3.1)$$

where $\Sigma_{11} = \mathrm{Cov}(X) > O$, $\Sigma_{22} = \mathrm{Cov}(Y) > O$ and $\Sigma_{12} = \mathrm{Cov}(X, Y)$. For the estimates of these submatrices, equation (10.1.1) will take the following form:

$$\begin{vmatrix}-t\hat\Sigma_{11}&\hat\Sigma_{12}\\ \hat\Sigma_{21}&-t\hat\Sigma_{22}\end{vmatrix} = 0 \;\Rightarrow\; \begin{vmatrix}-tS_{11}&S_{12}\\ S_{21}&-tS_{22}\end{vmatrix} = 0 \qquad (10.3.2)$$

where $t$ is the sample canonical correlation; the reader may also refer to Mathai and Haubold (2017b). Letting $\hat\rho = t$ be the estimated canonical correlation, whenever $p \le q$, $t^2$ is an eigenvalue of the sample canonical correlation matrix given by

$$\hat\Sigma_{11}^{-1/2}\hat\Sigma_{12}\hat\Sigma_{22}^{-1}\hat\Sigma_{21}\hat\Sigma_{11}^{-1/2} = S_{11}^{-1/2}S_{12}S_{22}^{-1}S_{21}S_{11}^{-1/2} = R_{11}^{-1/2}R_{12}R_{22}^{-1}R_{21}R_{11}^{-1/2}. \qquad (10.3.3)$$

Note that we have chosen the symmetric format for the sample canonical correlation matrix. Observe that the sample size $n$ is omitted from the middle expression in (10.3.3) as it gets canceled. As well, the middle expression is expressed in terms of sample correlation matrices in the last expression appearing in (10.3.3). The conversion from a sample covariance matrix to a sample correlation matrix has previously been explained. Letting $S = (s_{ij})$ denote the sample sum of products matrix, we can write

$$S = S_dRS_d,\quad S_d = \mathrm{diag}(\sqrt{s_{11}},\ldots,\sqrt{s_{p+q,p+q}}),\quad R = (r_{ij}) = \begin{bmatrix}R_{11}&R_{12}\\ R_{21}&R_{22}\end{bmatrix},$$

where $r_{ij}$ is the $(i,j)$th sample correlation coefficient, and for example, $R_{11}$ is the $p \times p$ submatrix within the $(p+q) \times (p+q)$ matrix $R$. We will examine the distribution of $t^2$ when the population covariance submatrix $\Sigma_{12} = O$, as well as when $\Sigma_{12} \ne O$, in the case of a $(p+q)$-variate real Gaussian population as given in (10.3.1), after considering an example to illustrate the computations of the canonical correlation $\rho$ resulting from (10.1.1) and presenting an iterative procedure.
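
The remark that the sample size cancels, and that sample correlation matrices may replace sums of products, can be illustrated as follows. This is a sketch on simulated data (hypothetical, generated only for the illustration), assuming NumPy; `squared_canonical_correlations` is an ad hoc helper that uses the non-symmetric form $M_{11}^{-1}M_{12}M_{22}^{-1}M_{21}$, which has the same eigenvalues as the symmetric form.

```python
import numpy as np

rng = np.random.default_rng(1)
p, q, n = 2, 3, 50
Z = rng.standard_normal((p + q, n))   # one column per observation
Z[p:] += 0.8 * Z[[0, 1, 0]]           # induce correlation between X and Y

Zd = Z - Z.mean(axis=1, keepdims=True)   # deviation matrix
S = Zd @ Zd.T                            # corrected sum of products matrix
d = np.sqrt(np.diag(S))
R = S / np.outer(d, d)                   # sample correlation matrix

def squared_canonical_correlations(M, p):
    """Eigenvalues of M11^{-1} M12 M22^{-1} M21, largest first."""
    M11, M12, M22 = M[:p, :p], M[:p, p:], M[p:, p:]
    A = np.linalg.solve(M11, M12) @ np.linalg.solve(M22, M12.T)
    return np.sort(np.linalg.eigvals(A).real)[::-1]

t2_from_S = squared_canonical_correlations(S, p)
t2_from_cov = squared_canonical_correlations(S / n, p)
t2_from_R = squared_canonical_correlations(R, p)
print(t2_from_S, t2_from_cov, t2_from_R)
```

The three vectors of squared sample canonical correlations should coincide up to rounding.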


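Example 10.3.1. Let $X$ and $Y$ be two real bivariate vector random variables and $Z = \begin{bmatrix}X\\ Y\end{bmatrix}$. Consider the following simple random sample of size 5 from $Z$, the observations $Z_1,\ldots,Z_5$ being the columns of

$$\begin{bmatrix}1&2&2&0&0\\ 2&0&1&1&1\\ 1&-1&0&1&-1\\ -1&1&-1&0&1\end{bmatrix}.$$

Construct the sample pairs of canonical variables.


Solution 10.3.1. Let us use the standard notation. The sample matrix is $\mathbf{Z} = [Z_1,\ldots,Z_5]$, the sample average is $\bar{Z} = \frac{1}{5}[Z_1+\cdots+Z_5]$, the matrix of sample averages is $\bar{\mathbf{Z}} = [\bar{Z},\ldots,\bar{Z}]$, the deviation matrix is $\mathbf{Z}_d = [Z_1-\bar{Z},\ldots,Z_5-\bar{Z}]$ and the sample sum of products matrix is $S = \mathbf{Z}_d\mathbf{Z}_d'$. These quantities are the following:

$$\mathbf{Z} = \begin{bmatrix}1&2&2&0&0\\ 2&0&1&1&1\\ 1&-1&0&1&-1\\ -1&1&-1&0&1\end{bmatrix},\quad \bar{Z} = \begin{bmatrix}1\\ 1\\ 0\\ 0\end{bmatrix},$$

$$\mathbf{Z}_d = \begin{bmatrix}0&1&1&-1&-1\\ 1&-1&0&0&0\\ 1&-1&0&1&-1\\ -1&1&-1&0&1\end{bmatrix},\quad S = \begin{bmatrix}4&-1&-1&-1\\ -1&2&2&-2\\ -1&2&4&-3\\ -1&-2&-3&4\end{bmatrix} = \begin{bmatrix}S_{11}&S_{12}\\ S_{21}&S_{22}\end{bmatrix},$$

where, as per our notation,

$$S_{11} = \begin{bmatrix}4&-1\\ -1&2\end{bmatrix},\quad S_{12} = \begin{bmatrix}-1&-1\\ 2&-2\end{bmatrix},\quad S_{21} = \begin{bmatrix}-1&2\\ -1&-2\end{bmatrix},\quad S_{22} = \begin{bmatrix}4&-3\\ -3&4\end{bmatrix}.$$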
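We need to compute the following items:

$$S_{11}^{-1} = \frac{1}{7}\begin{bmatrix}2&1\\ 1&4\end{bmatrix},\quad S_{22}^{-1} = \frac{1}{7}\begin{bmatrix}4&3\\ 3&4\end{bmatrix},$$

$$S_{11}^{-1}S_{12} = \frac{1}{7}\begin{bmatrix}2&1\\ 1&4\end{bmatrix}\begin{bmatrix}-1&-1\\ 2&-2\end{bmatrix} = \frac{1}{7}\begin{bmatrix}0&-4\\ 7&-9\end{bmatrix},\quad S_{22}^{-1}S_{21} = \frac{1}{7}\begin{bmatrix}4&3\\ 3&4\end{bmatrix}\begin{bmatrix}-1&2\\ -1&-2\end{bmatrix} = \frac{1}{7}\begin{bmatrix}-7&2\\ -7&-2\end{bmatrix}.$$

The matrices $A$ and $B$ are then the following (using the same notation as for the population values for convenience):

$$A = S_{11}^{-1}S_{12}S_{22}^{-1}S_{21} = \frac{1}{7^2}\begin{bmatrix}0&-4\\ 7&-9\end{bmatrix}\begin{bmatrix}-7&2\\ -7&-2\end{bmatrix} = \frac{2}{49}\begin{bmatrix}14&4\\ 7&16\end{bmatrix},$$

$$B = S_{22}^{-1}S_{21}S_{11}^{-1}S_{12} = \frac{1}{7^2}\begin{bmatrix}-7&2\\ -7&-2\end{bmatrix}\begin{bmatrix}0&-4\\ 7&-9\end{bmatrix} = \frac{2}{49}\begin{bmatrix}7&5\\ -7&23\end{bmatrix}.$$

If the population covariance matrix of $Z$ is $\Sigma$, an estimate of $\Sigma$ is $\frac{1}{n}S$ where $S$ is the sample sum of products matrix and $n$ is the sample size, which is also the maximum likelihood estimate of $\Sigma$ if $Z$ is Gaussian distributed. Instead of using $\frac{1}{n}S$, we will work with $S$ since the normalized eigenvectors of $S$ and $\frac{1}{n}S$ are identical, although the eigenvalues of $\frac{1}{n}S$ are $\frac{1}{n}$ times the eigenvalues of $S$.

The eigenvalues of $A$ are $\frac{2}{49}$ times the solutions of $(14-\lambda)(16-\lambda) - 28 = 0 \Rightarrow \lambda_1 = 15+\sqrt{29}$, $\lambda_2 = 15-\sqrt{29}$, so that the eigenvalues of $A$, denoted by $\lambda_{11}$ and $\lambda_{12}$, are $\lambda_{11} = \frac{2}{49}(15+\sqrt{29})$, $\lambda_{12} = \frac{2}{49}(15-\sqrt{29})$. The eigenvalues of $B$ are $\frac{2}{49}$ times the solutions of $(7-\nu)(23-\nu) + 35 = 0 \Rightarrow \nu_1 = 15+\sqrt{29}$, $\nu_2 = 15-\sqrt{29}$, so that the eigenvalues of $B$, denoted by $\nu_{21}$ and $\nu_{22}$, are $\nu_{21} = \frac{2}{49}(15+\sqrt{29})$, $\nu_{22} = \frac{2}{49}(15-\sqrt{29})$, which, as expected, are the same as those of $A$. Corresponding to $\lambda_{11}$, an eigenvector from $A$ is available from the equation

$$\begin{bmatrix}14-(15+\sqrt{29})&4\\ 7&16-(15+\sqrt{29})\end{bmatrix}\begin{bmatrix}x_1\\ x_2\end{bmatrix} = \begin{bmatrix}0\\ 0\end{bmatrix},$$

deleting $\frac{2}{49}$ from both sides. Thus, one solution is

$$\alpha_1 = \begin{bmatrix}\frac{4}{1+\sqrt{29}}\\ 1\end{bmatrix}.$$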
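Let us normalize this vector through the constraint $\alpha_1'S_{11}\alpha_1 = 1$. Since

$$\alpha_1'S_{11}\alpha_1 = \Big[\tfrac{4}{1+\sqrt{29}},\ 1\Big]\begin{bmatrix}4&-1\\ -1&2\end{bmatrix}\begin{bmatrix}\frac{4}{1+\sqrt{29}}\\ 1\end{bmatrix} = \frac{2}{49}(116 - 11\sqrt{29}),$$

the normalized eigenvector, denoted by $\alpha_{(1)}$, and the corresponding sample canonical variable, denoted by $u_1$, are

$$\alpha_{(1)} = \frac{7}{\sqrt{2}\sqrt{116-11\sqrt{29}}}\begin{bmatrix}\frac{4}{1+\sqrt{29}}\\ 1\end{bmatrix}\quad\text{and}\quad u_1 = \frac{7}{\sqrt{2}\sqrt{116-11\sqrt{29}}}\Big[\tfrac{4}{1+\sqrt{29}}\,x_1 + x_2\Big].$$

Up to the factor $\frac{2}{49}$, the eigenvalues of $B$ are likewise given by $\nu_1 = 15+\sqrt{29}$, $\nu_2 = 15-\sqrt{29}$. Let us compute an eigenvector corresponding to the eigenvalue $\nu_1$ obtained from $B$. This eigenvector can be determined from the equation

$$\begin{bmatrix}-8-\sqrt{29}&5\\ -7&8-\sqrt{29}\end{bmatrix}\begin{bmatrix}y_1\\ y_2\end{bmatrix} = \begin{bmatrix}0\\ 0\end{bmatrix},$$

which gives one solution as

$$\beta_1 = \begin{bmatrix}\frac{5}{8+\sqrt{29}}\\ 1\end{bmatrix}.$$

Let us normalize under the constraint $\beta_1'S_{22}\beta_1 = 1$. Since

$$\beta_1'S_{22}\beta_1 = \Big[\tfrac{5}{8+\sqrt{29}},\ 1\Big]\begin{bmatrix}4&-3\\ -3&4\end{bmatrix}\begin{bmatrix}\frac{5}{8+\sqrt{29}}\\ 1\end{bmatrix} = \frac{2}{49}(116 - 11\sqrt{29}),$$

the normalized eigenvector, denoted by $\beta_{(1)}$, and the corresponding canonical variable, denoted by $v_1$, are the following:

$$\beta_{(1)} = \frac{7}{\sqrt{2}\sqrt{116-11\sqrt{29}}}\begin{bmatrix}\frac{5}{8+\sqrt{29}}\\ 1\end{bmatrix}\quad\text{and}\quad v_1 = \frac{7}{\sqrt{2}\sqrt{116-11\sqrt{29}}}\Big[\tfrac{5}{8+\sqrt{29}}\,y_1 + y_2\Big].$$

Therefore, one pair of canonical variables is $(u_1, v_1)$ where $u_1$ is the best predictor of $v_1$ and vice versa. Now, consider $\lambda_2 = 15-\sqrt{29}$ and $\nu_2 = 15-\sqrt{29}$. Proceed as in the above case to obtain the second pair of canonical variables $(u_2, v_2)$.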
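
The computations of Example 10.3.1 can be reproduced numerically. In the sketch below (NumPy assumed), the data matrix has the observations as columns; its third row is the one implied by the deviation matrix and by $S_{22}$, $S_{21}$ in the solution. Only the absolute value of the sample covariance between $u_1$ and $v_1$ is compared with $t_{(1)}$, since eigenvectors are determined up to sign.

```python
import numpy as np

# Sample of size 5 from Z = (x1, x2, y1, y2)', one observation per column.
Z = np.array([[ 1,  2,  2, 0,  0],
              [ 2,  0,  1, 1,  1],
              [ 1, -1,  0, 1, -1],
              [-1,  1, -1, 0,  1]], dtype=float)

Zd = Z - Z.mean(axis=1, keepdims=True)   # deviation matrix
S = Zd @ Zd.T                            # sample sum of products matrix
S11, S12, S22 = S[:2, :2], S[:2, 2:], S[2:, 2:]

A = np.linalg.solve(S11, S12) @ np.linalg.solve(S22, S12.T)
B = np.linalg.solve(S22, S12.T) @ np.linalg.solve(S11, S12)

r29 = np.sqrt(29.0)
t2 = np.sort(np.linalg.eigvals(A).real)[::-1]        # squared sample canonical correlations
t2_closed_form = 2 / 49 * np.array([15 + r29, 15 - r29])

# Normalized eigenvectors found in the solution.
c = 7 / np.sqrt(2 * (116 - 11 * r29))
a1 = c * np.array([4 / (1 + r29), 1.0])
b1 = c * np.array([5 / (8 + r29), 1.0])

print(t2, t2_closed_form)
print(a1 @ S11 @ a1, b1 @ S22 @ b1, abs(a1 @ S12 @ b1), np.sqrt(t2[0]))
```

The first line should show matching eigenvalues, and the second line unit normalizations followed by two equal values of $t_{(1)}$.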
10.3.1. An iterative procedure


Without any loss of generality, let $p \le q$. We will illustrate the procedure for the population values for convenience. Consider the matrix $A$ as previously defined, that is, $A = \Sigma_{11}^{-1}\Sigma_{12}\Sigma_{22}^{-1}\Sigma_{21}$, and $\rho$, the canonical correlation which is a solution of (10.1.1) with $\rho^2 = \lambda$ where $\lambda$ is an eigenvalue of $A$. When $p$ is small, we may directly solve the determinantal equation (10.1.1) and evaluate the roots, which are the canonical correlations. When $p$ is large, direct evaluation could prove tedious without resorting to computational software packages. In this case, the following iterative procedure may be employed. Let $\lambda$ be an eigenvalue of $A$ and $\alpha$ the corresponding eigenvector. We want to evaluate $\lambda = \rho^2$ and $\alpha$, but we cannot solve (10.1.1) directly when $p$ is large. In that case, take an initial trial vector $\gamma_0$ and normalize it via the constraint $\alpha_0'\Sigma_{11}\alpha_0 = 1$, $\alpha_0$ being the normalized $\gamma_0$. Then, $\alpha_0 = \gamma_0/\sqrt{\gamma_0'\Sigma_{11}\gamma_0}$. Now, consider the equation

$$A\alpha_0 = \gamma_1.$$

If $\alpha_0$ happens to be the eigenvector $\alpha_{(1)}$ corresponding to the largest eigenvalue $\lambda_{(1)}$ of $A$, then $A\alpha_0 = \lambda_{(1)}\alpha_{(1)}$; $A\alpha_0 = \gamma_1 \Rightarrow \gamma_1'\Sigma_{11}\gamma_1 = \lambda_{(1)}^2\alpha_{(1)}'\Sigma_{11}\alpha_{(1)} = \lambda_{(1)}^2$ since $\alpha_{(1)}'\Sigma_{11}\alpha_{(1)} = 1$. Then $\rho_{(1)}^2 = \lambda_{(1)} = \sqrt{\gamma_1'\Sigma_{11}\gamma_1}$. This gives the motivation for the iterative procedure. Consider the equation

$$A\alpha_i = \gamma_{i+1},\quad \alpha_i = \frac{1}{\sqrt{\gamma_i'\Sigma_{11}\gamma_i}}\,\gamma_i,\quad i = 0, 1, \ldots \qquad \text{(i)}$$


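Continue the iteration process. At each stage compute $\delta_i = \gamma_i'\Sigma_{11}\gamma_i$ while ensuring that $\delta_i$ is increasing. Halt the iteration when $\gamma_i = \gamma_{i-1}$ approximately, that is, when $\alpha_i = \alpha_{i-1}$ approximately, which indicates that $\gamma_i$ converges to some vector $\gamma$. At this stage, the normalized $\gamma$ is $\alpha_{(1)}$, the eigenvector corresponding to the largest eigenvalue $\lambda_{(1)}$ of $A$. Then, the largest eigenvalue $\lambda_{(1)}$ of $A$ is given by $\lambda_{(1)} = \sqrt{\gamma'\Sigma_{11}\gamma}$. Thus, as a result of the iteration process specified by equation (i),

$$\lim_{i\to\infty}\alpha_i = \alpha_{(1)}\quad\text{and}\quad \lim_{i\to\infty}\sqrt{\gamma_i'\Sigma_{11}\gamma_i} = \lambda_{(1)}. \qquad \text{(ii)}$$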


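These initial iterations produce the largest eigenvalue $\lambda_{(1)} = \rho_{(1)}^2$ and the corresponding eigenvector $\alpha_{(1)}$. From (10.1.1), we have

$$\Sigma_{21}\alpha = \rho\,\Sigma_{22}\beta \;\Rightarrow\; \frac{1}{\rho}\Sigma_{22}^{-1}\Sigma_{21}\alpha = \beta. \qquad \text{(iii)}$$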
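Substitute the computed $\rho_{(1)}$ and $\alpha_{(1)}$ in (iii) to obtain $\beta_{(1)}$, the eigenvector corresponding to the largest eigenvalue $\lambda_{(1)}$ of $B = \Sigma_{22}^{-1/2}\Sigma_{21}\Sigma_{11}^{-1}\Sigma_{12}\Sigma_{22}^{-1/2}$. This completes the first stage of the iteration process. Now, consider $A_2 = A - \lambda_{(1)}\alpha_{(1)}\alpha_{(1)}'$. Observe that $\alpha_{(1)}\alpha_{(1)}'$ is a $p \times p$ matrix. In general, we can express a symmetric matrix in terms of its eigenvalues and normalized eigenvectors as follows:

$$A = \Sigma_{11}^{-1/2}\Sigma_{12}\Sigma_{22}^{-1}\Sigma_{21}\Sigma_{11}^{-1/2} = \lambda_{(1)}\alpha_{(1)}\alpha_{(1)}' + \lambda_{(2)}\alpha_{(2)}\alpha_{(2)}' + \cdots + \lambda_{(p)}\alpha_{(p)}\alpha_{(p)}' \qquad \text{(iv)}$$

as explained in Chapter 1 or Mathai and Haubold (2017a). Carry out the second stage of the iteration process on $A_2$ as indicated in (i). This will produce the second largest eigenvalue $\lambda_{(2)}$ of $A$ and the corresponding eigenvector $\alpha_{(2)}$. Then, compute the corresponding $\beta_{(2)}$ via the procedure given in (iii). This will complete the second stage. For the next stage, consider

$$A_3 = A_2 - \lambda_{(2)}\alpha_{(2)}\alpha_{(2)}' = A - \lambda_{(1)}\alpha_{(1)}\alpha_{(1)}' - \lambda_{(2)}\alpha_{(2)}\alpha_{(2)}'$$

and perform the iterative steps (i) to (iv). This will produce $\lambda_{(3)}$, $\alpha_{(3)}$ and $\beta_{(3)}$. Keep on iterating until all the $p$ eigenvalues $\lambda_{(1)},\ldots,\lambda_{(p)}$ of $A$ as well as $\alpha_{(j)}$ and $\beta_{(j)}$, the corresponding eigenvectors of $A$ and $B$, are obtained for $j = 1,\ldots,p$.


In the case of sample eigenvalues and eigenvectors, start with the matrices

$$\hat{A} = \hat\Sigma_{11}^{-1/2}\hat\Sigma_{12}\hat\Sigma_{22}^{-1}\hat\Sigma_{21}\hat\Sigma_{11}^{-1/2} = R_{11}^{-1/2}R_{12}R_{22}^{-1}R_{21}R_{11}^{-1/2},$$
$$\hat{B} = \hat\Sigma_{22}^{-1/2}\hat\Sigma_{21}\hat\Sigma_{11}^{-1}\hat\Sigma_{12}\hat\Sigma_{22}^{-1/2} = R_{22}^{-1/2}R_{21}R_{11}^{-1}R_{12}R_{22}^{-1/2}.$$

Carry out the iteration steps (i) to (iv) on $\hat{A}$ to obtain the sample eigenvalues, denoted by $\hat\lambda_{(j)} = t_{(j)}^2$, $j = 1,\ldots,p$ for $p \le q$, where $t_{(j)}$ is the $j$-th sample canonical correlation, and the corresponding eigenvectors of $\hat{A}$ denoted by $a_{(j)}$ as well as those of $\hat{B}$ denoted by $b_{(j)}$.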
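
The iterative scheme can be sketched as a power iteration with deflation. The code below is an illustration under stated assumptions, not a definitive implementation: it works with the symmetric form $\Sigma_{11}^{-1/2}\Sigma_{12}\Sigma_{22}^{-1}\Sigma_{21}\Sigma_{11}^{-1/2}$ and Euclidean-normalized vectors $a$, so that the expansion (iv) applies directly, and then maps back through $\alpha = \Sigma_{11}^{-1/2}a$; it assumes NumPy, nonzero eigenvalues and a trial vector not orthogonal to the sought eigenvector. The function names are hypothetical, and the covariance blocks of Example 10.2.1 serve as test input.

```python
import numpy as np

def inv_sqrt(S):
    """Inverse symmetric square root of a positive definite matrix."""
    w, V = np.linalg.eigh(S)
    return (V / np.sqrt(w)) @ V.T

def largest_eigenpair(A, tol=1e-13, max_iter=100_000):
    """Power iteration for a symmetric nonnegative definite matrix A."""
    a = np.ones(A.shape[0]) / np.sqrt(A.shape[0])   # normalized trial vector
    lam = 0.0
    for _ in range(max_iter):
        g = A @ a                    # gamma_{i+1} = A alpha_i
        lam = np.linalg.norm(g)      # current approximation to the eigenvalue
        if lam == 0.0:
            break
        a_next = g / lam
        converged = np.linalg.norm(a_next - a) < tol
        a = a_next
        if converged:
            break
    return lam, a

def canonical_by_iteration(S11, S12, S22):
    R = inv_sqrt(S11)
    A = R @ S12 @ np.linalg.solve(S22, S12.T) @ R
    lams, alphas, betas = [], [], []
    for _ in range(min(S12.shape)):
        lam, a = largest_eigenpair(A)
        alpha = R @ a                                   # alpha' S11 alpha = 1
        beta = np.linalg.solve(S22, S12.T @ alpha) / np.sqrt(lam)   # step (iii)
        lams.append(lam); alphas.append(alpha); betas.append(beta)
        A = A - lam * np.outer(a, a)                    # deflation: next stage
    return np.array(lams), np.column_stack(alphas), np.column_stack(betas)

S11 = np.array([[3.0, 1.0, 0.0], [1.0, 2.0, 0.0], [0.0, 0.0, 3.0]])
S12 = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
S22 = np.array([[2.0, 1.0], [1.0, 2.0]])
lams, alphas, betas = canonical_by_iteration(S11, S12, S22)
print(lams)   # compare with (23 + sqrt(79))/45 and (23 - sqrt(79))/45
```

Working with the symmetric form is a design choice made here so that subtracting $\lambda_{(1)}aa'$ removes exactly the leading eigenvalue; the eigenvalue is read off as the norm of $Aa$, in the spirit of (ii).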
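Example 10.3.2. Consider the real vectors $X' = (x_1, x_2)$, $Y' = (y_1, y_2, y_3)$, and let $Z' = (X', Y')$ where $x_1, x_2, y_1, y_2, y_3$ are real scalar random variables. Let the covariance matrix of $Z$ be $\Sigma > O$ where

$$\mathrm{Cov}(Z) = \mathrm{Cov}\begin{bmatrix}X\\ Y\end{bmatrix} = \Sigma = \begin{bmatrix}\Sigma_{11}&\Sigma_{12}\\ \Sigma_{21}&\Sigma_{22}\end{bmatrix},\quad \Sigma_{21} = \Sigma_{12}',$$

$$\mathrm{Cov}(X) = \Sigma_{11},\quad \mathrm{Cov}(Y) = \Sigma_{22},\quad \mathrm{Cov}(X, Y) = \Sigma_{12},$$

with

$$\Sigma_{11} = \begin{bmatrix}2&1\\ 1&2\end{bmatrix},\quad \Sigma_{12} = \begin{bmatrix}1&1&1\\ 1&-1&1\end{bmatrix},\quad \Sigma_{22} = \begin{bmatrix}1&1&0\\ 1&2&0\\ 0&0&1\end{bmatrix}.$$

Consider the problem of predicting $X$ from $Y$ and vice versa. Obtain the best predictors by constructing pairs of canonical variables.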


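Solution 10.3.2. Let us first compute the inverses $\Sigma_{11}^{-1}$ and $\Sigma_{22}^{-1}$ as well as $\Sigma_{11}^{-1}\Sigma_{12}$ and $\Sigma_{22}^{-1}\Sigma_{21}$. We are taking the non-symmetric form of $A$ as the symmetric form requires more calculations. Either way, the eigenvalues are identical. On directly applying the formula $C^{-1} = \frac{1}{|C|} \times [\text{the matrix of cofactors}]'$, we have

$$\Sigma_{11}^{-1} = \frac{1}{3}\begin{bmatrix}2&-1\\ -1&2\end{bmatrix},\quad \Sigma_{22}^{-1} = \begin{bmatrix}2&-1&0\\ -1&1&0\\ 0&0&1\end{bmatrix},$$

$$\Sigma_{11}^{-1}\Sigma_{12} = \frac{1}{3}\begin{bmatrix}2&-1\\ -1&2\end{bmatrix}\begin{bmatrix}1&1&1\\ 1&-1&1\end{bmatrix} = \frac{1}{3}\begin{bmatrix}1&3&1\\ 1&-3&1\end{bmatrix},$$

$$\Sigma_{22}^{-1}\Sigma_{21} = \begin{bmatrix}2&-1&0\\ -1&1&0\\ 0&0&1\end{bmatrix}\begin{bmatrix}1&1\\ 1&-1\\ 1&1\end{bmatrix} = \begin{bmatrix}1&3\\ 0&-2\\ 1&1\end{bmatrix}.$$

Thus, the non-symmetric forms of $A$ and $B$ are

$$A = \Sigma_{11}^{-1}\Sigma_{12}\Sigma_{22}^{-1}\Sigma_{21} = \frac{1}{3}\begin{bmatrix}1&3&1\\ 1&-3&1\end{bmatrix}\begin{bmatrix}1&3\\ 0&-2\\ 1&1\end{bmatrix} = \frac{1}{3}\begin{bmatrix}2&-2\\ 2&10\end{bmatrix},$$

$$B = \Sigma_{22}^{-1}\Sigma_{21}\Sigma_{11}^{-1}\Sigma_{12} = \frac{1}{3}\begin{bmatrix}1&3\\ 0&-2\\ 1&1\end{bmatrix}\begin{bmatrix}1&3&1\\ 1&-3&1\end{bmatrix} = \frac{1}{3}\begin{bmatrix}4&-6&4\\ -2&6&-2\\ 2&0&2\end{bmatrix}.$$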
Let us compute the eigenvalues of $3A$. Consider $\begin{vmatrix}2-\lambda&-2\\ 2&10-\lambda\end{vmatrix} = 0 \Rightarrow (2-\lambda)(10-\lambda) + 4 = 0$, which gives

$$\lambda = \frac{12 \pm \sqrt{(12)^2 - 4(24)}}{2} = 6 + 2\sqrt{3},\ 6 - 2\sqrt{3},$$

the eigenvalues of $A$ being $\lambda_{(1)} = 2 + \frac{2}{3}\sqrt{3}$, $\lambda_{(2)} = 2 - \frac{2}{3}\sqrt{3}$. These are the squares of the canonical correlation coefficient $\rho$ resulting from (10.1.1). Let us determine the eigenvectors corresponding to $\lambda_{(1)}$ and $\lambda_{(2)}$. Our notations for the linear functions of $X$ and $Y$ are $u = \alpha'X$ and $v = \beta'Y$; in this case, $\alpha' = (\alpha_1, \alpha_2)$ and $\beta' = (\beta_1, \beta_2, \beta_3)$. Then, the eigenvector $\alpha_{(1)}$ corresponding to $\lambda_{(1)}$ is obtained from the equation
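$$\begin{bmatrix}2-(6+2\sqrt{3})&-2\\ 2&10-(6+2\sqrt{3})\end{bmatrix}\begin{bmatrix}\alpha_1\\ \alpha_2\end{bmatrix} = \begin{bmatrix}0\\ 0\end{bmatrix} \qquad \text{(i)}$$
$$\Rightarrow\; -(2+\sqrt{3})\alpha_1 - \alpha_2 = 0 \;\Rightarrow\; \alpha_1 = 1,\ \alpha_2 = -(2+\sqrt{3}).$$

Observe that since (i) is a singular system of linear equations, we need only consider one equation and we can preassign a value for $\alpha_1$ or $\alpha_2$. Taking $\alpha_1 = 1$, let us normalize the resulting vector via the constraint $\alpha'\Sigma_{11}\alpha = 1$. Since

$$\alpha'\Sigma_{11}\alpha = \big[1,\ -(2+\sqrt{3})\big]\begin{bmatrix}2&1\\ 1&2\end{bmatrix}\begin{bmatrix}1\\ -(2+\sqrt{3})\end{bmatrix} = 12 + 6\sqrt{3} = \gamma_1,$$

the normalized $\alpha$, denoted by $\alpha_{(1)}$, is

$$\alpha_{(1)} = \frac{1}{\sqrt{\gamma_1}}\begin{bmatrix}1\\ -(2+\sqrt{3})\end{bmatrix} \;\Rightarrow\; u_1 = \frac{1}{\sqrt{\gamma_1}}\big[x_1 - (2+\sqrt{3})x_2\big]. \qquad \text{(ii)}$$

Now, the eigenvector corresponding to the second eigenvalue $\lambda_{(2)}$ is such that

$$\begin{bmatrix}2-(6-2\sqrt{3})&-2\\ 2&10-(6-2\sqrt{3})\end{bmatrix}\begin{bmatrix}\alpha_1\\ \alpha_2\end{bmatrix} = \begin{bmatrix}0\\ 0\end{bmatrix} \;\Rightarrow\; (-2+\sqrt{3})\alpha_1 - \alpha_2 = 0 \;\Rightarrow\; \alpha_1 = 1,\ \alpha_2 = -2+\sqrt{3}.$$

Since

$$\alpha'\Sigma_{11}\alpha = \big[1,\ -2+\sqrt{3}\big]\begin{bmatrix}2&1\\ 1&2\end{bmatrix}\begin{bmatrix}1\\ -2+\sqrt{3}\end{bmatrix} = 12 - 6\sqrt{3} = \gamma_2,$$

the normalized $\alpha$ such that $\alpha'\Sigma_{11}\alpha = 1$ is

$$\alpha_{(2)} = \frac{1}{\sqrt{\gamma_2}}\begin{bmatrix}1\\ -2+\sqrt{3}\end{bmatrix} \;\Rightarrow\; u_2 = \frac{1}{\sqrt{\gamma_2}}\big[x_1 + (-2+\sqrt{3})x_2\big]. \qquad \text{(iii)}$$

Observe that computing the eigenvalues of $B$ from the equation $|B - \lambda I| = 0$ will be difficult. However, we know that they are $\lambda_{(1)}$ and $\lambda_{(2)}$ as given above, the third one being equal to zero. So, let us first verify that $3\lambda_{(1)}$ is an eigenvalue of $3B$. Consider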
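$$|3B - (6+2\sqrt{3})I| = \begin{vmatrix}4-(6+2\sqrt{3})&-6&4\\ -2&6-(6+2\sqrt{3})&-2\\ 2&0&2-(6+2\sqrt{3})\end{vmatrix}$$
$$= 2\times2\times2\begin{vmatrix}1&\sqrt{3}&1\\ -(1+\sqrt{3})&-3&2\\ 1&0&-(2+\sqrt{3})\end{vmatrix} = 8\begin{vmatrix}1&\sqrt{3}&1\\ 0&\sqrt{3}&3+\sqrt{3}\\ 0&-\sqrt{3}&-(3+\sqrt{3})\end{vmatrix} = 0.$$

The operations performed are the following: Taking out 2 from each row; interchanging the second and the first rows (the factor $-1$ from the interchange being absorbed by changing the sign of the new first row); adding $(1+\sqrt{3})$ times the first row to the second row and adding minus one times the first row to the third row. Similarly, it can be verified that $\lambda_{(2)}$ is also an eigenvalue of $B$. Moreover, since the third row of $B$ is equal to the sum of its first two rows, $B$ is singular, which means that the remaining eigenvalue must be zero. In Example 10.2.1, we made use of the formula resulting from (ii) of Sect. 10.1 for determining the second set of canonical variables. In this case, they will be directly computed from $B$ to illustrate a different approach. Let us now determine the eigenvectors with respect to $B$, corresponding to $\lambda_{(1)}$ and $\lambda_{(2)}$:

$$B = \Sigma_{22}^{-1}\Sigma_{21}\Sigma_{11}^{-1}\Sigma_{12} = \frac{1}{3}\begin{bmatrix}4&-6&4\\ -2&6&-2\\ 2&0&2\end{bmatrix};$$

$$(B - \lambda_{(1)}I)\beta = O \;\Rightarrow\; \begin{bmatrix}\frac{4}{3}-(2+\frac{2}{3}\sqrt{3})&-\frac{6}{3}&\frac{4}{3}\\ -\frac{2}{3}&\frac{6}{3}-(2+\frac{2}{3}\sqrt{3})&-\frac{2}{3}\\ \frac{2}{3}&0&\frac{2}{3}-(2+\frac{2}{3}\sqrt{3})\end{bmatrix}\begin{bmatrix}\beta_1\\ \beta_2\\ \beta_3\end{bmatrix} = \begin{bmatrix}0\\ 0\\ 0\end{bmatrix}$$
$$\Rightarrow\; \begin{bmatrix}-(1+\sqrt{3})&-3&2\\ -1&-\sqrt{3}&-1\\ 1&0&-(2+\sqrt{3})\end{bmatrix}\begin{bmatrix}\beta_1\\ \beta_2\\ \beta_3\end{bmatrix} = \begin{bmatrix}0\\ 0\\ 0\end{bmatrix}.$$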
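This yields the equations

$$-(1+\sqrt{3})\beta_1 - 3\beta_2 + 2\beta_3 = 0 \qquad \text{(iv)}$$
$$-\beta_1 - \sqrt{3}\beta_2 - \beta_3 = 0 \qquad \text{(v)}$$
$$\beta_1 - (2+\sqrt{3})\beta_3 = 0, \qquad \text{(vi)}$$

whose solution in terms of an arbitrary $\beta_3$ is $\beta_2 = -(1+\sqrt{3})\beta_3$ and $\beta_1 = (2+\sqrt{3})\beta_3$. Taking $\beta_3 = 1$, we have the solution $\beta_3 = 1$, $\beta_2 = -(1+\sqrt{3})$, $\beta_1 = 2+\sqrt{3}$. Let us normalize the resulting vector via the constraint $\beta'\Sigma_{22}\beta = 1$:

$$\beta'\Sigma_{22}\beta = \big[2+\sqrt{3},\ -(1+\sqrt{3}),\ 1\big]\begin{bmatrix}1&1&0\\ 1&2&0\\ 0&0&1\end{bmatrix}\begin{bmatrix}2+\sqrt{3}\\ -(1+\sqrt{3})\\ 1\end{bmatrix} = 6 + 2\sqrt{3} = \delta_1.$$

Thus, the normalized $\beta$, denoted by $\beta_{(1)}$, is

$$\beta_{(1)} = \frac{1}{\sqrt{\delta_1}}\begin{bmatrix}2+\sqrt{3}\\ -(1+\sqrt{3})\\ 1\end{bmatrix} \;\Rightarrow\; v_1 = \frac{1}{\sqrt{\delta_1}}\big[(2+\sqrt{3})y_1 - (1+\sqrt{3})y_2 + y_3\big]. \qquad \text{(vii)}$$

Observe that we could also have utilized (iii) of Sect. 10.3.1 to evaluate $\beta_{(1)}$ and $\beta_{(2)}$ from $\alpha_{(1)}$ and $\alpha_{(2)}$. Consider the second eigenvalue $\lambda_{(2)} = 2 - \frac{2}{3}\sqrt{3}$ and the equation $(B - \lambda_{(2)}I)\beta = O$, that is,

$$\begin{bmatrix}\frac{4}{3}-(2-\frac{2}{3}\sqrt{3})&-\frac{6}{3}&\frac{4}{3}\\ -\frac{2}{3}&\frac{6}{3}-(2-\frac{2}{3}\sqrt{3})&-\frac{2}{3}\\ \frac{2}{3}&0&\frac{2}{3}-(2-\frac{2}{3}\sqrt{3})\end{bmatrix}\begin{bmatrix}\beta_1\\ \beta_2\\ \beta_3\end{bmatrix} = \begin{bmatrix}0\\ 0\\ 0\end{bmatrix} \;\Rightarrow\; \begin{bmatrix}-1+\sqrt{3}&-3&2\\ -1&\sqrt{3}&-1\\ 1&0&-2+\sqrt{3}\end{bmatrix}\begin{bmatrix}\beta_1\\ \beta_2\\ \beta_3\end{bmatrix} = \begin{bmatrix}0\\ 0\\ 0\end{bmatrix},$$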
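which leads to the equations

$$(-1+\sqrt{3})\beta_1 - 3\beta_2 + 2\beta_3 = 0 \qquad \text{(viii)}$$
$$-\beta_1 + \sqrt{3}\beta_2 - \beta_3 = 0 \qquad \text{(ix)}$$
$$\beta_1 + (-2+\sqrt{3})\beta_3 = 0. \qquad \text{(x)}$$

Thus, when $\beta_3 = 1$, $\beta_1 = 2-\sqrt{3}$ and $\beta_2 = \sqrt{3}-1$. Subject to the constraint $\beta'\Sigma_{22}\beta = 1$, we have

$$\beta'\Sigma_{22}\beta = \big[2-\sqrt{3},\ \sqrt{3}-1,\ 1\big]\begin{bmatrix}1&1&0\\ 1&2&0\\ 0&0&1\end{bmatrix}\begin{bmatrix}2-\sqrt{3}\\ \sqrt{3}-1\\ 1\end{bmatrix} = 6 - 2\sqrt{3} = \delta_2.$$

Hence, the normalized $\beta$ is

$$\beta_{(2)} = \frac{1}{\sqrt{\delta_2}}\begin{bmatrix}2-\sqrt{3}\\ \sqrt{3}-1\\ 1\end{bmatrix} \;\Rightarrow\; v_2 = \frac{1}{\sqrt{\delta_2}}\big[(2-\sqrt{3})y_1 + (\sqrt{3}-1)y_2 + y_3\big]. \qquad \text{(xi)}$$

The reader may also verify that this solution for $\beta_{(2)}$ is identical to that coming from (iii) of Sect. 10.3.1. Thus, the canonical pairs are the following: From (ii) and (vii), we have the first canonical pair $(u_1, v_1)$, the second pair $(u_2, v_2)$ resulting from (iii) and (xi). This means that $u_1$ is the best predictor of $v_1$ and vice versa, and that $u_2$ is the second best predictor of $v_2$ and vice versa.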


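Let us ensure that no computational errors have been committed. Consider

$$\alpha_{(1)}'\Sigma_{12}\beta_{(1)} = \frac{1}{\sqrt{\gamma_1\delta_1}}\big[1,\ -(2+\sqrt{3})\big]\begin{bmatrix}1&1&1\\ 1&-1&1\end{bmatrix}\begin{bmatrix}2+\sqrt{3}\\ -(1+\sqrt{3})\\ 1\end{bmatrix} = -\frac{1}{\sqrt{\gamma_1\delta_1}}\,4(3+2\sqrt{3}),$$

with

$$\gamma_1\delta_1 = 6(2+\sqrt{3})\,2(3+\sqrt{3}) = 12(9+5\sqrt{3}),$$

so that

$$\frac{[\alpha_{(1)}'\Sigma_{12}\beta_{(1)}]^2}{1} = \frac{16(3+2\sqrt{3})^2}{\gamma_1\delta_1} = \frac{16(21+12\sqrt{3})}{12(9+5\sqrt{3})} = \frac{4(7+4\sqrt{3})}{9+5\sqrt{3}} = \frac{4(7+4\sqrt{3})(9-5\sqrt{3})}{6}$$
$$= \frac{2}{3}(3+\sqrt{3}) = 2 + \frac{2}{\sqrt{3}} = 2 + \frac{2}{3}\sqrt{3} = \lambda_{(1)},\ \text{the largest eigenvalue of } A,$$

which corroborates the results obtained for $\alpha_{(1)}$, $\beta_{(1)}$ and $\lambda_{(1)}$. Similarly, it can be verified that $\alpha_{(2)}$, $\beta_{(2)}$ and $\lambda_{(2)}$ have been correctly computed.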
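
The algebra of Example 10.3.2 can be cross-checked numerically. The sketch below assumes NumPy and uses the submatrices implied by the products displayed in the solution, namely $\Sigma_{11}^{-1}\Sigma_{12}$ and $\Sigma_{22}^{-1}\Sigma_{21}$. It also reports the smallest eigenvalue of the assembled $\Sigma$, which is relevant because squared canonical correlations cannot exceed 1 when $\Sigma$ is positive definite, whereas $\lambda_{(1)}$ and $\lambda_{(2)}$ as computed are not both below 1; the check is therefore best read as a verification of the algebraic steps.

```python
import numpy as np

S11 = np.array([[2.0, 1.0], [1.0, 2.0]])
S12 = np.array([[1.0, 1.0, 1.0], [1.0, -1.0, 1.0]])
S22 = np.array([[1.0, 1.0, 0.0], [1.0, 2.0, 0.0], [0.0, 0.0, 1.0]])
r3 = np.sqrt(3.0)

A = np.linalg.solve(S11, S12) @ np.linalg.solve(S22, S12.T)
B = np.linalg.solve(S22, S12.T) @ np.linalg.solve(S11, S12)
lam = np.array([2 + 2 * r3 / 3, 2 - 2 * r3 / 3])   # lambda_(1), lambda_(2)

# Normalized vectors from (ii), (iii), (vii) and (xi).
a1 = np.array([1.0, -(2 + r3)]) / np.sqrt(12 + 6 * r3)
a2 = np.array([1.0, -2 + r3]) / np.sqrt(12 - 6 * r3)
b1 = np.array([2 + r3, -(1 + r3), 1.0]) / np.sqrt(6 + 2 * r3)
b2 = np.array([2 - r3, r3 - 1, 1.0]) / np.sqrt(6 - 2 * r3)

eig_B = np.sort(np.linalg.eigvals(B).real)[::-1]    # lambda_(1), lambda_(2), 0
sq_cov = np.array([(a1 @ S12 @ b1) ** 2, (a2 @ S12 @ b2) ** 2])

Sigma = np.block([[S11, S12], [S12.T, S22]])
min_eig_Sigma = np.linalg.eigvalsh(Sigma).min()

print(eig_B, lam, sq_cov)
print(min_eig_Sigma)
```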


10.4. The Sampling Distribution of the Canonical Correlation Matrix 


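Consider a simple random sample of size $n$ from $Z = \begin{bmatrix}X\\ Y\end{bmatrix}$. Let the $(p+q) \times (p+q)$ sample sum of products matrix be denoted by $S$ and let $Z$ have a real $(p+q)$-variate standard Gaussian density. Then $S$ has a real $(p+q)$-variate Wishart distribution with the identity matrix as its parameter matrix and $m = n-1$ degrees of freedom, $n$ being the sample size. Letting the density of $S$ be denoted by $f(S)$,

$$f(S) = \frac{|S|^{\frac{m}{2}-\frac{p+q+1}{2}}}{2^{\frac{(p+q)m}{2}}\,\Gamma_{p+q}(\frac{m}{2})}\,e^{-\frac{1}{2}\mathrm{tr}(S)},\quad S > O,\ m \ge p+q. \qquad (10.4.1)$$

Let us partition $S$ as follows:

$$S = \begin{bmatrix}S_{11}&S_{12}\\ S_{21}&S_{22}\end{bmatrix},\quad S_{11} \text{ is } p \times p,\ S_{22} \text{ is } q \times q,$$

and let $\mathrm{d}S = \mathrm{d}S_{11} \wedge \mathrm{d}S_{22} \wedge \mathrm{d}S_{12}$. Note that $\mathrm{tr}(S) = \mathrm{tr}(S_{11}) + \mathrm{tr}(S_{22})$ and

$$|S| = |S_{22}|\,|S_{11} - S_{12}S_{22}^{-1}S_{21}| = |S_{22}|\,|S_{11}|\,|I - S_{11}^{-1/2}S_{12}S_{22}^{-1}S_{21}S_{11}^{-1/2}|.$$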
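Letting $U = S_{11}^{-1/2}S_{12}S_{22}^{-1/2}$, $\mathrm{d}U = |S_{11}|^{-q/2}|S_{22}|^{-p/2}\mathrm{d}S_{12}$ for fixed $S_{11}$ and $S_{22}$, so that the joint density of $S_{11}$, $S_{22}$ and $S_{12}$ is given by

$$f_1(S)\,\mathrm{d}S_{11}\wedge\mathrm{d}S_{22}\wedge\mathrm{d}S_{12} = \frac{|S_{11}|^{\frac{m}{2}-\frac{p+1}{2}}e^{-\frac{1}{2}\mathrm{tr}(S_{11})}}{2^{\frac{mp}{2}}\Gamma_p(\frac{m}{2})}\,\mathrm{d}S_{11} \times \frac{|S_{22}|^{\frac{m}{2}-\frac{q+1}{2}}e^{-\frac{1}{2}\mathrm{tr}(S_{22})}}{2^{\frac{mq}{2}}\Gamma_q(\frac{m}{2})}\,\mathrm{d}S_{22}$$
$$\times\ \frac{\Gamma_p(\frac{m}{2})\Gamma_q(\frac{m}{2})}{\Gamma_{p+q}(\frac{m}{2})}\,|I - UU'|^{\frac{m}{2}-\frac{p+q+1}{2}}\,\mathrm{d}U. \qquad (10.4.2)$$

It is seen from (10.4.2) that $S_{11}$, $S_{22}$ and $U$ are mutually independently distributed, and so are $S_{11}$, $S_{22}$ and $W = UU'$. Further, $S_{11} \sim W_p(m, I)$ and $S_{22} \sim W_q(m, I)$. Note that $W = UU' = S_{11}^{-1/2}S_{12}S_{22}^{-1}S_{21}S_{11}^{-1/2}$ is the sample canonical correlation matrix. It follows from Theorem 4.2.3 of Chapter 4 that, for $p \le q$ and $U$ of full rank $p$,

$$\mathrm{d}U = \frac{\pi^{\frac{pq}{2}}}{\Gamma_p(\frac{q}{2})}\,|W|^{\frac{q}{2}-\frac{p+1}{2}}\,\mathrm{d}W. \qquad (10.4.3)$$

After integrating out $S_{11}$ and $S_{22}$ from (10.4.2) and substituting for $\mathrm{d}U$, we obtain the following representation of the density of $W$:

$$f_2(W) = \frac{\pi^{\frac{pq}{2}}}{\Gamma_p(\frac{q}{2})}\,\frac{\Gamma_p(\frac{m}{2})\Gamma_q(\frac{m}{2})}{\Gamma_{p+q}(\frac{m}{2})}\,|W|^{\frac{q}{2}-\frac{p+1}{2}}\,|I - W|^{\frac{m-q}{2}-\frac{p+1}{2}}$$

where

$$\frac{\pi^{\frac{pq}{2}}\,\Gamma_q(\frac{m}{2})}{\Gamma_{p+q}(\frac{m}{2})} = \frac{\pi^{\frac{pq}{2}}\,\pi^{\frac{q(q-1)}{4}}\prod_{j=1}^{q}\Gamma(\frac{m}{2}-\frac{j-1}{2})}{\pi^{\frac{(p+q)(p+q-1)}{4}}\prod_{j=1}^{p+q}\Gamma(\frac{m}{2}-\frac{j-1}{2})} = \frac{1}{\pi^{\frac{p(p-1)}{4}}\prod_{j=1}^{p}\Gamma(\frac{m-q}{2}-\frac{j-1}{2})} = \frac{1}{\Gamma_p(\frac{m-q}{2})}.$$

Hence, the density of $W$ is

$$f_2(W) = \frac{\Gamma_p(\frac{m}{2})}{\Gamma_p(\frac{q}{2})\,\Gamma_p(\frac{m-q}{2})}\,|W|^{\frac{q}{2}-\frac{p+1}{2}}\,|I - W|^{\frac{m-q}{2}-\frac{p+1}{2}}. \qquad (10.4.4)$$

Thus, the following result:


Theorem 10.4.1. Let $Z$, $S$, $S_{11}$, $S_{22}$, $S_{12}$, $U$ and $W$ be as defined above. Then, for $p \le q$ and $U$ of full rank $p$, the $p \times p$ canonical correlation matrix $W = UU'$ has the real matrix-variate type-1 beta density with the parameters $(\frac{q}{2}, \frac{m-q}{2})$ that is specified in (10.4.4) with $m \ge p+q$, $m = n-1$.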
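
Theorem 10.4.1 lends itself to a quick Monte Carlo sanity check. The sketch below assumes NumPy and relies on two consequences of the type-1 beta density with parameters $(\frac{q}{2}, \frac{m-q}{2})$ that are taken here as known properties rather than results stated in the text: $E[W] = \frac{q}{m}I$ and $E|I-W| = \prod_{j=1}^{p}\frac{m-q-j+1}{m-j+1}$. The function name and the chosen values of $p$, $q$, $m$ are hypothetical.

```python
import numpy as np

def simulate_W_moments(p, q, m, reps, seed=0):
    """Monte Carlo averages of W and |I - W| when S ~ Wishart_{p+q}(m, I)."""
    rng = np.random.default_rng(seed)
    mean_W = np.zeros((p, p))
    mean_det = 0.0
    for _ in range(reps):
        G = rng.standard_normal((p + q, m))
        S = G @ G.T                                   # Wishart with m degrees of freedom
        S11, S12, S22 = S[:p, :p], S[:p, p:], S[p:, p:]
        w, V = np.linalg.eigh(S11)
        R = (V / np.sqrt(w)) @ V.T                    # S11^{-1/2}
        W = R @ S12 @ np.linalg.solve(S22, S12.T) @ R # sample canonical correlation matrix
        mean_W += W
        mean_det += np.linalg.det(np.eye(p) - W)
    return mean_W / reps, mean_det / reps

p, q, m = 2, 3, 10
mean_W, mean_det = simulate_W_moments(p, q, m, reps=4000)
expected_det = np.prod([(m - q - j + 1) / (m - j + 1) for j in range(1, p + 1)])
print(np.round(mean_W, 3), q / m)
print(round(mean_det, 3), round(expected_det, 3))
```

The simulated averages should be close to $\frac{q}{m}I$ and to the product formula, up to Monte Carlo error.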
When $q \le p$ and $S_{21}$ is of full rank $q$, the canonical correlation matrix $W_1 = U'U = S_{22}^{-1/2}S_{21}S_{11}^{-1}S_{12}S_{22}^{-1/2}$ will have the density given in (10.4.4) with $p$ and $q$ interchanged. Suppose that $p \le q$ and we would like to consider the density of $W_1 = U'U$. In this case $U'U$ is real positive semi-definite as the rank of $U$ is $p \le q$. However, on expanding the following determinant in two different ways:

$$\begin{vmatrix}I_p&U\\ U'&I_q\end{vmatrix} = |I_p - UU'| = |I_q - U'U|,$$

it follows from (10.4.2) that the $q \times q$ matrix $U'U$ has a distribution that is equivalent to that of the $p \times p$ matrix $UU'$, as given in (10.4.4). The distribution of the sample canonical correlation matrix has been derived in Mathai (1981) for a Gaussian population under the assumption that $\Sigma_{12} \ne O$.


10.4.1. The joint density of the eigenvalues and eigenvectors 


Without any loss of generality, let $p \le q$ and $U$ be of full rank $p$. Let $W$ denote the sample canonical correlation matrix whose density is as given in (10.4.4) for the case when the population canonical matrix is a null matrix. Let the eigenvalues of $W$ be distinct and such that $1 > \nu_1 > \nu_2 > \cdots > \nu_p > 0$. Observe that $\nu_j = r_{(j)}^2$ where $r_{(j)}$, $j = 1,\ldots,p$, are the sample canonical correlations. For a unique $p \times p$ orthonormal matrix $Q$, $QQ' = I$, $Q'Q = I$, we have $Q'WQ = \mathrm{diag}(\nu_1,\ldots,\nu_p) \equiv D$. Consider the transformation from $W$ to $D$ and the normalized eigenvectors of $W$, which constitute the columns of $Q$. Then, as is explained in Theorem 8.2.1 or Theorem 4.4 of Mathai (1997),

$$\mathrm{d}W = \Big[\prod_{i<j}(\nu_i - \nu_j)\Big]\,\mathrm{d}D \wedge h(Q) \qquad (10.4.5)$$

where $h(Q) = \wedge[(\mathrm{d}Q)Q']$ is the differential element associated with $Q$, and we have the following result:


Theorem 10.4.2. The joint density of the distinct eigenvalues $1 > \nu_1 > \nu_2 > \cdots > \nu_p > 0$, $p \le q$, of $W = UU'$ whose density is specified in (10.4.4), $U$ being assumed to be of full rank $p$, and the normalized eigenvectors corresponding to $\nu_1,\ldots,\nu_p$, denoted by $f_3(D, Q)$, is the following:

$$f_3(D, Q)\,\mathrm{d}D\wedge h(Q) = \frac{\Gamma_p(\frac{m}{2})}{\Gamma_p(\frac{q}{2})\Gamma_p(\frac{m-q}{2})}\Big[\prod_{j=1}^{p}\nu_j\Big]^{\frac{q}{2}-\frac{p+1}{2}}\Big[\prod_{j=1}^{p}(1-\nu_j)\Big]^{\frac{m-q}{2}-\frac{p+1}{2}}$$
$$\times\ \Big[\prod_{i<j}(\nu_i - \nu_j)\Big]\,\mathrm{d}D\wedge h(Q) \qquad (10.4.6)$$
where $h(Q)$ is as defined in (10.4.5). To obtain the joint density of the squares of the sample canonical correlations $r_{(j)}^2$ and the corresponding canonical vectors, it suffices to replace $\nu_j$ by $r_{(j)}^2$, $r_{(j)}^2 > 0$, $-1 < r_{(j)} < 1$, $j = 1,\ldots,p$.


The joint density of the eigenvalues can be determined by integrating out $h(Q)$ from (10.4.6) in this real case. It follows from Theorem 4.2.2 that

$$\int_{O_p} h(Q) = \frac{\pi^{\frac{p^2}{2}}}{\Gamma_p(\frac{p}{2})}, \qquad (10.4.7)$$

this result being also stated in Mathai (1997). For the complex case, the expression on the right-hand side of (10.4.7) is $\pi^{p(p-1)}/\tilde{\Gamma}_p(p)$. Hence, the joint density of the eigenvalues or, equivalently, the density of $D$ and the density of $Q$ are the following:


Theorem 10.4.3. When $p \le q$ and $U$ is of full rank $p$, the joint density of the distinct eigenvalues $1 > \nu_1 > \cdots > \nu_p > 0$ of the canonical correlation matrix $W$ in (10.4.4), which is available from (10.4.6) and denoted by $f_4(D)$, is

$$f_4(D) = \frac{\Gamma_p(\frac{m}{2})}{\Gamma_p(\frac{q}{2})\Gamma_p(\frac{m-q}{2})}\,\frac{\pi^{\frac{p^2}{2}}}{\Gamma_p(\frac{p}{2})}\Big[\prod_{j=1}^{p}\nu_j\Big]^{\frac{q}{2}-\frac{p+1}{2}}$$
$$\times\ \Big[\prod_{j=1}^{p}(1-\nu_j)\Big]^{\frac{m-q}{2}-\frac{p+1}{2}}\Big[\prod_{i<j}(\nu_i - \nu_j)\Big], \qquad (10.4.8)$$

and the joint density of the normalized eigenvectors associated with $W$, denoted by $f_5(Q)$, is given by

$$f_5(Q) = \frac{\Gamma_p(\frac{p}{2})}{\pi^{\frac{p^2}{2}}}\,h(Q) \qquad (10.4.9)$$

where $h(Q)$ is as defined in (10.4.5).


To obtain the joint density of the squares of the sample canonical correlations $r_{(j)}^2$, one should replace $\nu_j$ by $r_{(j)}^2$, $j = 1, 2, \ldots, p$, in (10.4.8).


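Example 10.4.1. Verify that (10.4.8) is a density for $p = 2$, $m - q = p + 1$, with $q$ being a free parameter.

Solution 10.4.1. For $m - q = p + 1$, $p = 2$, the right-hand side of (10.4.8) becomes

$$\frac{\Gamma_p(\frac{q+p+1}{2})}{\Gamma_p(\frac{q}{2})\Gamma_p(\frac{p+1}{2})}\,\frac{\pi^{\frac{p^2}{2}}}{\Gamma_p(\frac{p}{2})}\,(\nu_1\nu_2)^{\frac{q}{2}-\frac{p+1}{2}}(\nu_1-\nu_2) = \frac{\Gamma_2(\frac{q+3}{2})}{\Gamma_2(\frac{q}{2})\Gamma_2(\frac{3}{2})}\,\frac{\pi^2}{\Gamma_2(1)}\,(\nu_1\nu_2)^{\frac{q-3}{2}}(\nu_1-\nu_2). \qquad \text{(i)}$$

The constant part simplifies as follows:

$$\frac{\Gamma_2(\frac{q+3}{2})\,\pi^2}{\Gamma_2(\frac{q}{2})\Gamma_2(\frac{3}{2})\Gamma_2(1)} = \frac{\pi^{\frac{1}{2}}\Gamma(\frac{q+3}{2})\Gamma(\frac{q+2}{2})\,\pi^2}{\pi^{\frac{1}{2}}\Gamma(\frac{q}{2})\Gamma(\frac{q-1}{2})\ \pi^{\frac{1}{2}}\Gamma(\frac{3}{2})\Gamma(1)\ \pi^{\frac{1}{2}}\Gamma(1)\Gamma(\frac{1}{2})}; \qquad \text{(ii)}$$

now, noting that

$$\Gamma\Big(\frac{q+3}{2}\Big) = \frac{q+1}{2}\,\frac{q-1}{2}\,\Gamma\Big(\frac{q-1}{2}\Big)\quad\text{and}\quad \Gamma\Big(\frac{q+2}{2}\Big) = \frac{q}{2}\,\Gamma\Big(\frac{q}{2}\Big),$$

and substituting these values in (ii), the constant part becomes

$$\frac{\frac{q-1}{2}\,\frac{q+1}{2}\,\frac{q}{2}\,\pi^2}{\sqrt{\pi}\,\frac{\sqrt{\pi}}{2}\,\sqrt{\pi}\,\sqrt{\pi}} = \frac{(q-1)q(q+1)}{4}. \qquad \text{(iii)}$$

Let us show that the total integral equals 1. The integral part is the following:

$$\int_{\nu_1=0}^{1}\int_{\nu_2=0}^{\nu_1}(\nu_1\nu_2)^{\frac{q-3}{2}}(\nu_1-\nu_2)\,\mathrm{d}\nu_1\wedge\mathrm{d}\nu_2$$
$$= \int_0^1\nu_1^{\frac{q-1}{2}}\Big[\int_{\nu_2=0}^{\nu_1}\nu_2^{\frac{q-3}{2}}\mathrm{d}\nu_2\Big]\mathrm{d}\nu_1 - \int_0^1\nu_1^{\frac{q-3}{2}}\Big[\int_{\nu_2=0}^{\nu_1}\nu_2^{\frac{q-1}{2}}\mathrm{d}\nu_2\Big]\mathrm{d}\nu_1$$
$$= \int_0^1\frac{\nu_1^{q-1}}{(\frac{q-1}{2})}\mathrm{d}\nu_1 - \int_0^1\frac{\nu_1^{q-1}}{(\frac{q+1}{2})}\mathrm{d}\nu_1 = \frac{1}{q(\frac{q-1}{2})} - \frac{1}{q(\frac{q+1}{2})} = \frac{4}{(q-1)q(q+1)}. \qquad \text{(iv)}$$

The product of (iii) and (iv) being equal to 1, this verifies that (10.4.8) is a density for $m - q = p + 1$, $p = 2$. This completes the computations.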
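
The verification in Example 10.4.1 can also be carried out numerically for a specific value of $q$. The following sketch assumes NumPy; `multigamma`, `constant_10_4_8` and `total_mass_p2` are hypothetical helper names, the real matrix-variate gamma being computed from $\Gamma_p(a) = \pi^{p(p-1)/4}\prod_{j=1}^{p}\Gamma(a - \frac{j-1}{2})$, and the double integral being approximated by a midpoint rule on the region $\nu_1 > \nu_2$.

```python
import math
import numpy as np

def multigamma(a, p):
    """Real matrix-variate gamma: pi^{p(p-1)/4} * prod_{j=1}^{p} Gamma(a - (j-1)/2)."""
    return math.pi ** (p * (p - 1) / 4) * math.prod(
        math.gamma(a - (j - 1) / 2) for j in range(1, p + 1))

def constant_10_4_8(p, q, m):
    """Normalizing constant appearing in (10.4.8)."""
    return (multigamma(m / 2, p)
            / (multigamma(q / 2, p) * multigamma((m - q) / 2, p))
            * math.pi ** (p * p / 2) / multigamma(p / 2, p))

def total_mass_p2(q, n=800):
    """Midpoint-rule approximation of the integral of (10.4.8) for p = 2, m = q + 3."""
    p, m = 2, q + 3
    t = (np.arange(n) + 0.5) / n
    v1, v2 = np.meshgrid(t, t, indexing="ij")
    integrand = (v1 * v2) ** (q / 2 - 1.5) * (v1 - v2)
    integrand = np.where(v1 > v2, integrand, 0.0)     # restrict to 1 > v1 > v2 > 0
    return constant_10_4_8(p, q, m) * integrand.sum() / n**2

q = 4
print(constant_10_4_8(2, q, q + 3), (q - 1) * q * (q + 1) / 4)
print(total_mass_p2(q))
```

For $q = 4$ the constant should agree with $(q-1)q(q+1)/4$ and the approximate total mass should be close to 1.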
10.4.2. Testing whether the population canonical correlations equal zero

In its symmetric form, whenever $\Sigma_{12} = O$, the population canonical correlation matrix is a null matrix, that is, $\Sigma_{11}^{-\frac12}\Sigma_{12}\Sigma_{22}^{-1}\Sigma_{21}\Sigma_{11}^{-\frac12} = O$. Thus, when $\Sigma_{12} = O$, the canonical correlations are equal to zero and vice versa. As was explained in Sect. 6.8.2, we have a one-to-one function of $u_4$, the likelihood ratio criterion for testing this hypothesis in the case of a Gaussian distributed population. It was established that

$$u_4 = \frac{|S|}{|S_{11}|\,|S_{22}|} = |I - S_{11}^{-\frac12}S_{12}S_{22}^{-1}S_{21}S_{11}^{-\frac12}| = |I - UU'| = |I - W| = \prod_{j=1}^{p}(1 - r_{(j)}^2) \tag{10.4.10}$$
where $r_{(j)}$, $j = 1,\ldots,p$, are the sample canonical correlations. It can also be seen from (10.4.2) that, when $U$ is of full rank $p$, $U$ has a rectangular matrix-variate type-1 beta distribution and $W = UU'$ has a real matrix-variate type-1 beta distribution. Since it has been determined in Sect. 6.8.2 that, under $H_o$, the $h$-th moment of $u_4$ for an arbitrary $h$ is given by

$$E[u_4^h\,|\,H_o] = c\,\prod_{j=1}^{p}\frac{\Gamma(\frac{m}{2}-\frac{q}{2}-\frac{j-1}{2}+h)}{\Gamma(\frac{m}{2}-\frac{j-1}{2}+h)},\quad m = n-1, \tag{10.4.11}$$

where $n$ is the sample size and $c$ is such that $E[u_4^0\,|\,H_o] = 1$, the density of $u_4$ is expressible in terms of a G-function. It was also shown in the same section that $-n\ln u_4$ is asymptotically distributed as a real chisquare random variable having $\frac{(p+q)(p+q+1)}{2} - \frac{p(p+1)}{2} - \frac{q(q+1)}{2} = pq$ degrees of freedom, which corresponds to the number of parameters restricted by the hypothesis $\Sigma_{12} = O$ since there are $pq$ free parameters in $\Sigma_{12}$. Thus, the following result:


Theorem 10.4.4. Consider the hypothesis $H_o: \rho_{(1)} = \cdots = \rho_{(p)} = 0$, that is, the population canonical correlations $\rho_{(j)}$, $j = 1,\ldots,p$, are all equal to zero, which is equivalent to the hypothesis $H_o: \Sigma_{12} = O$. Let $u_4$ denote the $(2/n)$-th root of the likelihood ratio criterion for testing this hypothesis. Then, as the sample size $n \to \infty$, under $H_o$,

$$-n\ln u_4 = -2\ln(\text{the likelihood ratio criterion}) \to \chi^2_{pq}, \tag{10.4.12}$$

$\chi^2_{\nu}$ denoting a real chisquare random variable having $\nu$ degrees of freedom.


An illustrative numerical example has already been presented in Chap. 6. 
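
As a small illustration of Theorem 10.4.4, the sketch below computes $u_4$ from a set of sample canonical correlations via (10.4.10) and returns the statistic $-n\ln u_4$ together with its asymptotic degrees of freedom $pq$. The function name and the numerical inputs are hypothetical; the resulting value would then be compared with a chisquare percentile taken from tables or statistical software.

```python
import math

def lr_chisquare_statistic(sample_canonical_corrs, n, q):
    """Return (-n ln u4, pq), where u4 = prod_j (1 - r_j^2) as in (10.4.10)."""
    p = len(sample_canonical_corrs)
    u4 = 1.0
    for r in sample_canonical_corrs:
        if not 0.0 <= r < 1.0:
            raise ValueError("each sample canonical correlation must lie in [0, 1)")
        u4 *= 1.0 - r * r
    return -n * math.log(u4), p * q

# Hypothetical inputs: p = 2 sample canonical correlations, q = 3, n = 100.
statistic, dof = lr_chisquare_statistic([0.5, 0.2], n=100, q=3)
print(statistic, dof)
```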
Note 10.1. We have initially assumed that $\Sigma > O$, $\Sigma_{11} > O$ and $\Sigma_{22} > O$. However, $\Sigma_{12} = \Sigma_{21}'$ may or may not be of full rank or some of its elements could be equal to zero. Note that $\Sigma_{12}\Sigma_{22}^{-1}\Sigma_{21}$ is either positive definite or positive semi-definite. Whenever $p \le q$ and $\Sigma_{12}$ is of rank $p$, $\Sigma_{12}\Sigma_{22}^{-1}\Sigma_{21} > O$ and, in this instance, all the $p$ canonical correlations are positive. If $\Sigma_{12}$ is not of full rank, then some of the eigenvalues of $W$ as previously defined, as well as the corresponding canonical correlations will be equal to zero and, in the event that $q < p$, similar statements would hold with respect to $\Sigma_{21}$, $W'$ and the resulting canonical correlations. This aspect will not be further investigated from an inferential standpoint.


Note 10.2. Consider the regression of $X$ on $Y$, that is, $E[X|Y]$, when $Z = \begin{bmatrix}X\\ Y\end{bmatrix}$ has the following real $(p+q)$-variate normal distribution:

$$Z \sim N_{p+q}(\mu, \Sigma),\ \ \Sigma > O,\ \ \Sigma = \begin{bmatrix}\Sigma_{11} & \Sigma_{12}\\ \Sigma_{21} & \Sigma_{22}\end{bmatrix},$$

where $\Sigma_{11} = {\rm Cov}(X) > O$ is $p\times p$ and $\Sigma_{22} = {\rm Cov}(Y) > O$ is $q\times q$. Then, from equation (3.3.5), we have

$$E[X|Y] = \mu_{(1)} + \Sigma_{12}\Sigma_{22}^{-1}(Y - \mu_{(2)})$$

where $\mu' = (\mu_{(1)}', \mu_{(2)}')$ and

$${\rm Cov}(X|Y) = \Sigma_{11} - \Sigma_{12}\Sigma_{22}^{-1}\Sigma_{21}.$$

Regression analysis is performed on the conditional space where $Y$ is either composed of non-random real scalar variables or given values of real scalar random variables, whereas canonical correlation analysis is carried out in the entire space of $Z$. Clearly, these techniques involve distinct approaches. When $Y$ is given values of random variables, then $\Sigma_{12}$ and $\Sigma_{22}$ can make sense. In this instance, the hypothesis $H_o: \Sigma_{12} = O$, in which case the regression coefficient matrix is a null matrix or, equivalently, the hypothesis that $Y$ does not contribute to predicting $X$, implies that the canonical correlation matrix $\Sigma_{11}^{-\frac12}\Sigma_{12}\Sigma_{22}^{-1}\Sigma_{21}\Sigma_{11}^{-\frac12}$ is as well a null matrix. Accordingly, in this case, the 'no regression' hypothesis $\Sigma_{12} = O$ (no contribution of $Y$ in predicting $X$) is equivalent to the hypothesis that the canonical correlations are equal to zero and vice versa.


10.5. The General Sampling Distribution of the Canonical Correlation Matrix 


Let the $(p+q)\times 1$ real vector random variable $Z = \begin{bmatrix}X\\ Y\end{bmatrix} \sim N_{p+q}(\mu, \Sigma)$, $\Sigma > O$. Consider a simple random sample of size $n$ from this Gaussian population and let the sample sum of products matrix $S$ be partitioned as in the preceding section. Let the sample canonical correlation matrix be denoted by $R$ and the corresponding population canonical correlation matrix, by $P$, that is,

$$R = S_{11}^{-\frac12}S_{12}S_{22}^{-1}S_{21}S_{11}^{-\frac12}\ \text{ and }\ P = \Sigma_{11}^{-\frac12}\Sigma_{12}\Sigma_{22}^{-1}\Sigma_{21}\Sigma_{11}^{-\frac12}.$$

We now examine the distribution of $R$, assuming that $P \ne O$. Letting the determinant of $I - R$ be denoted by $u$, we have

$$u = |I - R| = |S_{11} - S_{12}S_{22}^{-1}S_{21}|\,|S_{11}|^{-1} = \frac{|S|}{|S_{11}|\,|S_{22}|}.$$

Thus, the $h$-th moment of $u$ is

$$E[u^h] = E\Big[\frac{|S|}{|S_{11}|\,|S_{22}|}\Big]^{h} = E\{|S|^{h}|S_{11}|^{-h}|S_{22}|^{-h}\}.$$

Since $S$, $S_{11}$ and $S_{22}$ are functions of $S$, we can integrate out over the density of $S$, namely the Wishart density with $m = n-1$ degrees of freedom and parameter matrix $\Sigma > O$. Then for $m \ge p+q$,

$$E[u^h] = \frac{1}{2^{\frac{m(p+q)}{2}}|\Sigma|^{\frac{m}{2}}\Gamma_{p+q}(\frac{m}{2})}\int_{S>O}|S|^{\frac{m}{2}+h-\frac{p+q+1}{2}}|S_{11}|^{-h}|S_{22}|^{-h}\,e^{-\frac12{\rm tr}(\Sigma^{-1}S)}\,dS. \tag{10.5.1}$$

Let us replace $S$ by $2S$ so that 2 will vanish from all the factors containing it, and let us replace $|S_{11}|^{-h}$ and $|S_{22}|^{-h}$ by equivalent integrals:


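$$|S_{11}|^{-h} = \frac{1}{\Gamma_p(h)}\int_{Y_1>O}|Y_1|^{h-\frac{p+1}{2}}e^{-{\rm tr}(Y_1S_{11})}\,dY_1,\quad \Re(h) > \frac{p-1}{2},$$

$$|S_{22}|^{-h} = \frac{1}{\Gamma_q(h)}\int_{Y_2>O}|Y_2|^{h-\frac{q+1}{2}}e^{-{\rm tr}(Y_2S_{22})}\,dY_2,\quad \Re(h) > \frac{q-1}{2}.$$

Then,

$$E[u^h] = \frac{1}{|\Sigma|^{\frac{m}{2}}\Gamma_{p+q}(\frac{m}{2})\Gamma_p(h)\Gamma_q(h)}\int_{Y_1>O}\int_{Y_2>O}|Y_1|^{h-\frac{p+1}{2}}|Y_2|^{h-\frac{q+1}{2}}\int_{S>O}|S|^{\frac{m}{2}+h-\frac{p+q+1}{2}}\,e^{-{\rm tr}((\Sigma^{-1}+Y)S)}\,dS\wedge dY_1\wedge dY_2 \tag{10.5.2}$$

where

$${\rm tr}(Y_1S_{11}) + {\rm tr}(Y_2S_{22}) = {\rm tr}\Big[\begin{pmatrix}Y_1 & O\\ O & Y_2\end{pmatrix}\begin{pmatrix}S_{11} & S_{12}\\ S_{21} & S_{22}\end{pmatrix}\Big] = {\rm tr}(YS),\quad Y = \begin{pmatrix}Y_1 & O\\ O & Y_2\end{pmatrix}.$$

Integrating over $S$ in (10.5.2) gives

$$E[u^h] = \frac{\Gamma_{p+q}(\frac{m}{2}+h)}{|\Sigma|^{\frac{m}{2}}\Gamma_{p+q}(\frac{m}{2})\Gamma_p(h)\Gamma_q(h)}\int_{Y_1>O}\int_{Y_2>O}|Y_1|^{h-\frac{p+1}{2}}|Y_2|^{h-\frac{q+1}{2}}\,|\Sigma^{-1}+Y|^{-(\frac{m}{2}+h)}\,dY_1\wedge dY_2.$$

Let

$$\Sigma^{-1}+Y = \begin{bmatrix}\Sigma^{11}+Y_1 & \Sigma^{12}\\ \Sigma^{21} & \Sigma^{22}+Y_2\end{bmatrix},$$

where the $\Sigma^{ij}$'s denote the submatrices of $\Sigma^{-1}$ partitioned conformably with $\Sigma$. Then, the determinant can be expanded as follows:

$$|\Sigma^{-1}+Y| = |\Sigma^{22}+Y_2|\,|Y_1+B|,\quad B = \Sigma^{11} - \Sigma^{12}(\Sigma^{22}+Y_2)^{-1}\Sigma^{21},$$

so that

$$|\Sigma^{-1}+Y|^{-(\frac{m}{2}+h)} = |\Sigma^{22}+Y_2|^{-(\frac{m}{2}+h)}\,|Y_1+B|^{-(\frac{m}{2}+h)}.$$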
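Collecting the factors containing $Y_1$ and integrating out, we have

$$\frac{1}{\Gamma_p(h)}\int_{Y_1>O}|Y_1|^{h-\frac{p+1}{2}}|Y_1+B|^{-(\frac{m}{2}+h)}\,dY_1 = \frac{\Gamma_p(\frac{m}{2})}{\Gamma_p(\frac{m}{2}+h)}\,|B|^{-\frac{m}{2}},\quad \Re(h) > \frac{p-1}{2},$$

and $|B|^{-(\frac{m}{2}+h)}|B|^{h} = |B|^{-\frac{m}{2}}$. Noting that

$$\begin{vmatrix}\Sigma^{11} & \Sigma^{12}\\ \Sigma^{21} & \Sigma^{22}+Y_2\end{vmatrix} = |\Sigma^{11}|\,|\Sigma^{22}+Y_2-\Sigma^{21}(\Sigma^{11})^{-1}\Sigma^{12}| = |\Sigma^{22}+Y_2|\,|\Sigma^{11}-\Sigma^{12}(\Sigma^{22}+Y_2)^{-1}\Sigma^{21}| = |\Sigma^{22}+Y_2|\,|B|,$$

$|B|$ can be expressed in the following form:

$$|B| = \frac{|\Sigma^{11}|\,|Y_2+C|}{|Y_2+\Sigma^{22}|},\quad C = \Sigma^{22}-\Sigma^{21}(\Sigma^{11})^{-1}\Sigma^{12} = \Sigma_{22}^{-1}, \tag{i}$$

so that

$$|B|^{-\frac{m}{2}} = |\Sigma^{11}|^{-\frac{m}{2}}\,|Y_2+\Sigma_{22}^{-1}|^{-\frac{m}{2}}\,|Y_2+\Sigma^{22}|^{\frac{m}{2}},$$

$$|B|^{-\frac{m}{2}}\,|Y_2+\Sigma^{22}|^{-(\frac{m}{2}+h)} = |\Sigma^{11}|^{-\frac{m}{2}}\,|Y_2+\Sigma_{22}^{-1}|^{-\frac{m}{2}}\,|Y_2+\Sigma^{22}|^{-h}. \tag{ii}$$

Collecting all the factors containing $Y_2$ and integrating out, we have the following:

$$\begin{aligned}
\frac{1}{\Gamma_q(h)}\int_{Y_2>O}&|Y_2|^{h-\frac{q+1}{2}}\,|Y_2+\Sigma_{22}^{-1}|^{-\frac{m}{2}}\,|Y_2+\Sigma^{22}|^{-h}\,dY_2\\
&= \frac{|\Sigma_{22}|^{\frac{m}{2}}}{\Gamma_q(h)}\int_{Y_2>O}|Y_2|^{h-\frac{q+1}{2}}\,|I+\Sigma_{22}^{\frac12}Y_2\Sigma_{22}^{\frac12}|^{-\frac{m}{2}}\,|\Sigma^{22}+Y_2|^{-h}\,dY_2\\
&= \frac{|\Sigma_{22}|^{\frac{m}{2}}}{\Gamma_q(h)}\int_{W>O}|W|^{h-\frac{q+1}{2}}\,|I+W|^{-\frac{m}{2}}\,|W+\Sigma_{22}^{\frac12}\Sigma^{22}\Sigma_{22}^{\frac12}|^{-h}\,dW,\quad W = \Sigma_{22}^{\frac12}Y_2\Sigma_{22}^{\frac12},
\end{aligned} \tag{iii}$$

as $W = \Sigma_{22}^{\frac12}Y_2\Sigma_{22}^{\frac12} \Rightarrow dY_2 = |\Sigma_{22}|^{-\frac{q+1}{2}}dW$. Now, letting $W = U^{-1} - I$, so that $dW = |U|^{-(q+1)}dU$ with $O < U < I$, the expression in (iii), denoted by $\delta$, becomes

$$\delta = \frac{|\Sigma_{22}|^{\frac{m}{2}}}{\Gamma_q(h)}\int_{O<U<I}|U|^{\frac{m}{2}-\frac{q+1}{2}}\,|I-U|^{h-\frac{q+1}{2}}\,|I-AU|^{-h}\,dU$$

where $A = I - \Sigma_{22}^{\frac12}\Sigma^{22}\Sigma_{22}^{\frac12}$. Note that since $|\Sigma|^{\frac{m}{2}} = |\Sigma_{22}|^{\frac{m}{2}}|\Sigma_{11}-\Sigma_{12}\Sigma_{22}^{-1}\Sigma_{21}|^{\frac{m}{2}} = |\Sigma_{22}|^{\frac{m}{2}}|\Sigma^{11}|^{-\frac{m}{2}}$, $|\Sigma|^{\frac{m}{2}}$ in the denominator of the constant part gets canceled out, the remaining constant expression being

$$\frac{\Gamma_{p+q}(\frac{m}{2}+h)\,\Gamma_p(\frac{m}{2})}{\Gamma_{p+q}(\frac{m}{2})\,\Gamma_p(\frac{m}{2}+h)}.$$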
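The integral part of $\delta$ can be evaluated by making use of Euler's representation of a Gauss' hypergeometric function of matrix argument, which as given in formula (5.2.15) of Mathai (1997), is

$$\frac{\Gamma_q(a)\Gamma_q(c-a)}{\Gamma_q(c)}\,{}_2F_1(a,b;c;X) = \int_{O<Z<I}|Z|^{a-\frac{q+1}{2}}\,|I-Z|^{c-a-\frac{q+1}{2}}\,|I-XZ|^{-b}\,dZ \tag{iv}$$

where $O < Z < I$ and $O < X < I$ are $q\times q$ real matrices. Thus, $\delta$ can be expressed as follows:

$$\delta = |\Sigma_{22}|^{\frac{m}{2}}\,\frac{\Gamma_q(\frac{m}{2})}{\Gamma_q(\frac{m}{2}+h)}\,{}_2F_1\Big(\frac{m}{2},h;\frac{m}{2}+h;I-\Sigma_{22}^{\frac12}\Sigma^{22}\Sigma_{22}^{\frac12}\Big),$$

so that

$$E[u^h] = \frac{\Gamma_{p+q}(\frac{m}{2}+h)\,\Gamma_p(\frac{m}{2})\,\Gamma_q(\frac{m}{2})}{\Gamma_{p+q}(\frac{m}{2})\,\Gamma_p(\frac{m}{2}+h)\,\Gamma_q(\frac{m}{2}+h)}\,{}_2F_1\Big(\frac{m}{2},h;\frac{m}{2}+h;I-\Sigma_{22}^{\frac12}\Sigma^{22}\Sigma_{22}^{\frac12}\Big). \tag{10.5.3}$$

For $p \le q$, it follows from the definition of the matrix-variate gamma function that $\Gamma_{p+q}(\alpha) = \pi^{\frac{pq}{2}}\Gamma_q(\alpha)\Gamma_p(\alpha-\frac{q}{2})$. Thus, the constant part in (10.5.3) simplifies to

$$\frac{\Gamma_p(\frac{m}{2}-\frac{q}{2}+h)\,\Gamma_p(\frac{m}{2})}{\Gamma_p(\frac{m}{2}-\frac{q}{2})\,\Gamma_p(\frac{m}{2}+h)}.$$

Then,

$$E[u^h] = \frac{\Gamma_p(\frac{m}{2}-\frac{q}{2}+h)\,\Gamma_p(\frac{m}{2})}{\Gamma_p(\frac{m}{2}-\frac{q}{2})\,\Gamma_p(\frac{m}{2}+h)}\,{}_2F_1\Big(\frac{m}{2},h;\frac{m}{2}+h;I-\Sigma_{22}^{\frac12}\Sigma^{22}\Sigma_{22}^{\frac12}\Big) \tag{10.5.4}$$

for $m \ge p+q$, $\Re(h) > -\frac{m}{2}+\frac{p-1}{2}+\frac{q}{2}$, $p \le q$. Had $Y_2$ been integrated out first instead of $Y_1$, we would have ended up with a hypergeometric function having $I - \Sigma_{11}^{\frac12}\Sigma^{11}\Sigma_{11}^{\frac12}$ as its argument, that is,

$$E[u^h] = \frac{\Gamma_p(\frac{m}{2}-\frac{q}{2}+h)\,\Gamma_p(\frac{m}{2})}{\Gamma_p(\frac{m}{2}-\frac{q}{2})\,\Gamma_p(\frac{m}{2}+h)}\,{}_2F_1\Big(\frac{m}{2},h;\frac{m}{2}+h;I-\Sigma_{11}^{\frac12}\Sigma^{11}\Sigma_{11}^{\frac12}\Big). \tag{10.5.5}$$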
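
The gamma-function identity used to pass from (10.5.3) to (10.5.4) can be checked numerically. The sketch below assumes the standard definition $\Gamma_p(a) = \pi^{p(p-1)/4}\prod_{j=1}^{p}\Gamma(a-\frac{j-1}{2})$; the function names and the numerical values of $p$, $q$, $\alpha$, $m$ and $h$ are arbitrary choices made for illustration.

```python
import math

def gamma_p(p, a):
    """Real matrix-variate gamma: pi^{p(p-1)/4} * prod_{j=0}^{p-1} Gamma(a - j/2)."""
    value = math.pi ** (p * (p - 1) / 4)
    for j in range(p):
        value *= math.gamma(a - j / 2)
    return value

def constant_10_5_3(m, h, p, q):
    """Constant multiplying the 2F1 function in (10.5.3)."""
    return (gamma_p(p + q, m / 2 + h) * gamma_p(p, m / 2) * gamma_p(q, m / 2)
            / (gamma_p(p + q, m / 2) * gamma_p(p, m / 2 + h) * gamma_p(q, m / 2 + h)))

def constant_10_5_4(m, h, p, q):
    """Simplified constant appearing in (10.5.4)."""
    return (gamma_p(p, m / 2 - q / 2 + h) * gamma_p(p, m / 2)
            / (gamma_p(p, m / 2 - q / 2) * gamma_p(p, m / 2 + h)))

p, q, alpha = 2, 3, 4.2
lhs = gamma_p(p + q, alpha)
rhs = math.pi ** (p * q / 2) * gamma_p(q, alpha) * gamma_p(p, alpha - q / 2)
print(lhs, rhs)  # the two values should agree up to rounding
print(constant_10_5_3(12, 0.7, p, q), constant_10_5_4(12, 0.7, p, q))
```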


10.5.1. The sampling distribution of the multiple correlation coefficient 


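When $p = 1$ and $q > 1$, $r_{(1)}^2$ is equal to the square of the sample multiple correlation coefficient, that is, $r_{(1)}^2 = r_{1(1\ldots q)}^2$. In this case, the argument in (10.5.5) is a real scalar quantity that is equal to $1 - \sigma_{11}\sigma^{11}$, the real matrix-variate $\Gamma_p(\cdot)$ functions are simply $\Gamma(\cdot)$ functions and $u = 1 - r_{1(1\ldots q)}^2$. Letting $y = r_{1(1\ldots q)}^2$, $E[1-y]^h$ is available from (10.5.5) for $p = 1$ and the argument of the ${}_2F_1$ hypergeometric function is then

$$1-\sigma_{11}\sigma^{11} = 1-\sigma_{11}(\sigma_{11}-\Sigma_{12}\Sigma_{22}^{-1}\Sigma_{21})^{-1} = -\frac{\Sigma_{12}\Sigma_{22}^{-1}\Sigma_{21}}{\sigma_{11}-\Sigma_{12}\Sigma_{22}^{-1}\Sigma_{21}}. \tag{10.5.6}$$

By taking the inverse Mellin transform of (10.5.5) for $h = s-1$ and $p = 1$, we can express the density $f(y)$ of the square of the sample multiple correlation as follows:

$$f(y) = \frac{\Gamma(\frac{m}{2})}{\Gamma(\frac{m-q}{2})\Gamma(\frac{q}{2})}\,(1-\rho^2)^{\frac{m}{2}}\,y^{\frac{q}{2}-1}(1-y)^{\frac{m-q}{2}-1}\,{}_2F_1\Big(\frac{m}{2},\frac{m}{2};\frac{q}{2};\rho^2y\Big) \tag{10.5.7}$$

where $\rho^2$ is the population multiple correlation squared, that is, $\rho^2 = [\Sigma_{12}\Sigma_{22}^{-1}\Sigma_{21}]/\sigma_{11}$. We can verify the result by computing the $h$-th moment of $1-y$ in (10.5.7). The $h$-th moment can be determined as follows by expanding the ${}_2F_1$ function and then integrating:

$$E[1-y]^h = \frac{(1-\rho^2)^{\frac{m}{2}}\,\Gamma(\frac{m}{2})}{\Gamma(\frac{m-q}{2})\Gamma(\frac{q}{2})}\sum_{k=0}^{\infty}\frac{(\frac{m}{2})_k(\frac{m}{2})_k}{(\frac{q}{2})_k}\,\frac{(\rho^2)^k}{k!}\int_0^1 y^{\frac{q}{2}+k-1}(1-y)^{\frac{m-q}{2}+h-1}\,dy,$$

the integral part being

$$\frac{\Gamma(\frac{q}{2}+k)\,\Gamma(\frac{m-q}{2}+h)}{\Gamma(\frac{m}{2}+h+k)} = \frac{\Gamma(\frac{q}{2})\,\Gamma(\frac{m-q}{2}+h)}{\Gamma(\frac{m}{2}+h)}\,\frac{(\frac{q}{2})_k}{(\frac{m}{2}+h)_k},$$

so that

$$E[1-y]^h = (1-\rho^2)^{\frac{m}{2}}\,\frac{\Gamma(\frac{m-q}{2}+h)\,\Gamma(\frac{m}{2})}{\Gamma(\frac{m-q}{2})\,\Gamma(\frac{m}{2}+h)}\,{}_2F_1\Big(\frac{m}{2},\frac{m}{2};\frac{m}{2}+h;\rho^2\Big). \tag{10.5.8}$$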


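On applying the relationship,

$${}_2F_1(a,b;c;z) = (1-z)^{-b}\,{}_2F_1\Big(c-a,b;c;\frac{z}{z-1}\Big), \tag{10.5.9}$$

we have

$${}_2F_1\Big(\frac{m}{2},\frac{m}{2};\frac{m}{2}+h;\rho^2\Big) = (1-\rho^2)^{-\frac{m}{2}}\,{}_2F_1\Big(h,\frac{m}{2};\frac{m}{2}+h;\frac{\rho^2}{\rho^2-1}\Big),$$

with

$$\frac{\rho^2}{\rho^2-1} = -\frac{\Sigma_{12}\Sigma_{22}^{-1}\Sigma_{21}}{\sigma_{11}-\Sigma_{12}\Sigma_{22}^{-1}\Sigma_{21}},$$

which agrees with (10.5.6). Observe that $(1-\rho^2)^{\frac{m}{2}}$ gets canceled out so that (10.5.8) agrees with (10.5.5) for $p = 1$.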
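
The agreement between (10.5.8) and (10.5.5) for $p = 1$ can be illustrated numerically in the scalar case. The following sketch evaluates the Gauss hypergeometric series directly (valid only for $|z| < 1$) and compares the moment computed from (10.5.8) with the form obtained after applying (10.5.9). The function names and the values of $m$, $q$, $h$ and $\rho^2$ are illustrative assumptions; $\rho^2 < 1/2$ is chosen so that the transformed argument $\rho^2/(\rho^2-1)$ also lies inside the unit disk.

```python
import math

def hyp2f1(a, b, c, z, tol=1e-15, max_terms=10000):
    """Gauss hypergeometric series 2F1(a, b; c; z), summed term by term for |z| < 1."""
    if abs(z) >= 1:
        raise ValueError("the series is only used here for |z| < 1")
    term, total = 1.0, 1.0
    for k in range(max_terms):
        term *= (a + k) * (b + k) / ((c + k) * (k + 1)) * z
        total += term
        if abs(term) <= tol * abs(total):
            break
    return total

def gamma_ratio(h, m, q):
    """Gamma-function constant common to (10.5.5) with p = 1 and (10.5.8)."""
    return (math.gamma((m - q) / 2 + h) * math.gamma(m / 2)
            / (math.gamma((m - q) / 2) * math.gamma(m / 2 + h)))

def moment_10_5_8(h, m, q, rho2):
    """E[(1 - y)^h] as given in (10.5.8)."""
    return (1 - rho2) ** (m / 2) * gamma_ratio(h, m, q) * hyp2f1(m / 2, m / 2, m / 2 + h, rho2)

def moment_10_5_5(h, m, q, rho2):
    """E[(1 - y)^h] from (10.5.5) with p = 1 and argument rho2 / (rho2 - 1)."""
    return gamma_ratio(h, m, q) * hyp2f1(h, m / 2, m / 2 + h, rho2 / (rho2 - 1))

m, q, h, rho2 = 10, 3, 1.5, 0.3
print(moment_10_5_8(h, m, q, rho2), moment_10_5_5(h, m, q, rho2))
print(moment_10_5_8(0.0, m, q, rho2))  # total probability, should be close to 1
```

Setting $h = 0$ gives the total integral of the density (10.5.7), which provides a quick sanity check that it integrates to 1.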


We can also obtain a representation of the density of the sample canonical correlation matrix whose M-transform is as given in (10.5.5) for $p \le q$. This can be achieved by duplicating the steps utilized for the particular case considered in this section, which yields the following density:

$$f(R) = \frac{|I-P|^{\frac{m}{2}}\,\Gamma_p(\frac{m}{2})}{\Gamma_p(\frac{m-q}{2})\,\Gamma_p(\frac{q}{2})}\,|R|^{\frac{q}{2}-\frac{p+1}{2}}\,|I-R|^{\frac{m-q}{2}-\frac{p+1}{2}}\,{}_2F_1\Big(\frac{m}{2},\frac{m}{2};\frac{q}{2};P^{\frac12}RP^{\frac12}\Big) \tag{10.5.10}$$

where $P = \Sigma_{11}^{-\frac12}\Sigma_{12}\Sigma_{22}^{-1}\Sigma_{21}\Sigma_{11}^{-\frac12}$ is the population canonical correlation matrix. Note that a function giving rise to a certain M-transform need not be unique. However, by making use of the Laplace transform and its inverse in the real matrix-variate case, Mathai (1981) has shown that the function specified in (10.5.10) is actually the unique density of $R$.


Exercises 10 


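10.1. In Example 10.3.2, verify that

$$\frac{[\alpha_{(2)}'\Sigma_{12}\beta_{(2)}]^2}{\gamma_2\delta_2} = \lambda_2$$

where $\lambda_2$ is the second largest eigenvalue of the canonical correlation matrix $A$.

10.2. In Example 10.3.2, use equation (10.1.1) or equation (ii) preceding it with $\rho_1 = \rho_2 = \rho$ and evaluate $\beta_{(1)}$ and $\beta_{(2)}$ from $\alpha_{(1)}$ and $\alpha_{(2)}$. Obtain $\beta$ first, normalize it subject to the constraint $\beta'\Sigma_{22}\beta = 1$ and then obtain $\beta_{(1)}$ and $\beta_{(2)}$. Then verify the results

$$\frac{[\alpha_{(1)}'\Sigma_{12}\beta_{(1)}]^2}{\gamma_1\delta_1} = \lambda_1\ \text{ and }\ \frac{[\alpha_{(2)}'\Sigma_{12}\beta_{(2)}]^2}{\gamma_2\delta_2} = \lambda_2$$

where $\lambda_1$ and $\lambda_2$ are the largest and second largest eigenvalues of the canonical correlation matrix $A$.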


10.3. Let 


Хз 


2 1 0 
Cov(Y) = Xn = Ё Ir Cov(X,Y)—2|-1 1], 
1 0 


1 3 
where $x_1, x_2, x_3, y_1, y_2$ are real scalar random variables. Evaluate the following where the notations of this chapter are utilized: (1): The canonical correlations $\rho_{(1)}$ and $\rho_{(2)}$; (2): The first pair of canonical variables $(u_1, v_1)$ by direct evaluation as done in Example 10.3.2; (3): Verify that

$$\frac{[\beta_{(1)}'\Sigma_{21}\alpha_{(1)}]^2}{\gamma_1\delta_1} = \lambda_1:\ \text{the largest eigenvalue of } B$$

where $B = \Sigma_{22}^{-\frac12}\Sigma_{21}\Sigma_{11}^{-1}\Sigma_{12}\Sigma_{22}^{-\frac12}$; (4): Evaluate the second pair of canonical variables $(u_2, v_2)$ by using equation (10.1.1) for constructing $\alpha_{(1)}$ and $\alpha_{(2)}$ after obtaining $\beta_{(1)}$ and $\beta_{(2)}$; (5): Verify that

$$\frac{[\beta_{(2)}'\Sigma_{21}\alpha_{(2)}]^2}{\gamma_2\delta_2} = \lambda_2:\ \text{the second largest eigenvalue of } B.$$

10.4. Repeat Problem 10.3 with $X$, $Y$ and their associated covariance matrices defined as follows:


ХІ yı 2 0 0 
Х= |х|, У = |у |, CoviXy)=2j,= 1/0 2 2), 
ХЗ Уз 02 3 
22 0 1 1 1 
Cov(Y) = 2552|2 3 0 |, Cov(X,Y)— Xnr = 1 —1 
0 0 2 —1 1 —1 


where $x_1, x_2, x_3, y_1, y_2, y_3$ are real scalar random variables. As well, compute the three pairs of canonical variables.


10.5. Show that the M-transform in (10.5.5) is available from the density specified in (10.5.10).

Chapter 11
Factor Analysis


11.1. Introduction 


We will utilize the same notations as in the previous chapters. Lower-case letters $x, y, \ldots$ will denote real scalar variables, whether mathematical or random. Capital letters $X, Y, \ldots$ will be used to denote real matrix-variate mathematical or random variables, whether square or rectangular matrices are involved. A tilde will be placed on top of letters such as $\tilde{x}, \tilde{y}, \tilde{X}, \tilde{Y}$ to denote variables in the complex domain. Constant matrices will for instance be denoted by $A, B, C$. A tilde will not be used on constant matrices unless the point is to be stressed that the matrix is in the complex domain. In the real and complex cases, the determinant of a square matrix $A$ will be denoted by $|A|$ or $\det(A)$ and, in the complex case, the absolute value or modulus of the determinant of $A$ will be denoted as $|\det(A)|$. When matrices are square, their order will be taken as $p \times p$, unless specified otherwise. When $A$ is a full rank matrix in the complex domain, then $AA^{*}$ is Hermitian positive definite where an asterisk designates the complex conjugate transpose of a matrix. Additionally, $dX$ will indicate the wedge product of all the distinct differentials of the elements of the matrix $X$. Thus, letting the $p \times q$ matrix $X = (x_{ij})$ where the $x_{ij}$'s are distinct real scalar variables, $dX = \wedge_{i=1}^{p}\wedge_{j=1}^{q}dx_{ij}$. For the complex matrix $\tilde{X} = X_1 + iX_2$, $i = \sqrt{(-1)}$, where $X_1$ and $X_2$ are real, $d\tilde{X} = dX_1 \wedge dX_2$.


Factor analysis is a statistical method aiming to identify a relatively small number of underlying (unobserved) factors that could explain certain interdependencies among a larger set of observed variables. Factor analysis also proves useful for analyzing causal mechanisms. As a statistical technique, Factor Analysis was originally developed in connection with psychometrics. It has since been utilized in operations research, finance and biology, among other disciplines. For instance, a score available on an intelligence test will often assess several intellectual faculties and cognitive abilities. It is assumed that a certain linear function of the contributions from these various mental factors is producing the final score. Hence, there is a parallel to be made with analysis of variance as well as design of experiments and linear regression models.


11.2. Linear Models from Different Disciplines 


In order to introduce the current topic, we will first examine a linear regression model 
and an experimental design model. 


11.2.1. A linear regression model 


Let $x$ be a real scalar random variable and let $t_1, \ldots, t_r$ be either $r$ real fixed numbers or given values of $r$ real random variables. Let the conditional expectation of $x$, given $t_1, \ldots, t_r$, be of the form

$$E[x|t_1,\ldots,t_r] = a_o + a_1t_1 + \cdots + a_rt_r$$

or the corresponding model be

$$x = a_o + a_1t_1 + \cdots + a_rt_r + e$$


where $a_o, a_1, \ldots, a_r$ are unknown constants, $t_1, \ldots, t_r$ are given values and $e$ is the error component or the sum total of contributions coming from unknown or uncontrolled factors plus the experimental error. For example, $x$ might be an inflation index with respect to a particular base year, say 2010. In this instance, $t_1$ may be the change or deviation in the average price per kilogram of certain staple vegetables from the base year 2010, $t_2$ may be the change or deviation in the average price of a kilogram of rice compared to the base year, $t_3$ may be the change or deviation in the average price of flour per kilogram with respect to the base year, and so on, and $t_r$ may be the change or deviation in the average price of milk per liter compared to the base year 2010. The notation $t_j$, $j = 1, \ldots, r$, is utilized to designate the given values as well as the corresponding random variables. Since we are taking deviations from the base values, we may assume without any loss of generality that the expected value of $t_j$ is zero, that is, $E[t_j] = 0$, $j = 1, \ldots, r$. We may also take the expected value of the error term $e$ to be zero, that is, $E[e] = 0$. Now, let $x_1$ be the inflation index, $x_2$ be the caloric intake index per person, $x_3$ be the general health index and so on. In all these cases, the same $t_1, \ldots, t_r$ can act as the independent variables in a regression set up. Thus, in such a situation, a multivariate linear regression model will have the following format:

$$X = \begin{bmatrix}x_1\\ x_2\\ \vdots\\ x_p\end{bmatrix} = \begin{bmatrix}\mu_1\\ \mu_2\\ \vdots\\ \mu_p\end{bmatrix} + \begin{bmatrix}a_{11} & a_{12} & \cdots & a_{1r}\\ a_{21} & a_{22} & \cdots & a_{2r}\\ \vdots & \vdots & \ddots & \vdots\\ a_{p1} & a_{p2} & \cdots & a_{pr}\end{bmatrix}\begin{bmatrix}t_1\\ t_2\\ \vdots\\ t_r\end{bmatrix} + \begin{bmatrix}e_1\\ e_2\\ \vdots\\ e_p\end{bmatrix} \tag{11.2.1}$$

and we may write this model as

$$X = M + \Lambda F + \epsilon$$

where $\Lambda = (\lambda_{ij})$ is a $p \times r$, $r \le p$, matrix of full rank $r$, $\epsilon$ is $p \times 1$ and $F$ is $r \times 1$. In (11.2.1), $\lambda_{ij} = a_{ij}$ and $f_j = t_j$. Then, $E[X] = M + \Lambda E[F] + E[\epsilon] = M$ since we have assumed that $E[F] = O$ (a null matrix) and $E[\epsilon] = O$. When $F$ and $\epsilon$ are uncorrelated, the covariance matrix associated with $X$, denoted by ${\rm Cov}(X) = \Sigma$, is the following:

$$\begin{aligned}\Sigma = {\rm Cov}(X) &= E\{(X-M)(X-M)'\} = E\{(\Lambda F+\epsilon)(\Lambda F+\epsilon)'\}\\ &= \Lambda\,{\rm Cov}(F)\,\Lambda' + {\rm Cov}(\epsilon) + O\\ &= \Lambda\Phi\Lambda' + \Psi\end{aligned} \tag{11.2.2}$$

where the covariance matrices of $F$ and $\epsilon$ are respectively denoted by $\Phi > O$ (positive definite) and $\Psi > O$. In the above formulation, $F$ is taken to be a real vector random variable. In a simple linear model, the covariance matrix of $\epsilon$, namely $\Psi$, is usually taken as $\sigma^2I$ where $\sigma^2 > 0$ is a real scalar quantity and $I$ is the identity matrix. In a more general setting, $\Psi$ can be taken to be a diagonal matrix whose diagonal elements are positive; in such a model, the $e_j$'s are uncorrelated and their variances need not be equal. It will be assumed that the covariance matrix $\Psi$ in (11.2.2) is a diagonal matrix having positive diagonal elements.
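
The covariance structure in (11.2.2) can be made concrete with a small numerical sketch. The loadings, factor covariance matrix and error variances below are made-up values for $p = 3$ and $r = 2$, and the helper names are illustrative; plain nested lists are used so that no external library is assumed.

```python
def matmul(A, B):
    """Product of two matrices stored as nested lists."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]

def transpose(A):
    return [list(col) for col in zip(*A)]

def factor_covariance(Lam, Phi, psi_diag):
    """Sigma = Lam Phi Lam' + Psi, with Psi = diag(psi_diag) as assumed in (11.2.2)."""
    core = matmul(matmul(Lam, Phi), transpose(Lam))
    p = len(core)
    return [[core[i][j] + (psi_diag[i] if i == j else 0.0) for j in range(p)]
            for i in range(p)]

# Hypothetical parameter values (p = 3 observed variables, r = 2 factors).
Lam = [[1.0, 0.0], [0.5, 1.0], [0.2, 0.7]]
Phi = [[1.0, 0.3], [0.3, 2.0]]
psi = [0.5, 0.4, 0.3]

Sigma = factor_covariance(Lam, Phi, psi)
for row in Sigma:
    print(row)
```

Only the diagonal of $\Sigma$ is affected by $\Psi$; all the covariances between distinct components of $X$ come from the common part $\Lambda\Phi\Lambda'$.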


11.2.2. A design of experiment model 


Consider a completely randomized experiment where one set of treatments are under- 
taken. In this instance, the experimental plots are assumed to be fully homogeneous with 
respect to all the known factors of variation that may affect the response. For example, the 
observed value may be the yield of a particular variety of corn grown in an experimental 
plot. Let the set of treatments be $r$ different fertilizers $F_1, \ldots, F_r$, the effects of these fertilizers being denoted by $\alpha_1, \ldots, \alpha_r$. If no fertilizer is applied, the yield from a test plot need not be zero. Let $\mu_1$ be a general effect when $F_1$ is applied so that we may regard $\alpha_1$ as a deviation from this effect $\mu_1$ due to $F_1$. Let $e_1$ be the sum total of the contributions coming from all unknown or uncontrolled factors plus the experimental error, if any, when $F_1$ is applied. Then a simple linear one-way classification model for $F_1$ is

$$x_1 = \mu_1 + \alpha_1 + e_1,$$

with $x_1$ representing the yield from the test plot where $F_1$ was applied. Then, corresponding to $F_1, \ldots, F_p$, $r = p$, we have the following system:

$$\begin{aligned}x_1 &= \mu_1 + \alpha_1 + e_1\\ &\ \ \vdots\\ x_p &= \mu_p + \alpha_p + e_p\end{aligned}$$

or, in matrix notation,

$$X = M + \Lambda F + \epsilon \tag{11.2.3}$$

where

$$X = \begin{bmatrix}x_1\\ \vdots\\ x_p\end{bmatrix},\ M = \begin{bmatrix}\mu_1\\ \vdots\\ \mu_p\end{bmatrix},\ \epsilon = \begin{bmatrix}e_1\\ \vdots\\ e_p\end{bmatrix},\ F = \begin{bmatrix}\alpha_1\\ \vdots\\ \alpha_p\end{bmatrix}\ \text{ and }\ \Lambda = \begin{bmatrix}1 & 0 & \cdots & 0\\ 0 & 1 & \cdots & 0\\ \vdots & \vdots & \ddots & \vdots\\ 0 & 0 & \cdots & 1\end{bmatrix}.$$

In this case, the elements of $\Lambda$ are dictated by the design itself. If the vector $F$ is fixed, we call the model specified by (11.2.3), the fixed effect model, whereas if $F$ is assumed to be random, then it is referred to as the random effect model. With a single observation per cell, as stated in (11.2.3), we will not be able to estimate the parameters or test hypotheses. Thus, the experiment will have to be replicated. So, let the $j$-th replicated observation vector be

$$X_j = \begin{bmatrix}x_{1j}\\ \vdots\\ x_{pj}\end{bmatrix},$$

$\Sigma$, $\Phi$, and $\Psi$ remaining the same for each replicate within the random effect model. Similarly, for the regression model given in (11.2.1), the $j$-th replication or repetition vector will be $X_j' = (x_{1j}, \ldots, x_{pj})$ with $\Sigma$, $\Phi$ and $\Psi$ therein remaining the same for each sample.


We will consider a general linear model encompassing those specified in (11.2.1) and 
(11.2.3) and carry out a complete analysis that will involve verifying the existence and 
uniqueness of such a model, estimating its parameters and testing various types of hy- 
potheses. The resulting technique is referred to as Factor Analysis. 


11.3. A General Linear Model for Factor Analysis 
Consider the following general linear model: 


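$$X = M + \Lambda F + \epsilon \tag{11.3.1}$$

where

$$X = \begin{bmatrix}x_1\\ \vdots\\ x_p\end{bmatrix},\ M = \begin{bmatrix}\mu_1\\ \vdots\\ \mu_p\end{bmatrix},\ \epsilon = \begin{bmatrix}e_1\\ \vdots\\ e_p\end{bmatrix},\ \Lambda = \begin{bmatrix}\lambda_{11} & \lambda_{12} & \cdots & \lambda_{1r}\\ \lambda_{21} & \lambda_{22} & \cdots & \lambda_{2r}\\ \vdots & \vdots & \ddots & \vdots\\ \lambda_{p1} & \lambda_{p2} & \cdots & \lambda_{pr}\end{bmatrix}\ \text{ and }\ F = \begin{bmatrix}f_1\\ \vdots\\ f_r\end{bmatrix},\ r \le p,$$

with the $\mu_j$'s, $\lambda_{ij}$'s, $f_j$'s being real scalar parameters, the $x_j$'s, $j = 1, \ldots, p$, being real scalar quantities, and $\Lambda$ being of dimension $p \times r$, $r \le p$, and of full rank $r$. When considering expected values, variances, covariances, etc., $X$, $F$, and $\epsilon$ will be assumed to be random quantities; however, when dealing with estimates, $X$ will represent a vector of observations. This convention will be employed throughout this chapter so as to avoid a multiplicity of symbols for the variables and the corresponding observations.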


From a geometrical perspective, the $r$ columns of $\Lambda$, which are linearly independent, span an $r$-dimensional subspace in the $p$-dimensional Euclidean space. In this case, the $r \times 1$ vector $F$ is a point in this $r$-subspace and this subspace is usually called the factor space. Then, right-multiplying the $p \times r$ matrix $\Lambda$ by a matrix will correspond to employing a new set of coordinate axes for the factor space.


Factor Analysis is a subject dealing with the identification or unique determination of 
a model of the type specified in (11.3.1), as well as the estimation of its parameters and 
the testing of various related hypotheses. The subject matter was originally developed in 
connection with intelligence testing. Suppose that a test is administered to an individual 
to evaluate his/her mathematical skills, spatial perception, language abilities, etc., and that 
the score obtained is recorded. There will be a component in the model representing the 
expected score. If the test is administered to 10th graders belonging to a particular school, 
the grand average of such test scores among all 10th graders across the nation could be 
taken as the expected score. Then, inputs associated to various intellectual faculties or 
combinations thereof will come about. All such factors may be contributing towards the 
observed test score. If $f_1, \ldots, f_r$ are the contributions coming from $r$ factors corresponding to specific intellectual abilities, then, when a linear model is assumed, a certain linear combination of these inputs will constitute the final quantity accounting for the observed test score. A test score, $x_1$, may then result from a linear model of the following form:

$$x_1 = \mu_1 + \lambda_{11}f_1 + \lambda_{12}f_2 + \cdots + \lambda_{1r}f_r + e_1$$

with $\lambda_{11}, \ldots, \lambda_{1r}$ being the coefficients of $f_1, \ldots, f_r$, where $f_1, \ldots, f_r$ are contributions from $r$ factors toward $x_1$; these factors may be called the main intellectual factors in this case, and the coefficients $\lambda_{11}, \ldots, \lambda_{1r}$ may be referred to as the factor loadings for these main factors. In this context, $\mu_1$ is the general expected value and $e_1$ is the error component or the sum total of contributions originating from all unknown factors plus the experimental error, if any. Note that the contributions $f_1, \ldots, f_r$ due to the main intellectual factors can vary from individual to individual, and hence it is appropriate to treat $f_1, \ldots, f_r$ as random variables rather than fixed unknown quantities. These $f_1, \ldots, f_r$ are not observable as in the case of the design model in (11.2.3), whereas in the regression type model specified by (11.2.1), they may take on the recorded values of the observable variables which are called the independent variables. Thus, the model displayed in (11.3.1) may be analyzed either by treating $f_1, \ldots, f_r$ as fixed quantities or as random variables. If they are treated as random variables, we can assume that $f_1, \ldots, f_r$ follow some joint distribution. Usually, joint normality is presupposed for $f_1, \ldots, f_r$. Since $f_1, \ldots, f_r$ are deviations from the general effect $\mu_1$ due to certain main intellectual faculties under consideration, it may be assumed that the expected value is a null vector, that is, $E[F] = O$. We will denote the covariance matrix associated with $F$ as $\Phi$: ${\rm Cov}(F) = \Phi > O$ (real positive definite). Note that the model's error term $e_1$ is always a random variable. Letting $x_1, \ldots, x_p$ be the test scores on $p$ individuals, we have the error vector $\epsilon' = (e_1, \ldots, e_p)$. Without any loss of generality, we may take the expected value of $\epsilon$ as being a null vector, that is, $E[\epsilon] = O$. For a very simple situation, we may assume the covariance matrix associated with $\epsilon$ to be ${\rm Cov}(\epsilon) = \sigma^2I$ where $\sigma^2 > 0$ is a real positive scalar quantity and $I$ is the identity matrix. For a somewhat more general situation, we may take ${\rm Cov}(\epsilon) = \Psi$ where $\Psi$ is a real positive definite diagonal matrix, or a diagonal matrix with positive diagonal elements. In the most general case, we may take $\Psi$ to be a real positive definite matrix. It will be assumed that $\Psi$ is diagonal with positive diagonal elements in the model
(11.3.1), and that $F$ and $\epsilon$ are uncorrelated. Thus, letting $\Sigma$ be the covariance matrix of $X$, we have

$$\begin{aligned}\Sigma &= E[(X-M)(X-M)'] = E[(\Lambda F+\epsilon)(\Lambda F+\epsilon)']\\ &= \Lambda E(FF')\Lambda' + E(\epsilon\epsilon') + O\\ &= \Lambda\Phi\Lambda' + \Psi\end{aligned} \tag{11.3.2}$$

where $\Sigma$, $\Phi$ and $\Psi$, with $\Sigma = \Lambda\Phi\Lambda' + \Psi$, are all assumed to be real positive definite matrices.

11.3.1. The identification problem

Is the model specified by (11.3.1) unique or could it represent different situations? In other words, does it make sense as a model as stated? Given any $r \times r$ nonsingular matrix $A$, let $AF = F^{*}$ and $\Lambda A^{-1} = \Lambda^{*}$. Then, $\Lambda^{*}F^{*} = \Lambda A^{-1}AF = \Lambda F$. In other words,

$$X = M + \Lambda F + \epsilon = M + \Lambda^{*}F^{*} + \epsilon. \tag{11.3.3}$$

Consequently, the model in (11.3.1) is not identified, that is, it is not uniquely determined.
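
The lack of identification expressed by (11.3.3) can be illustrated with a short numerical sketch. All numerical values and helper names below are hypothetical; the relation $\Phi^{*} = A\Phi A'$ used for the covariance matrix of $F^{*} = AF$ is the usual rule for the covariance of a linear transformation and is stated here as an added assumption rather than quoted from the text.

```python
def matmul(A, B):
    """Product of two matrices stored as nested lists."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]

def transpose(A):
    return [list(col) for col in zip(*A)]

def inverse_2x2(A):
    (a, b), (c, d) = A
    det = a * d - b * c
    if det == 0:
        raise ValueError("A must be nonsingular")
    return [[d / det, -b / det], [-c / det, a / det]]

def max_abs_diff(A, B):
    return max(abs(x - y) for ra, rb in zip(A, B) for x, y in zip(ra, rb))

# Hypothetical values with p = 3 and r = 2.
Lam = [[1.0, 0.0], [0.5, 1.0], [0.2, 0.7]]
Phi = [[1.0, 0.3], [0.3, 2.0]]
F = [[1.5], [-2.0]]
A = [[2.0, 1.0], [0.0, 1.0]]          # any nonsingular r x r matrix

Lam_star = matmul(Lam, inverse_2x2(A))             # Lambda* = Lambda A^{-1}
F_star = matmul(A, F)                              # F* = A F
Phi_star = matmul(matmul(A, Phi), transpose(A))    # Cov(F*) = A Phi A'

common_part = matmul(Lam, F)
common_part_star = matmul(Lam_star, F_star)
cov_part = matmul(matmul(Lam, Phi), transpose(Lam))
cov_part_star = matmul(matmul(Lam_star, Phi_star), transpose(Lam_star))

print(max_abs_diff(common_part, common_part_star))
print(max_abs_diff(cov_part, cov_part_star))
```

Both printed differences are zero up to rounding, so the two parameter sets $(\Lambda, \Phi)$ and $(\Lambda^{*}, \Phi^{*})$ cannot be distinguished from the observations.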


The identification problem can also be stated as follows: Does there exist a real positive definite $p \times p$ matrix $\Sigma > O$ containing $p(p+1)/2$ distinct elements, which can be uniquely represented as $\Lambda\Phi\Lambda' + \Psi$ where $\Lambda$ has $pr$ distinct elements, $\Phi > O$ has $r(r+1)/2$ distinct elements and $\Psi$ is a diagonal matrix having $p$ distinct elements? Clearly, no such unique representation exists, as can be inferred from (11.3.3). Note that an $r \times r$ arbitrary matrix $A$ represents $r^2$ distinct elements. It can be observed from (11.3.3) that we can impose $r^2$ conditions on the parameters in $\Lambda$, $\Phi$ and $\Psi$. The question could also be posed as follows: Can the $p(p+1)/2$ distinct elements in $\Sigma$ plus the $r^2$ elements in $A$ ($r^2$ conditions) uniquely determine all the elements of $\Lambda$, $\Psi$ and $\Phi$? Let us determine how many elements there are in total. $\Lambda$, $\Psi$ and $\Phi$ have a total of $pr + p + r(r+1)/2$ elements while $A$ and $\Sigma$ have a total of $r^2 + p(p+1)/2$ elements. Hence, the difference, denoted by $\delta$, is

$$\delta = \Big[\frac{p(p+1)}{2} + r^2\Big] - \Big[pr + \frac{r(r+1)}{2} + p\Big] = \frac{1}{2}[(p-r)^2 - (p+r)]. \tag{11.3.4}$$
Observe that the right-hand side of (11.3.2) is not a linear function of $\Lambda$, $\Phi$ and $\Psi$. Thus, if $\delta > 0$, we can anticipate that existence and uniqueness will hold although these properties cannot be guaranteed, whereas if $\delta < 0$, then existence can be expected but uniqueness may be in question. Given (11.3.2), note that

$$\Sigma = \Psi + \Lambda\Phi\Lambda' \ \Rightarrow\ \Sigma - \Psi = \Lambda\Phi\Lambda'$$

where $\Lambda\Phi\Lambda'$ is positive semi-definite of rank $r$, since the $p \times r$, $r \le p$, matrix $\Lambda$ has full rank $r$ and $\Phi > O$ (positive definite). Then, the existence question can also be stated as follows: Given a $p \times p$ real positive definite matrix $\Sigma > O$, can we find a diagonal matrix $\Psi$ with positive diagonal elements such that $\Sigma - \Psi$ is a real positive semi-definite matrix of a specified rank $r$, which is expressible in the form $BB'$ for some $p \times r$ matrix $B$ of rank $r$ where $r \le p$? For the most part, the available results on this question of existence can be found in Anderson (2003) and Anderson and Rubin (1956). If a set of parameters exists and if the model is uniquely determined, then we say that the model is identified, or alternatively, identifiable. The concept of identification or identifiability within the context of Factor Analysis has been studied by Ihara and Kano (1995), Wegge (1996), Allman et al. (2009) and Chen et al. (2020), among others.
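
The element count leading to (11.3.4) is easy to tabulate. The following sketch, with a hypothetical function name, evaluates $\delta$ directly from the two totals and compares it with the closed form $\frac{1}{2}[(p-r)^2 - (p+r)]$ for a few illustrative pairs $(p, r)$.

```python
def identification_difference(p, r):
    """delta of (11.3.4): [p(p+1)/2 + r^2] - [pr + r(r+1)/2 + p]."""
    if not 1 <= r <= p:
        raise ValueError("expected 1 <= r <= p")
    available = p * (p + 1) // 2 + r * r        # elements in Sigma plus r^2 conditions
    unknowns = p * r + r * (r + 1) // 2 + p     # elements in Lambda, Phi and Psi
    return available - unknowns

for p, r in [(3, 1), (4, 2), (6, 2), (10, 3)]:
    delta = identification_difference(p, r)
    closed_form = ((p - r) ** 2 - (p + r)) / 2
    print(p, r, delta, closed_form)
```

For instance, the pair $(p, r) = (4, 2)$ yields a negative $\delta$, which is the situation where uniqueness may be in question.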


Assuming that $\Phi = I$ will impose $r(r+1)/2$ conditions. However, $r^2 = \frac{r(r+1)}{2} + \frac{r(r-1)}{2}$. Thus, we may impose $r(r-1)/2$ additional conditions after requiring that $\Phi = I$. Observe that when $\Phi = I$, $\Lambda^{*}\Phi\Lambda^{*\prime} = \Lambda^{*}\Lambda^{*\prime} = \Lambda A^{-1}(A')^{-1}\Lambda'$, and if this is equal to $\Lambda\Lambda'$ assuming that $\Phi = I$, this means that $(A'A)^{-1} = I$ or $A'A = I$, that is, $A$ is an orthonormal matrix. Thus, under the condition $\Phi = I$, the arbitrary $r \times r$ matrix $A$ becomes an orthonormal matrix. In this case, the transformation $Y = \Lambda A$ is an orthonormal transformation or a rotation of the coordinate axes. The following $r \times r$ symmetric matrix of $r(r+1)/2$ distinct elements

$$\Delta = \Lambda'\Psi^{-1}\Lambda \tag{11.3.5}$$

is needed for solving estimation and hypothesis testing problems; accordingly, we can impose $r(r-1)/2$ conditions by requiring $\Delta$ to be diagonal with distinct diagonal elements, that is, $\Delta = {\rm diag}(\eta_1, \ldots, \eta_r)$, $\eta_j > 0$, $j = 1, \ldots, r$. This imposes $\frac{r(r+1)}{2} - r = \frac{r(r-1)}{2}$ conditions. Thus, for the model to be identifiable or for all the parameters in $\Lambda$, $\Phi$, $\Psi$ to be uniquely determined, we can impose the condition $\Phi = I$ and require that $\Delta = \Lambda'\Psi^{-1}\Lambda$ be diagonal with positive diagonal elements. These two conditions will provide $\frac{r(r+1)}{2} + \frac{r(r-1)}{2} = r^2$ restrictions on the model which will then be identified.
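
The role played by the diagonality requirement can be illustrated numerically. In the sketch below, which relies on made-up loadings with $\Phi = I$ and $\Psi = I$, rotating the loadings by an orthonormal matrix $A$ leaves $\Lambda\Lambda'$ unchanged, whereas $\Lambda'\Psi^{-1}\Lambda$ stops being diagonal; this is why the additional $r(r-1)/2$ conditions single out one rotation. Helper names are illustrative.

```python
import math

def matmul(A, B):
    """Product of two matrices stored as nested lists."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]

def transpose(A):
    return [list(col) for col in zip(*A)]

def rotation(theta):
    """A 2 x 2 orthonormal matrix (rotation of the coordinate axes)."""
    c, s = math.cos(theta), math.sin(theta)
    return [[c, -s], [s, c]]

def delta_matrix(Lam, psi_diag):
    """Lambda' Psi^{-1} Lambda for a diagonal Psi = diag(psi_diag), as in (11.3.5)."""
    scaled = [[x / psi_diag[i] for x in row] for i, row in enumerate(Lam)]
    return matmul(transpose(Lam), scaled)

# Hypothetical loadings whose columns are orthogonal, so that the matrix in (11.3.5) is diagonal.
Lam = [[1.0, 1.0], [1.0, -1.0], [2.0, 0.0]]
psi = [1.0, 1.0, 1.0]

Lam_rot = matmul(Lam, rotation(0.3))

print(matmul(Lam, transpose(Lam)))
print(matmul(Lam_rot, transpose(Lam_rot)))   # same matrix up to rounding
print(delta_matrix(Lam, psi))                # diagonal
print(delta_matrix(Lam_rot, psi))            # no longer diagonal
```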


When $\Phi = I$, the main factors are orthogonal. If $\Phi$ is a diagonal matrix (including the identity matrix), the covariances are zeros and it is an orthogonal situation, in which case we say that the main factors are orthogonal. If $\Phi$ is not diagonal, then we say that the main factors are oblique.


One can also impose $r(r-1)/2$ conditions on the $p \times r$ matrix $\Lambda$. Consider the first $r \times r$ block, that is, the leading $r \times r$ submatrix or the upper $r \times r$ block in the $p \times r$ matrix, which we will denote by $B$. Imposing the condition that this $r \times r$ block $B$ is lower triangular will result in $r^2 - \frac{r(r+1)}{2} = \frac{r(r-1)}{2}$ conditions. Hence, $\Phi = I$ and the condition that this leading $r \times r$ block $B$ is lower triangular will guarantee $r^2$ restrictions, and the model will then be identified. One can also take a preselected $r \times r$ matrix $B_1$ and then impose the condition that $B_1B$ be lower triangular. This will, as well, produce $\frac{r(r-1)}{2}$ conditions. Thus, $\Phi = I$ and $B_1B$ being lower triangular will ensure the identification of the model.


When we impose conditions on $\Phi$ and $\Psi$, the unknown covariance matrices must assume
certain formats. Such conditions can be justified. However, could conditions be put
on $\Lambda$, the factor loadings? Letting the first $r \times r$ block $B$ in the $p \times r$ matrix
$\Lambda$ be lower triangular is tantamount to assuming that $\lambda_{12} = 0 = \lambda_{13} = \cdots = \lambda_{1r}$
or, equivalently, that $f_2, \ldots, f_r$ do not contribute to the model for determining $x_1$ in
$X' = (x_1, x_2, \ldots, x_p)$. Such restrictions are justified if we can design the experiment in such
a way that $x_1$ depends on $f_1$ alone and not on $f_2, \ldots, f_r$. In psychological tests, it is
possible to design tests in such a way that only certain main factors affect the scores. Thus, in
such instances, we are justified to utilize a triangular format such that, in general, there are no
contributions from $f_{i+1}, \ldots, f_r$ toward $x_i$ or, equivalently, the factor loadings
$\lambda_{i,i+1}, \ldots, \lambda_{ir}$ equal zero for $i = 1, \ldots, r-1$. For example, suppose that the first
$r$ tests are designed in such a way that only $f_1, \ldots, f_i$ and no other factors contribute to $x_i$
or, equivalently, $x_i = \mu_i + \lambda_{i1}f_1 + \cdots + \lambda_{ii}f_i + e_i$, $i = 1, \ldots, r$. We can
also measure the contribution from $f_i$ in $\lambda_{ii}$ units or we can take $\lambda_{ii} = 1$. By taking
$B = I_r$, we can impose $r^2$ conditions without requiring that $\Phi = I$. This means that the first $r$
tests are specifically designed so that $x_1$ only has a one unit contribution from $f_1$, $x_2$ only has
a one unit contribution from $f_2$, and so on, $x_r$ receiving a one unit contribution from $f_r$. When
$B$ is taken to be diagonal, the factor loadings are $\lambda_{11}, \lambda_{22}, \ldots, \lambda_{rr}$,
respectively, so that only $f_i$ contributes to $x_i$ for $i = 1, \ldots, r$. Accordingly, the following
are certain model identification conditions:


(1): $\Phi = I$ and $\Lambda'\Psi^{-1}\Lambda$ is diagonal with distinct diagonal elements;

(2): $\Phi = I$ and the leading $r \times r$ submatrix $B$ in the $p \times r$ matrix $\Lambda$ is triangular;

(3): $\Phi = I$ and $B_1B$ is lower triangular where $B_1$ is a preselected matrix;

(4): The leading $r \times r$ submatrix $B$ in the $p \times r$ matrix $\Lambda$ is an identity matrix.


Observe that when $r = p$, condition (4) corresponds to the design model considered in
(11.2.3).
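
As a numerical illustration of condition (1), the following NumPy sketch (not part of the original text) starts from arbitrary, made-up values of $\Lambda$ and $\Psi$ and rotates the loadings by the orthonormal matrix of eigenvectors of $\Lambda'\Psi^{-1}\Lambda$. Under $\Phi = I$, such a rotation leaves $\Lambda\Lambda' + \Psi$ unchanged while making $\Lambda'\Psi^{-1}\Lambda$ diagonal. The names `Lam`, `Psi`, `P` and the random seed are assumptions made for this example only.

```python
import numpy as np

rng = np.random.default_rng(0)
p, r = 5, 2
Lam = rng.normal(size=(p, r))              # hypothetical p x r loadings
Psi = np.diag(rng.uniform(0.5, 2.0, p))    # hypothetical diagonal error covariance
Psi_inv = np.linalg.inv(Psi)

# Delta = Lam' Psi^{-1} Lam is symmetric positive definite but usually not diagonal.
Delta = Lam.T @ Psi_inv @ Lam
eigvals, P = np.linalg.eigh(Delta)         # P is orthonormal and P' Delta P is diagonal

Lam_star = Lam @ P                         # rotated loadings
Delta_star = Lam_star.T @ Psi_inv @ Lam_star

Sigma = Lam @ Lam.T + Psi                  # covariance structure with Phi = I
Sigma_star = Lam_star @ Lam_star.T + Psi   # same structure after the rotation
```

Here `Sigma` and `Sigma_star` agree, which is why a condition such as (1) is needed to single out one matrix of loadings among all of its rotations.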


11.3.2. Scaling or units of measurement 


A shortcoming of any analysis being based on a covariance matrix $\Sigma$ is that the
covariances depend on the units of measurement of the individual variables. Thus, modifying
the units will affect the covariances. If we let $y_i$ and $y_j$ be two real scalar random
variables with variances $\sigma_{ii}$ and $\sigma_{jj}$ and associated covariance $\sigma_{ij}$, the effect
of scaling or changes in the measurement units may be eliminated by considering the variables
$z_i = y_i/\sqrt{\sigma_{ii}}$ and $z_j = y_j/\sqrt{\sigma_{jj}}$ whose covariance $\mathrm{Cov}(z_i, z_j) = r_{ij}$
is actually the correlation between $y_i$ and $y_j$, which is free of the units of measurement. Letting
$Y' = (y_1, \ldots, y_p)$ and $D = \mathrm{diag}\big(\frac{1}{\sqrt{\sigma_{11}}}, \ldots, \frac{1}{\sqrt{\sigma_{pp}}}\big)$,
consider $Z = DY$. We note that $\mathrm{Cov}(Y) = \Sigma \Rightarrow \mathrm{Cov}(Z) = D\Sigma D = R$ which is the
correlation matrix associated with $Y$.

In psychological testing situations or referring to the model (11.3.1), when a test score $x_j$
is multiplied by a scalar quantity $c_j$, then the factor loadings $\lambda_{j1}, \ldots, \lambda_{jr}$, the error
term $e_j$ and the general effect $\mu_j$ are all multiplied by $c_j$, that is,
$c_jx_j = c_j\mu_j + c_j(\lambda_{j1}f_1 + \cdots + \lambda_{jr}f_r) + c_je_j$. Let $\mathrm{Cov}(X) = \Sigma = (\sigma_{ij})$,
that is, $\mathrm{Cov}(x_i, x_j) = \sigma_{ij}$, $X' = (x_1, \ldots, x_p)$
and $D = \mathrm{diag}\big(\frac{1}{\sqrt{\sigma_{11}}}, \ldots, \frac{1}{\sqrt{\sigma_{pp}}}\big)$. Consider the model

$$DX = DM + D\Lambda F + D\epsilon \Rightarrow \mathrm{Cov}(DX) = D\Sigma D = D\Lambda\Phi\Lambda'D + D\Psi D. \tag{11.3.6}$$

If $X^* = DX$, $M^* = DM$, $\Lambda^* = D\Lambda$, and $\epsilon^* = D\epsilon$, then we obtain the following model
and the resulting covariance matrix:

$$\begin{aligned}
X^* = M^* + \Lambda^*F + \epsilon^* \Rightarrow \Sigma^* = \mathrm{Cov}(X^*) &= \Lambda^*\mathrm{Cov}(F)\Lambda^{*\prime} + \Psi^*\\
&= D\Sigma D = D\Lambda\Phi\Lambda'D + D\Psi D\\
\Rightarrow R &= \Lambda^*\Phi\Lambda^{*\prime} + \Psi^*
\end{aligned}\tag{11.3.7}$$

where $R = (r_{ij})$ is the correlation matrix in $X$. An interesting point to be noted is that the
identification conditions $\Phi = I$ and $\Lambda^{*\prime}\Psi^{*-1}\Lambda^*$ being diagonal become the following:
$\Phi = I$ and $\Lambda^{*\prime}\Psi^{*-1}\Lambda^* = \Lambda'DD^{-1}\Psi^{-1}D^{-1}D\Lambda = \Lambda'\Psi^{-1}\Lambda$ which is
diagonal, that is, $\Lambda'\Psi^{-1}\Lambda$ is invariant under scaling transformations on the model or
under $X^* = DX$ and $\Psi^* = D\Psi D$.
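
The invariance noted above can be checked numerically. The sketch below is an illustration only: the loadings `Lam`, the diagonal matrix `Psi` and the seed are made-up values, and $\Phi = I$ is assumed. It forms $D$ from the diagonal of $\Sigma$, rescales the model, and compares $\Lambda'\Psi^{-1}\Lambda$ before and after scaling.

```python
import numpy as np

rng = np.random.default_rng(1)
p, r = 4, 2
Lam = rng.normal(size=(p, r))                 # hypothetical loadings
Psi = np.diag(rng.uniform(0.5, 2.0, p))       # hypothetical diagonal Psi
Sigma = Lam @ Lam.T + Psi                     # covariance matrix with Phi = I

D = np.diag(1.0 / np.sqrt(np.diag(Sigma)))    # D = diag(1/sqrt(sigma_11), ..., 1/sqrt(sigma_pp))
R = D @ Sigma @ D                             # correlation matrix, as in (11.3.7)

Lam_s = D @ Lam                               # Lambda* = D Lambda
Psi_s = D @ Psi @ D                           # Psi* = D Psi D

before = Lam.T @ np.linalg.inv(Psi) @ Lam
after = Lam_s.T @ np.linalg.inv(Psi_s) @ Lam_s
```

The arrays `before` and `after` coincide up to rounding, and `R` has unit diagonal elements and equals `Lam_s @ Lam_s.T + Psi_s`.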


11.4. Maximum Likelihood Estimators of the Parameters 


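A simple random sample of size $n$ from the model $X = M + \Lambda F + \epsilon$ specified in
(11.3.1) is understood to be constituted of independently and identically distributed (iid)
$X_j$'s, $j = 1, \ldots, n$, where

$$X_j = M + \Lambda F + E_j,\ j = 1, \ldots, n,\quad X_j = \begin{bmatrix} x_{1j}\\ x_{2j}\\ \vdots\\ x_{pj}\end{bmatrix},\quad E_j = \begin{bmatrix} e_{1j}\\ e_{2j}\\ \vdots\\ e_{pj}\end{bmatrix}, \tag{11.4.1}$$

and the $X_j$'s are iid. Let $F$ and $E_j$ be independently normally distributed. Let
$E_j \sim N_p(O, \Psi)$ and $X_j \sim N_p(M, \Sigma)$, $\Sigma = \Lambda\Phi\Lambda' + \Psi$ where
$\Phi = \mathrm{Cov}(F) > O$, $\Psi = \mathrm{Cov}(\epsilon) > O$ and $\Sigma > O$, $\Psi$ being a diagonal matrix
with positive diagonal elements. Then, the likelihood function is the following:

$$\begin{aligned}
L &= \prod_{j=1}^n \frac{1}{(2\pi)^{\frac p2}|\Sigma|^{\frac12}}\, e^{-\frac12 (X_j - M)'\Sigma^{-1}(X_j - M)}\\
&= \frac{1}{(2\pi)^{\frac{np}{2}}|\Sigma|^{\frac n2}}\, e^{-\frac12\sum_{j=1}^n (X_j - M)'\Sigma^{-1}(X_j - M)}.
\end{aligned}\tag{11.4.2}$$

The sample matrix will be denoted by a boldface $\mathbf{X} = (X_1, \ldots, X_n)$. Let $J$ be the
$n \times 1$ vector of unities, that is, $J' = (1, 1, \ldots, 1)$. Then,

$$\mathbf{X} = \begin{bmatrix} x_{11} & x_{12} & \cdots & x_{1n}\\ x_{21} & x_{22} & \cdots & x_{2n}\\ \vdots & \vdots & \ddots & \vdots\\ x_{p1} & x_{p2} & \cdots & x_{pn}\end{bmatrix},\quad
\bar X = \frac1n \mathbf{X}J = \begin{bmatrix}\frac1n\sum_{j=1}^n x_{1j}\\ \vdots\\ \frac1n\sum_{j=1}^n x_{pj}\end{bmatrix} = \begin{bmatrix}\bar x_1\\ \vdots\\ \bar x_p\end{bmatrix},$$

where $\bar X$ is the sample average vector or the sample mean vector. Let the boldface
$\bar{\mathbf{X}}$ be the $p \times n$ matrix $\bar{\mathbf{X}} = (\bar X, \bar X, \ldots, \bar X)$. Then,

$$(\mathbf{X} - \bar{\mathbf{X}})(\mathbf{X} - \bar{\mathbf{X}})' = S = (s_{ij}),\quad s_{ij} = \sum_{k=1}^n (x_{ik} - \bar x_i)(x_{jk} - \bar x_j), \tag{11.4.3}$$

where $S$ is the sample sum of products matrix or the "corrected" sample sum of squares
and cross products matrix. Note that $\bar{\mathbf{X}} = \bar XJ' = \frac1n \mathbf{X}JJ'$, so that
$\mathbf{X} - \bar{\mathbf{X}} = \mathbf{X}\big(I - \frac1n JJ'\big)$. Thus,

$$S = \mathbf{X}\Big(I - \frac1n JJ'\Big)\Big(I - \frac1n JJ'\Big)'\mathbf{X}' = \mathbf{X}\Big(I - \frac1n JJ'\Big)\mathbf{X}'. \tag{11.4.4}$$

Since $(X_j - M)'\Sigma^{-1}(X_j - M)$ is a real scalar quantity, we have the following:

$$\begin{aligned}
\sum_{j=1}^n (X_j - M)'\Sigma^{-1}(X_j - M) &= \mathrm{tr}\Big[\sum_{j=1}^n (X_j - M)'\Sigma^{-1}(X_j - M)\Big]\\
&= \sum_{j=1}^n \mathrm{tr}[\Sigma^{-1}(X_j - M)(X_j - M)']\\
&= \mathrm{tr}\Big[\Sigma^{-1}\sum_{j=1}^n (X_j - \bar X + \bar X - M)(X_j - \bar X + \bar X - M)'\Big]\\
&= \mathrm{tr}\Big[\Sigma^{-1}\sum_{j=1}^n (X_j - \bar X)(X_j - \bar X)'\Big] + n\,\mathrm{tr}[\Sigma^{-1}(\bar X - M)(\bar X - M)']\\
&= \mathrm{tr}(\Sigma^{-1}S) + n(\bar X - M)'\Sigma^{-1}(\bar X - M).
\end{aligned}\tag{11.4.5}$$

Hence,

$$L = (2\pi)^{-\frac{np}{2}}|\Sigma|^{-\frac n2}\, e^{-\frac12\mathrm{tr}(\Sigma^{-1}S) - \frac n2(\bar X - M)'\Sigma^{-1}(\bar X - M)}. \tag{11.4.6}$$

Differentiating (11.4.6) with respect to $M$, equating the result to a null vector and solving,
we obtain an estimator for $M$, denoted by $\hat M$, as $\hat M = \bar X$. Then, $\ln L$ evaluated at
$M = \bar X$ is

$$\begin{aligned}
\ln L &= -\frac{np}{2}\ln(2\pi) - \frac n2\ln|\Sigma| - \frac12\mathrm{tr}(\Sigma^{-1}S)\\
&= -\frac{np}{2}\ln(2\pi) - \frac n2\ln|\Lambda\Phi\Lambda' + \Psi| - \frac12\mathrm{tr}[(\Lambda\Phi\Lambda' + \Psi)^{-1}S].
\end{aligned}\tag{11.4.7}$$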
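
The matrix identities (11.4.4) and (11.4.5) lend themselves to a quick numerical check. The following sketch uses simulated data. The sample matrix `X`, the vector `M` and the positive definite matrix `Sigma` are arbitrary values chosen for illustration, not quantities taken from the text.

```python
import numpy as np

rng = np.random.default_rng(2)
p, n = 3, 8
X = rng.normal(size=(p, n))                 # columns play the role of X_1, ..., X_n
J = np.ones((n, 1))                         # n x 1 vector of unities
xbar = X @ J / n                            # sample mean vector, p x 1

C = np.eye(n) - J @ J.T / n                 # I - (1/n) J J'
S = X @ C @ X.T                             # right-hand side of (11.4.4)
S_direct = (X - xbar) @ (X - xbar).T        # definition (11.4.3)

M = rng.normal(size=(p, 1))                 # arbitrary value of the mean vector
A = rng.normal(size=(p, p))
Sigma = A @ A.T + np.eye(p)                 # arbitrary positive definite matrix
Sigma_inv = np.linalg.inv(Sigma)

# Left- and right-hand sides of (11.4.5)
lhs = sum(((X[:, [j]] - M).T @ Sigma_inv @ (X[:, [j]] - M)).item() for j in range(n))
rhs = np.trace(Sigma_inv @ S) + n * ((xbar - M).T @ Sigma_inv @ (xbar - M)).item()
```

The matrix `C` is symmetric and idempotent, which is what allows the two factors in (11.4.4) to collapse into one.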


11.4.1. Maximum likelihood estimators under identification conditions 


The derivations in the following sections parallel those found in Mathai (2021). One
of the conditions for identification of the model is $\Phi = I$ and $\Lambda'\Psi^{-1}\Lambda$ being a
diagonal matrix with positive diagonal elements. We will examine the maximum likelihood
estimators (MLE)/maximum likelihood estimates (MLE) under this identification condition. In
this case, it follows from (11.4.7) that

$$\ln L = -\frac{np}{2}\ln(2\pi) - \frac n2\ln|\Lambda\Lambda' + \Psi| - \frac12\mathrm{tr}[(\Lambda\Lambda' + \Psi)^{-1}S]. \tag{11.4.8}$$

By expanding the following determinant in two different ways by applying properties of
partitioned determinants that are stated in Sect. 1.3, we have the following identities:

$$\begin{vmatrix}\Psi & -\Lambda\\ \Lambda' & I_r\end{vmatrix} = |\Psi|\,|I + \Lambda'\Psi^{-1}\Lambda| = |\Psi + \Lambda\Lambda'|. \tag{11.4.9}$$

Hence, letting

$$\Delta = \Lambda'\Psi^{-1}\Lambda = \mathrm{diag}(\delta_1'\delta_1, \delta_2'\delta_2, \ldots, \delta_r'\delta_r),$$

we have

$$\begin{aligned}
\ln|\Lambda\Lambda' + \Psi| &= \ln|\Psi| + \ln|I + \Lambda'\Psi^{-1}\Lambda|\\
&= \sum_{j=1}^p \ln\psi_{jj} + \sum_{j=1}^r \ln(1 + \delta_j'\delta_j),
\end{aligned}\tag{11.4.10}$$

where $\delta_j = \Psi^{-\frac12}\Lambda_j$, $\Lambda_j$ is the $j$-th column of $\Lambda$ and $\psi_{jj}$, $j = 1, \ldots, p$,
is the $j$-th diagonal element of the diagonal matrix $\Psi$, the identification condition being that
$\Phi = I$ and $\Lambda'\Psi^{-1}\Lambda = \Delta = \mathrm{diag}(\delta_1'\delta_1, \ldots, \delta_r'\delta_r)$. Accordingly, if we can
write $\mathrm{tr}(\Sigma^{-1}S) = \mathrm{tr}[(\Lambda\Lambda' + \Psi)^{-1}S]$ in terms of $\psi_{jj}$, $j = 1, \ldots, p$, and
$\delta_j'\delta_j$, $j = 1, \ldots, r$, then the likelihood equation can be directly evaluated from (11.4.8)
and (11.4.10), and the estimators can be determined. The following result will be helpful in this
connection.


Theorem 11.4.1. Whenever $\Lambda\Lambda' + \Psi$ is nonsingular, which in this case, means real
positive definite, the inverse is given by

$$(\Lambda\Lambda' + \Psi)^{-1} = \Psi^{-1} - \Psi^{-1}\Lambda(\Delta + I)^{-1}\Lambda'\Psi^{-1} \tag{11.4.11}$$

where the $\Delta$ is defined in (11.4.10).


It can be readily verified that pre and post multiplications of
$\Psi^{-1} - \Psi^{-1}\Lambda(\Delta + I)^{-1}\Lambda'\Psi^{-1}$ by $\Lambda\Lambda' + \Psi$ yield the identity matrix $I_p$.
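
The determinant identities (11.4.9) and the inverse in Theorem 11.4.1 can also be confirmed numerically. The sketch below is illustrative only, with made-up `Lam` and `Psi`. It does not require $\Delta$ to be diagonal, since (11.4.11) holds for any $\Delta = \Lambda'\Psi^{-1}\Lambda$.

```python
import numpy as np

rng = np.random.default_rng(3)
p, r = 5, 2
Lam = rng.normal(size=(p, r))                   # hypothetical loadings
Psi = np.diag(rng.uniform(0.5, 2.0, p))         # hypothetical diagonal Psi
Psi_inv = np.diag(1.0 / np.diag(Psi))

Delta = Lam.T @ Psi_inv @ Lam
Sigma = Lam @ Lam.T + Psi

# The three members of (11.4.9)
block = np.block([[Psi, -Lam], [Lam.T, np.eye(r)]])
det_block = np.linalg.det(block)
det_product = np.linalg.det(Psi) * np.linalg.det(np.eye(r) + Delta)
det_sigma = np.linalg.det(Sigma)

# Right-hand side of (11.4.11)
Sigma_inv = Psi_inv - Psi_inv @ Lam @ np.linalg.inv(Delta + np.eye(r)) @ Lam.T @ Psi_inv
```

The practical appeal of (11.4.11) is that only an $r \times r$ matrix has to be inverted once $\Psi^{-1}$, which is diagonal, is available.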


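11.4.2. Simplifications of $|\Sigma|$ and $\mathrm{tr}(\Sigma^{-1}S)$

In light of (11.4.9) and (11.4.10), we have

$$|\Sigma| = |\Lambda\Lambda' + \Psi| = |\Psi|\,|\Lambda'\Psi^{-1}\Lambda + I| = |\Psi|\,|I + \Delta| = \Big\{\prod_{j=1}^p \psi_{jj}\Big\}\Big\{\prod_{j=1}^r (1 + \delta_j'\delta_j)\Big\}.$$

Now, observe the following: In
$\Lambda(\Delta + I)^{-1} = \Lambda\,\mathrm{diag}\big(\frac{1}{1+\delta_1'\delta_1}, \ldots, \frac{1}{1+\delta_r'\delta_r}\big)$, the $j$-th column
of $\Lambda$ is multiplied by $\frac{1}{1+\delta_j'\delta_j}$, $j = 1, \ldots, r$, and

$$\Lambda(\Delta + I)^{-1}\Lambda' = \sum_{j=1}^r \frac{1}{1 + \delta_j'\delta_j}\,\Lambda_j\Lambda_j',$$

where $\Lambda_j$ is the $j$-th column of $\Lambda$ and the $\delta_j$'s are specified in (11.4.10). Thus,

$$\ln|\Sigma| = \sum_{j=1}^p \ln\psi_{jj} + \sum_{j=1}^r \ln(1 + \delta_j'\delta_j) \tag{11.4.12}$$

and, on applying Theorem 11.4.1,

$$\begin{aligned}
\mathrm{tr}(\Sigma^{-1}S) &= \mathrm{tr}[(\Lambda\Lambda' + \Psi)^{-1}S] = \mathrm{tr}(\Psi^{-1}S) - \mathrm{tr}[\Psi^{-1}\Lambda(\Delta + I)^{-1}\Lambda'\Psi^{-1}S]\\
&= \mathrm{tr}(\Psi^{-1}S) - \sum_{j=1}^r \frac{1}{1 + \delta_j'\delta_j}\,\mathrm{tr}(\Lambda_j\Lambda_j'\Psi^{-1}S\Psi^{-1})\\
&= \mathrm{tr}(\Psi^{-1}S) - \sum_{j=1}^r \frac{1}{1 + \delta_j'\delta_j}\,\mathrm{tr}(\Lambda_j'\Psi^{-1}S\Psi^{-1}\Lambda_j)\\
&= \mathrm{tr}(\Psi^{-1}S) - \sum_{j=1}^r \frac{1}{1 + \delta_j'\delta_j}\,\Lambda_j'(\Psi^{-1}S\Psi^{-1})\Lambda_j
\end{aligned}\tag{11.4.13}$$

where $\Lambda_j$ is the $j$-th column of $\Lambda$, which follows by making use of the property
$\mathrm{tr}(AB) = \mathrm{tr}(BA)$ and observing that $\Lambda_j'\Psi^{-1}S\Psi^{-1}\Lambda_j$ is a quadratic form.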
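
The simplifications (11.4.12) and (11.4.13) rely on $\Lambda'\Psi^{-1}\Lambda$ being diagonal. The sketch below, offered as an illustration under that assumption, builds a hypothetical $\Lambda$ satisfying the condition by rotating arbitrary loadings, uses a simulated matrix as a stand-in for $S$, and compares both formulas with direct computations.

```python
import numpy as np

rng = np.random.default_rng(4)
p, r, n = 5, 2, 12
psi = rng.uniform(0.5, 2.0, p)                    # hypothetical psi_11, ..., psi_pp
Lam0 = rng.normal(size=(p, r))

# Rotate Lam0 so that Lam' Psi^{-1} Lam becomes diagonal (identification condition).
_, P = np.linalg.eigh(Lam0.T @ (Lam0 / psi[:, None]))
Lam = Lam0 @ P

Y = rng.normal(size=(p, n))
S = Y @ Y.T                                       # stand-in for the sum of products matrix

G = Lam / psi[:, None]                            # Psi^{-1} Lam, column j is Psi^{-1} Lambda_j
dd = np.sum(Lam * G, axis=0)                      # delta_j' delta_j, j = 1, ..., r
Sigma = Lam @ Lam.T + np.diag(psi)

logdet_formula = np.sum(np.log(psi)) + np.sum(np.log1p(dd))          # (11.4.12)
logdet_direct = np.linalg.slogdet(Sigma)[1]

quad = np.sum(G * (S @ G), axis=0)                # Lambda_j' Psi^{-1} S Psi^{-1} Lambda_j
trace_formula = np.sum(np.diag(S) / psi) - np.sum(quad / (1.0 + dd))  # (11.4.13)
trace_direct = np.trace(np.linalg.solve(Sigma, S))
```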


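11.4.3. Special case $\Psi = \sigma^2 I_p$


Letting $\Psi = \sigma^2 I_p$ where $\sigma^2$ is a real scalar, $\Psi^{-1} = \sigma^{-2}I_p = \theta I_p$ where
$\theta = \sigma^{-2}$, and the log-likelihood function can be simplified as

$$\ln L = -\frac{np}{2}\ln(2\pi) + \frac{np}{2}\ln\theta - \frac n2\sum_{j=1}^r \ln(1 + \delta_j'\delta_j) - \frac{\theta}{2}\mathrm{tr}(S) + \frac{\theta^2}{2}\sum_{j=1}^r \frac{\Lambda_j'S\Lambda_j}{1 + \delta_j'\delta_j}$$

where $1 + \delta_j'\delta_j = 1 + \theta\Lambda_j'\Lambda_j$ with $\Lambda_j$ being the $j$-th column of $\Lambda$. Consider
the equation

$$\begin{aligned}
\frac{\partial}{\partial\theta}\ln L = 0 \Rightarrow\ & \frac{np}{\theta} - n\sum_{j=1}^r \frac{\Lambda_j'\Lambda_j}{1 + \theta\Lambda_j'\Lambda_j} - \mathrm{tr}(S)\\
& + 2\theta\sum_{j=1}^r \frac{\Lambda_j'S\Lambda_j}{1 + \theta\Lambda_j'\Lambda_j} - \theta^2\sum_{j=1}^r \frac{(\Lambda_j'S\Lambda_j)(\Lambda_j'\Lambda_j)}{(1 + \theta\Lambda_j'\Lambda_j)^2} = 0.
\end{aligned}\tag{11.4.14}$$

For a specific $j$, we have

$$\begin{aligned}
\frac{\partial}{\partial\Lambda_j}\ln L = O \Rightarrow\ & -\frac n2\,\frac{2\theta\Lambda_j}{1 + \theta\Lambda_j'\Lambda_j} + \frac{\theta^2}{2}\,\frac{2S\Lambda_j}{1 + \theta\Lambda_j'\Lambda_j}\\
& - \frac{\theta^2}{2}\,\frac{(\Lambda_j'S\Lambda_j)}{(1 + \theta\Lambda_j'\Lambda_j)^2}\,2\theta\Lambda_j = O.
\end{aligned}\tag{11.4.15}$$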


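Pre-multiplying (11.4.15) by $\Lambda_j'$ yields

$$-\frac{n\theta\Lambda_j'\Lambda_j}{1 + \theta\Lambda_j'\Lambda_j} + \frac{\theta^2\Lambda_j'S\Lambda_j}{1 + \theta\Lambda_j'\Lambda_j} - \frac{\theta^3(\Lambda_j'S\Lambda_j)(\Lambda_j'\Lambda_j)}{(1 + \theta\Lambda_j'\Lambda_j)^2} = 0. \tag{11.4.16}$$

Now, on comparing (11.4.16) with (11.4.14) after multiplying (11.4.14) by $\theta$, we have

$$np - \theta\,\mathrm{tr}(S) + \theta^2\sum_{j=1}^r \frac{\Lambda_j'S\Lambda_j}{1 + \theta\Lambda_j'\Lambda_j} = 0. \tag{11.4.17}$$

Multiplying the left-hand side of (11.4.15) by $[1 + \theta\Lambda_j'\Lambda_j]^2/\theta$ gives

$$-n(1 + \theta\Lambda_j'\Lambda_j)\Lambda_j + \theta(1 + \theta\Lambda_j'\Lambda_j)S\Lambda_j - \theta^2(\Lambda_j'S\Lambda_j)\Lambda_j = O. \tag{11.4.18}$$

Then, by pre-multiplying (11.4.18) by $\Lambda_j'$, we obtain

$$\begin{aligned}
-n(1 + \theta\Lambda_j'\Lambda_j)\Lambda_j'\Lambda_j + \theta(1 + \theta\Lambda_j'\Lambda_j)\Lambda_j'S\Lambda_j - \theta^2(\Lambda_j'S\Lambda_j)\Lambda_j'\Lambda_j = 0 &\Rightarrow\\
\theta(\Lambda_j'S\Lambda_j) = n(1 + \theta\Lambda_j'\Lambda_j)(\Lambda_j'\Lambda_j)&,
\end{aligned}\tag{11.4.19}$$

which provides the following representation of $\theta$:

$$\theta = \frac{n\Lambda_j'\Lambda_j}{\Lambda_j'S\Lambda_j - n(\Lambda_j'\Lambda_j)^2}\quad\text{for } \Lambda_j'S\Lambda_j - n(\Lambda_j'\Lambda_j)^2 > 0 \tag{i}$$

since $\theta$ must be positive. Further, on substituting
$\theta\Lambda_j'S\Lambda_j = n(1 + \theta\Lambda_j'\Lambda_j)(\Lambda_j'\Lambda_j)$ in (11.4.18), we have

$$-n\Lambda_j + \theta[S - n(\Lambda_j'\Lambda_j)I]\Lambda_j = O, \tag{ii}$$

which, on replacing $\theta$ by the right-hand side of (i) yields

$$-(\Lambda_j'S\Lambda_j)\Lambda_j + (\Lambda_j'\Lambda_j)S\Lambda_j = O$$

or, equivalently,

$$\Big[S - \frac{\Lambda_j'S\Lambda_j}{\Lambda_j'\Lambda_j}\,I\Big]\Lambda_j = O \Rightarrow \tag{11.4.20}$$

$$[S - \lambda_jI]\Lambda_j = O \tag{11.4.21}$$

where $\lambda_j = \frac{\Lambda_j'S\Lambda_j}{\Lambda_j'\Lambda_j}$. Observe that $\Lambda_j$ in (11.4.20) is an eigenvector of $S$ for
$j = 1, \ldots, p$. Substituting the value of $\Lambda_j'S\Lambda_j$ from (11.4.19) into (11.4.17) gives the
following estimate of $\theta$: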


$$\hat\theta = \frac{np}{\mathrm{tr}(S) - n\sum_{j=1}^p \hat\Lambda_j'\hat\Lambda_j} \tag{11.4.22}$$

whenever the denominator is positive as $\theta$ is by definition positive. Now, in light of
(11.4.18) and (11.4.19), we can also obtain the following result for each $j$:

$$\hat\theta = \frac{n\hat\Lambda_j'\hat\Lambda_j}{\hat\Lambda_j'S\hat\Lambda_j - n(\hat\Lambda_j'\hat\Lambda_j)^2} \tag{11.4.23}$$

requiring again that the denominator be positive. In this case, $\hat\Lambda_j$ is an eigenvector of $S$
for $j = 1, \ldots, p$. Let us conveniently normalize the $\hat\Lambda_j$'s so that the denominators in
(11.4.22) and (11.4.23) remain positive.


Thus, $\hat\Lambda_j$ is an eigenvector of $S$ with the corresponding eigenvalue $\lambda_j$ for each $j$,
$j = 1, \ldots, p$. Out of these, the first $r$ of them, corresponding to the $r$ largest eigenvalues,
will also be estimates for the factor loadings $\Lambda_j$, $j = 1, \ldots, r$. Observe that we can
multiply $\hat\Lambda_j$ by any constant $c_1$ without affecting equations (11.4.20) or (11.4.21). This
constant $c_1$ may become necessary to keep the denominators in (11.4.22) and (11.4.23)
positive. Hence we have the following result:

Theorem 11.4.2. The sum of all the eigenvalues of $S$ from Eq. (11.4.20), including the
estimates of the $r$ factor loadings $\hat\Lambda_1, \ldots, \hat\Lambda_r$, is given by

$$\sum_{j=1}^p \frac{\hat\Lambda_j'S\hat\Lambda_j}{\hat\Lambda_j'\hat\Lambda_j} = \mathrm{tr}(S). \tag{11.4.24}$$

It can be established that the representations of $\hat\theta$ given by (11.4.22) and (11.4.23) are
one and the same. The equation giving rise to (11.4.23) is

$$\theta[\Lambda_j'S\Lambda_j - n(\Lambda_j'\Lambda_j)^2] = n\Lambda_j'\Lambda_j\quad\text{for each } j. \tag{iii}$$

Let us divide both sides of (iii) by $\Lambda_j'\Lambda_j$. Observe that
$\frac{\Lambda_j'S\Lambda_j}{\Lambda_j'\Lambda_j} = \lambda_j$ is an eigenvalue of $S$ for $j = 1, \ldots, p$, treating $\Lambda_j$
as an eigenvector of $S$. Now, taking the sum over $j = 1, \ldots, p$, on both sides of (iii) after
dividing by $\Lambda_j'\Lambda_j$, we have

$$\theta\Big[\sum_{j=1}^p \lambda_j - n\sum_{j=1}^p \Lambda_j'\Lambda_j\Big] = np \Rightarrow \theta\Big[\mathrm{tr}(S) - n\sum_{j=1}^p \Lambda_j'\Lambda_j\Big] = np, \tag{iv}$$

which is Eq. (11.4.22). This proves the claim.


Hence the procedure is the following: Compute the eigenvalues and the corresponding
eigenvectors of the sample sum of products matrix $S$. The estimates for the factor loadings,
denoted by $\hat\Lambda_j$, are available from the eigenvectors $\hat\Lambda_j$ of $S$ after appropriate
normalization to make the denominators in (11.4.22) and (11.4.23) positive. Take the first $r$
largest eigenvalues of $S$ and then compute the corresponding eigenvectors to obtain estimates for
all the factor loadings. This methodology is clearly related to that utilized in Principal
Component Analysis, the estimates of the variances of the principal components being
$\hat\Lambda_j'S\hat\Lambda_j/\hat\Lambda_j'\hat\Lambda_j = \hat\lambda_j$.
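
Two facts used in this procedure, namely that the ratio $\Lambda_j'S\Lambda_j/\Lambda_j'\Lambda_j$ is unaffected by rescaling an eigenvector and that these ratios add up to $\mathrm{tr}(S)$ as stated in Theorem 11.4.2, can be illustrated as follows. The matrix `S` below is simulated and the rescaling constants are arbitrary. Both are assumptions made for the sketch.

```python
import numpy as np

rng = np.random.default_rng(5)
Y = rng.normal(size=(4, 10))
S = Y @ Y.T                                   # stand-in for a sample sum of products matrix

lam, U = np.linalg.eigh(S)                    # eigenvalues (ascending) and unit eigenvectors

def rayleigh_quotient(S, v):
    """Return v'Sv / v'v for a nonzero vector v."""
    return (v @ S @ v) / (v @ v)

scales = [1.0, -2.5, 0.3, 10.0]               # arbitrary nonzero constants c_1
quotients = np.array(
    [rayleigh_quotient(S, s * U[:, j]) for j, s in enumerate(scales)]
)
```

Each entry of `quotients` reproduces the corresponding eigenvalue in `lam` whatever the scale, and their sum equals `np.trace(S)`.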


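Verification


Does the representation of $\theta$ given in (11.4.22) and (11.4.23) satisfy the likelihood
Eq. (11.4.14)? Since $\theta$ is estimated through $\Lambda_j$ for each $j = 1, \ldots, p$, we may replace
$\theta$ in (11.4.14) by $\theta_j$ and insert the summation symbol. Equation (11.4.14) will then be

$$\begin{aligned}
& n\sum_j \frac{1}{\theta_j} - n\sum_j \frac{\Lambda_j'\Lambda_j}{1 + \theta_j\Lambda_j'\Lambda_j} - \mathrm{tr}(S)\\
& + 2\sum_j \theta_j\,\frac{\Lambda_j'S\Lambda_j}{1 + \theta_j\Lambda_j'\Lambda_j} - \sum_j \theta_j^2\,\frac{\Lambda_j'\Lambda_j(\Lambda_j'S\Lambda_j)}{(1 + \theta_j\Lambda_j'\Lambda_j)^2} = 0.
\end{aligned}\tag{11.4.25}$$

Now, substituting the value of $\theta_j$ specified in (11.4.23) into (11.4.14), and noting that then
$1 + \theta_j\Lambda_j'\Lambda_j = \Lambda_j'S\Lambda_j/[\Lambda_j'S\Lambda_j - n(\Lambda_j'\Lambda_j)^2]$, the left-hand side of
(11.4.14) reduces to the following:

$$\begin{aligned}
& n\sum_j \frac{[\Lambda_j'S\Lambda_j - n(\Lambda_j'\Lambda_j)^2]}{n\Lambda_j'\Lambda_j} - n\sum_j \frac{\Lambda_j'\Lambda_j}{\Lambda_j'S\Lambda_j}[\Lambda_j'S\Lambda_j - n(\Lambda_j'\Lambda_j)^2] - \mathrm{tr}(S)\\
& \quad + 2n\sum_j \Lambda_j'\Lambda_j - n^2\sum_j \frac{(\Lambda_j'\Lambda_j)^3}{\Lambda_j'S\Lambda_j}
= \sum_j \frac{\Lambda_j'S\Lambda_j}{\Lambda_j'\Lambda_j} - \mathrm{tr}(S) = 0,
\end{aligned}$$

owing to Theorem 11.4.2. Hence, Eq. (11.4.14) holds for the value of $\theta$ given in (11.4.23)
and the value of $\lambda_j$ specified in (11.4.20).


Since the basic estimating equation for $\hat\theta$ arises from (11.4.23) as

$$\theta[\Lambda_j'S\Lambda_j - n(\Lambda_j'\Lambda_j)^2] = n\Lambda_j'\Lambda_j, \tag{v}$$

a combined estimate for $\theta$ can be secured. On dividing both sides of (v) by $\Lambda_j'\Lambda_j$ and
summing up over $j$, $j = 1, \ldots, p$, it is seen that the resulting estimate of $\theta$ agrees with
that given in (11.4.22).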


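11.4.4. Maximum value of the exponent


We have the estimate $\hat\theta$ of $\theta$ provided in (11.4.23) at the estimated value $\hat\Lambda_j$ of
$\Lambda_j$ for each $j$, where $\hat\Lambda_j$ is an eigenvector of $S$ resulting from (11.4.20). The exponent
of the likelihood function is $-\frac12\mathrm{tr}(\Sigma^{-1}S)$ and, in the current context,
$\Sigma = \Lambda\Phi\Lambda' + \Psi$, the identification conditions being that $\Phi = I$, and
$\Lambda'\Psi^{-1}\Lambda$ be a diagonal matrix with positive diagonal elements. Under these conditions and
for the special case $\Psi = \sigma^2I_p$ with $\sigma^{-2} = \theta$, we have shown that the exponent in the
log-likelihood function reduces to
$-\frac12\theta\,\mathrm{tr}(S) + \frac12\theta^2\sum_j \frac{\Lambda_j'S\Lambda_j}{1 + \theta\Lambda_j'\Lambda_j}$. Now, consider
$\theta\,\mathrm{tr}(S) - \theta^2\sum_j \frac{\Lambda_j'S\Lambda_j}{1 + \theta\Lambda_j'\Lambda_j} \equiv \delta$. Then,

$$\begin{aligned}
\delta &= \theta\,\mathrm{tr}(S) - \theta^2\sum_{j=1}^p \frac{\Lambda_j'S\Lambda_j}{1 + \theta\Lambda_j'\Lambda_j}\\
&= \theta\Big[\mathrm{tr}(S) - \theta\sum_{j=1}^p \frac{\Lambda_j'S\Lambda_j}{1 + \theta\Lambda_j'\Lambda_j}\Big]\\
&= \theta\Big[\mathrm{tr}(S) - \sum_{j=1}^p n\Lambda_j'\Lambda_j\Big]\quad\text{from (11.4.19)}\\
&= np\quad\text{from (11.4.22)}.
\end{aligned}$$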
Hence the result.

Example 11.4.1. Tests are conducted to evaluate $x_1$: verbal-linguistic skills, $x_2$: spatial
visualization ability, and $x_3$: mathematical abilities. Test scores on $x_1$, $x_2$, and $x_3$ are
available. It is known that these abilities are governed by two intellectual faculties that
will be identified as $f_1$ and $f_2$, and that linear functions of $f_1$ and $f_2$ are contributing
to $x_1, x_2, x_3$. These coefficients in the linear functions, known as factor loadings, are
unknown. Let $\Lambda = (\lambda_{ij})$ be the matrix of factor loadings. Then, we have the model

$$\begin{aligned}
x_1 &= \lambda_{11}f_1 + \lambda_{12}f_2 + \mu_1 + e_1\\
x_2 &= \lambda_{21}f_1 + \lambda_{22}f_2 + \mu_2 + e_2\\
x_3 &= \lambda_{31}f_1 + \lambda_{32}f_2 + \mu_3 + e_3.
\end{aligned}$$

Let

$$X = \begin{bmatrix}x_1\\ x_2\\ x_3\end{bmatrix},\quad M = \begin{bmatrix}\mu_1\\ \mu_2\\ \mu_3\end{bmatrix},\quad \epsilon = \begin{bmatrix}e_1\\ e_2\\ e_3\end{bmatrix},\quad F = \begin{bmatrix}f_1\\ f_2\end{bmatrix},$$

where $M$ is some general effect, $\epsilon$ is the error vector or the sum total of the contributions
from unknown factors, $F$ represents the vector of contributing factors and $\Lambda$, the levels
of the contributions. Let $\mathrm{Cov}(X) = \Sigma$, $\mathrm{Cov}(F) = \Phi$ and $\mathrm{Cov}(\epsilon) = \Psi$. Under the
assumptions $\Phi = I$, $\Psi = \sigma^2I$ where $\sigma^2$ is a real scalar quantity and $I$ is the identity
matrix, and $\Lambda'\Psi^{-1}\Lambda$ is diagonal, estimate the factor loadings $\lambda_{ij}$'s and $\sigma^2$ in $\Psi$.
A battery of tests is conducted on a random sample of six individuals and the following are the
data, where our notations are $\mathbf{X}$: the matrix of sample values, $\bar X$: the sample average,
$\bar{\mathbf{X}}$: the matrix of sample averages, and $S$: the sample sum of products matrix. So, letting


X=[X1, X,..., X] =l4 2 
2 5 


estimate the factor loadings $\lambda_{ij}$'s and the variance $\sigma^2$.

Solution 11.4.1. We begin with the computations of the various quantities required to
arrive at a solution. Observe that since the matrix of factor loadings $\Lambda$ is $3 \times 2$ and a
random sample of 6 observation vectors is available, $n = 6$, $p = 3$ and $r = 2$ in our
notation.

In this case, 


ЕТ Е es 7 
X-X- 2 dy 1. d =1- 0 |, 
-1 2 0 0 0-1 


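$$[\mathbf{X} - \bar{\mathbf{X}}][\mathbf{X} - \bar{\mathbf{X}}]' = S = \begin{bmatrix} 6 & 0 & -2\\ 0 & 6 & -2\\ -2 & -2 & 6\end{bmatrix}.$$

An estimator/estimate of $\Sigma$ is $\hat\Sigma = \frac1n S$ whereas an unbiased estimator/estimate of
$\Sigma$ is $\frac{1}{n-1}S$. An eigenvalue of $\frac1n S$ is $\frac1n$ times the corresponding eigenvalue of $S$.
Moreover, constant multiples of eigenvectors are also eigenvectors for a given eigenvalue.
Accordingly, we will work with $S$ instead of $\hat\Sigma$ or an unbiased estimate of $\Sigma$. Since

$$\begin{vmatrix} 6-\lambda & 0 & -2\\ 0 & 6-\lambda & -2\\ -2 & -2 & 6-\lambda\end{vmatrix} = 0 \Rightarrow (6 - \lambda)(\lambda^2 - 12\lambda + 28) = 0,$$

the eigenvalues are $\lambda_1 = 6 + \sqrt8$, $\lambda_2 = 6$, $\lambda_3 = 6 - \sqrt8$, the two largest ones being
$\lambda_1 = 6 + \sqrt8$ and $\lambda_2 = 6$. Let us evaluate the eigenvectors $U_1$, $U_2$ and $U_3$ associated
with these three eigenvalues. An eigenvector $U_1$ corresponding to $\lambda_1 = 6 + \sqrt8$ will be a
solution of the equation

$$\begin{bmatrix} -\sqrt8 & 0 & -2\\ 0 & -\sqrt8 & -2\\ -2 & -2 & -\sqrt8\end{bmatrix}\begin{bmatrix}x_1\\ x_2\\ x_3\end{bmatrix} = \begin{bmatrix}0\\ 0\\ 0\end{bmatrix} \Rightarrow U_1 = \begin{bmatrix}-\frac{1}{\sqrt2}\\ -\frac{1}{\sqrt2}\\ 1\end{bmatrix}.$$

For $\lambda_2 = 6$, the equation to be solved is

$$\begin{bmatrix} 0 & 0 & -2\\ 0 & 0 & -2\\ -2 & -2 & 0\end{bmatrix}\begin{bmatrix}x_1\\ x_2\\ x_3\end{bmatrix} = \begin{bmatrix}0\\ 0\\ 0\end{bmatrix} \Rightarrow U_2 = \begin{bmatrix}1\\ -1\\ 0\end{bmatrix}.$$

As for the eigenvalue $\lambda_3 = 6 - \sqrt8$, it is seen from the derivation of $U_1$ that
$U_3' = \big[\frac{1}{\sqrt2}, \frac{1}{\sqrt2}, 1\big]$. Let us now examine the denominator of $\hat\theta$ in (11.4.22).
Observe that $U_1'U_1 = 2$, $U_2'U_2 = 2$, $U_3'U_3 = 2$ and $n\sum_{j=1}^3 U_j'U_j = 6(2 + 2 + 2) = 36$. However,
$\mathrm{tr}(S) = (6 + \sqrt8) + (6) + (6 - \sqrt8) = 18$. So, let us multiply each vector by $\frac{1}{\sqrt3}$ so that
$n\sum_{j=1}^3 U_j'U_j = 6\big(\frac23 + \frac23 + \frac23\big) = 12$ and $\mathrm{tr}(S) - n\sum_{j=1}^3 U_j'U_j = 18 - 12 > 0$.
Thus, the estimate of $\theta$ is given by

$$\hat\theta = \frac{np}{\mathrm{tr}(S) - n\sum_{j=1}^3 U_j'U_j} = \frac{(6)(3)}{18 - 12} = 3 \Rightarrow \hat\sigma^2 = \frac13.$$

In light of (11.4.20), the factor loadings are estimated by $U_1$ and $U_2$ scaled by $\frac{1}{\sqrt3}$.
Hence, the estimates of the factor loadings, denoted with a hat, are the following:
$\hat\lambda_{11} = \frac{1}{\sqrt3}\big(-\frac{1}{\sqrt2}\big) = -\frac{1}{\sqrt6}$,
$\hat\lambda_{21} = \frac{1}{\sqrt3}\big(-\frac{1}{\sqrt2}\big) = -\frac{1}{\sqrt6}$,
$\hat\lambda_{31} = \frac{1}{\sqrt3}(1) = \frac{1}{\sqrt3}$,
$\hat\lambda_{12} = \frac{1}{\sqrt3}(1) = \frac{1}{\sqrt3}$,
$\hat\lambda_{22} = \frac{1}{\sqrt3}(-1) = -\frac{1}{\sqrt3}$, and $\hat\lambda_{32} = 0$.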
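
The computations of Solution 11.4.1 can be retraced numerically, as sketched below. The code relies on `numpy.linalg.eigh`, which returns unit-length eigenvectors whose signs are arbitrary. The rescaling to squared length $2/3$ mirrors the normalization chosen in the solution and is an input of the sketch rather than something the code determines. The loadings are therefore reproduced only up to the sign of each column.

```python
import numpy as np

S = np.array([[6.0, 0.0, -2.0],
              [0.0, 6.0, -2.0],
              [-2.0, -2.0, 6.0]])           # sample sum of products matrix of the example
n, p, r = 6, 3, 2

lam, V = np.linalg.eigh(S)                  # ascending eigenvalues, unit eigenvectors
order = np.argsort(lam)[::-1]               # reorder so that the largest eigenvalue comes first
lam, V = lam[order], V[:, order]

# Normalization used in the solution: U_j'U_j = 2/3 for each j.
U = V * np.sqrt(2.0 / 3.0)

theta_hat = n * p / (np.trace(S) - n * np.sum(U * U))    # equation (11.4.22)
sigma2_hat = 1.0 / theta_hat
Lam_hat = U[:, :r]                          # estimated loadings, up to column signs
```

With this normalization, `theta_hat` equals 3 and the absolute values of the entries of `Lam_hat` are $1/\sqrt6$, $1/\sqrt6$, $1/\sqrt3$ in the first column and $1/\sqrt3$, $1/\sqrt3$, $0$ in the second.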


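11.5. General Case


Let

$$\delta_j = \begin{bmatrix}\theta_1\lambda_{1j}\\ \vdots\\ \theta_p\lambda_{pj}\end{bmatrix},\quad \delta_j'\delta_j = \Lambda_j'\Theta^2\Lambda_j,\quad \Lambda_j = \begin{bmatrix}\lambda_{1j}\\ \vdots\\ \lambda_{pj}\end{bmatrix},$$

$$\Psi^{-1} = \Theta^2 = \begin{bmatrix}\theta_1^2 & 0 & \cdots & 0\\ 0 & \theta_2^2 & \cdots & 0\\ \vdots & \vdots & \ddots & \vdots\\ 0 & 0 & \cdots & \theta_p^2\end{bmatrix}\quad\text{and}\quad |I + \Lambda'\Theta^2\Lambda| = \prod_{j=1}^r (1 + \delta_j'\delta_j).$$

We will take $\delta_j$, $j = 1, \ldots, r$, and $\Theta = \mathrm{diag}(\theta_1, \theta_2, \ldots, \theta_p)$ as the parameters.
Expressed in terms of the $\delta_j$'s and $\Theta$, the log-likelihood function is the following:

$$\begin{aligned}
\ln L = & -\frac{np}{2}\ln(2\pi) + n\sum_{j=1}^p \ln\theta_j - \frac n2\sum_{j=1}^r \ln(1 + \delta_j'\delta_j)\\
& - \frac12\mathrm{tr}(\Theta^2S) + \frac12\sum_{j=1}^r \frac{1}{1 + \delta_j'\delta_j}\,(\delta_j'\Theta S\Theta\delta_j).
\end{aligned}\tag{11.5.1}$$

Let us take $\delta_j$, $j = 1, \ldots, r$, and $\Theta$ as the parameters. Differentiating $\ln L$ partially with
respect to the vector $\delta_j$, for a specific $j$, and equating the result to a null vector, we have
the following (referring to Chapter 1 for vector/matrix derivatives):

$$-\frac n2\,\frac{2\delta_j}{1 + \delta_j'\delta_j} + \frac12\,\frac{2\Theta S\Theta\delta_j}{1 + \delta_j'\delta_j} - \frac12\,\frac{(\delta_j'\Theta S\Theta\delta_j)}{(1 + \delta_j'\delta_j)^2}\,2\delta_j = O, \tag{i}$$

which multiplied by $1 + \delta_j'\delta_j > 0$, yields

$$-n\delta_j + \Theta S\Theta\delta_j - \frac{(\delta_j'\Theta S\Theta\delta_j)}{1 + \delta_j'\delta_j}\,\delta_j = O. \tag{11.5.2}$$

On premultiplying (11.5.2) by $\delta_j'$ and then by $1 + \delta_j'\delta_j$, and simplifying, we then have

$$-n(1 + \delta_j'\delta_j)(\delta_j'\delta_j) + (\delta_j'\Theta S\Theta\delta_j) = 0 \Rightarrow \frac{\delta_j'\Theta S\Theta\delta_j}{1 + \delta_j'\delta_j} = n\,\delta_j'\delta_j. \tag{11.5.3}$$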
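Let us differentiate $\ln L$ as given in (11.5.1) partially with respect to $\theta_j$ for a specific $j$
such as $j = 1$. Writing $i$ for the index of summation over the factors, we have

$$\frac{\partial}{\partial\theta_j}\ln L = \frac{n}{\theta_j} - \frac12(2\theta_js_{jj}) + \frac12\sum_{i=1}^r \frac{1}{1 + \delta_i'\delta_i}\,\frac{\partial}{\partial\theta_j}(\delta_i'\Theta S\Theta\delta_i)$$

where

$$\frac{\partial}{\partial\theta_j}\,\delta_i'\Theta S\Theta\delta_i = \delta_i'\Big[\frac{\partial}{\partial\theta_j}\Theta\Big]S\Theta\delta_i + \delta_i'\Theta S\Big[\frac{\partial}{\partial\theta_j}\Theta\Big]\delta_i,$$

$$\frac{\partial}{\partial\theta_1}\Theta = \begin{bmatrix}1 & 0 & \cdots & 0\\ 0 & 0 & \cdots & 0\\ \vdots & \vdots & \ddots & \vdots\\ 0 & 0 & \cdots & 0\end{bmatrix},\quad \theta_1\frac{\partial}{\partial\theta_1}\Theta = \begin{bmatrix}\theta_1 & 0 & \cdots & 0\\ 0 & 0 & \cdots & 0\\ \vdots & \vdots & \ddots & \vdots\\ 0 & 0 & \cdots & 0\end{bmatrix},$$

so that

$$\sum_{j=1}^p \theta_j\frac{\partial}{\partial\theta_j}\Theta = \Theta. \tag{ii}$$

Hence,

$$\sum_{j=1}^p \theta_j\frac{\partial}{\partial\theta_j}\big[\delta_i'\Theta S\Theta\delta_i\big] = \delta_i'\Theta S\Theta\delta_i + \delta_i'\Theta S\Theta\delta_i = 2\,\delta_i'\Theta S\Theta\delta_i,$$

and then,

$$\sum_{j=1}^p \theta_j\frac{\partial}{\partial\theta_j}\ln L = 0 \Rightarrow np - \sum_{j=1}^p \theta_j^2s_{jj} + \sum_{j=1}^r \frac{\delta_j'\Theta S\Theta\delta_j}{1 + \delta_j'\delta_j} = 0. \tag{11.5.4}$$

However, given (11.5.3), we have $n\delta_j'\delta_j = \delta_j'\Theta S\Theta\delta_j/(1 + \delta_j'\delta_j)$, $j = 1, \ldots, r$, and
therefore (11.5.4) can be expressed as

$$np - \sum_{j=1}^p \theta_j^2s_{jj} + \sum_{j=1}^r n\,\delta_j'\delta_j = 0. \tag{iii}$$

Letting

$$c = \frac1p\sum_{j=1}^r \delta_j'\delta_j, \tag{11.5.5}$$

equation (iii) can be written as

$$\sum_{j=1}^p [n(1 + c) - \theta_j^2s_{jj}] = 0,\quad c = 0,\ j = r+1, \ldots, p,$$

so that a solution for $\theta_j$ is

$$\hat\theta_j^2 = \frac{n(1 + c)}{s_{jj}}\quad\text{or}\quad \hat\sigma_j^2 = \frac{s_{jj}}{n(1 + c)}, \tag{11.5.6}$$

with the proviso that $c = 0$ for $\hat\theta_j^2$ and $\hat\sigma_j^2$, $j = r+1, \ldots, p$. Then, an estimate of
$\Theta S\Theta$ is given by

$$\hat\Theta S\hat\Theta = n(1 + c)\begin{bmatrix}\frac{1}{\sqrt{s_{11}}} & 0 & \cdots & 0\\ 0 & \frac{1}{\sqrt{s_{22}}} & \cdots & 0\\ \vdots & \vdots & \ddots & \vdots\\ 0 & 0 & \cdots & \frac{1}{\sqrt{s_{pp}}}\end{bmatrix} S \begin{bmatrix}\frac{1}{\sqrt{s_{11}}} & 0 & \cdots & 0\\ 0 & \frac{1}{\sqrt{s_{22}}} & \cdots & 0\\ \vdots & \vdots & \ddots & \vdots\\ 0 & 0 & \cdots & \frac{1}{\sqrt{s_{pp}}}\end{bmatrix} = n(1 + c)R \tag{11.5.7}$$

where $R$ is the sample correlation matrix, and on applying the identities (11.5.3) and
(11.5.7), (11.5.2) becomes

$$-n\delta_j - n(\delta_j'\delta_j)\delta_j + [n(1 + c)R]\delta_j = O \Rightarrow \Big[R - \frac{1 + \delta_j'\delta_j}{1 + c}\,I\Big]\delta_j = O. \tag{11.5.8}$$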


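This shows that $\delta_j$ is an eigenvector of $R$. If $\nu_j$ is an eigenvalue of $R$, then the largest $r$
eigenvalues are of the form

$$\nu_j = \frac{1 + \delta_j'\delta_j}{1 + c},\quad j = 1, \ldots, r, \tag{11.5.9}$$

and the remaining ones are $\nu_{r+1}, \ldots, \nu_p$, where $c$ is as specified in (11.5.5). Thus, the
procedure is the following: Compute the eigenvalues $\nu_j$, $j = 1, \ldots, p$, of $R$ and determine
the corresponding eigenvectors, denoted by $\delta_j$, $j = 1, \ldots, p$. The first $r$ of them which
correspond to the $r$ largest $\nu_j$'s, are
$\hat\delta_j = \hat\Theta\hat\Lambda_j \Rightarrow \hat\Lambda_j = \hat\Theta^{-1}\hat\delta_j$, $j = 1, \ldots, r$. Let

$$\hat\Lambda_j = \begin{bmatrix}\hat\lambda_{1j}\\ \hat\lambda_{2j}\\ \vdots\\ \hat\lambda_{pj}\end{bmatrix},\quad \hat\delta_j = \begin{bmatrix}\hat\delta_{1j}\\ \hat\delta_{2j}\\ \vdots\\ \hat\delta_{pj}\end{bmatrix}\quad\text{and}\quad \hat\Theta^2 = \begin{bmatrix}\frac{n(1+c)}{s_{11}} & 0 & \cdots & 0\\ 0 & \frac{n(1+c)}{s_{22}} & \cdots & 0\\ \vdots & \vdots & \ddots & \vdots\\ 0 & 0 & \cdots & \frac{n(1+c)}{s_{pp}}\end{bmatrix}. \tag{11.5.10}$$

Then,

$$\hat\lambda_{ij} = \frac{\sqrt{s_{ii}}}{\sqrt{n(1 + c)}}\,\hat\delta_{ij},\quad i = 1, \ldots, p,\ j = 1, \ldots, r, \tag{11.5.11}$$

and $\hat\Theta$ is available from (11.5.10). All the model parameters have now been estimated.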
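
The steps of this procedure can be sketched in code as follows. The function name `general_case_estimates` is hypothetical, and the scale of the eigenvectors $\delta_j$ is passed in as an input because the eigen-equation (11.5.8) alone does not fix it. The call below uses the matrix $S$ of Example 11.4.1 together with eigenvectors of squared length 2, which is an assumption made here for illustration.

```python
import numpy as np

def general_case_estimates(S, n, delta):
    """Apply (11.5.5), (11.5.6) and (11.5.11) for given eigenvectors delta (p x r) of R."""
    p = S.shape[0]
    s = np.diag(S)                                     # s_11, ..., s_pp
    c = np.sum(delta * delta) / p                      # (11.5.5)
    theta2_hat = n * (1.0 + c) / s                     # (11.5.6), diagonal of Theta^2
    lam_hat = delta * np.sqrt(s / (n * (1.0 + c)))[:, None]   # (11.5.11)
    return c, theta2_hat, lam_hat

S = np.array([[6.0, 0.0, -2.0],
              [0.0, 6.0, -2.0],
              [-2.0, -2.0, 6.0]])
n, r = 6, 2

d = 1.0 / np.sqrt(np.diag(S))
R = S * np.outer(d, d)                                 # sample correlation matrix

nu, V = np.linalg.eigh(R)
order = np.argsort(nu)[::-1]                           # largest eigenvalue first
nu, V = nu[order], V[:, order]
delta = np.sqrt(2.0) * V[:, :r]                        # assumed scale: delta_j'delta_j = 2

c, theta2_hat, lam_hat = general_case_estimates(S, n, delta)
psi_hat = 1.0 / theta2_hat                             # diagonal elements of the estimate of Psi
```

For these inputs, `c` is $4/3$, every element of `theta2_hat` is $7/3$ and every element of `psi_hat` is $3/7$.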
11.5.1. The exponent in the likelihood function


Given the MLE's of the parameters which are available from (11.5.6), (11.5.8),
(11.5.10) and (11.5.11), what will be the maximum value of the likelihood function? Let
us examine its exponent:

$$\begin{aligned}
-\frac12\mathrm{tr}(\hat\Sigma^{-1}S) &= -\frac12\mathrm{tr}(\hat\Theta^2S) + \frac12\sum_{j=1}^r \frac{\hat\delta_j'\hat\Theta S\hat\Theta\hat\delta_j}{1 + \hat\delta_j'\hat\delta_j} = -\frac n2(1 + c)\,\mathrm{tr}(R) + \frac n2\,pc\\
&= -\frac{np}{2}(1 + c) + \frac{np}{2}\,c = -\frac{np}{2},
\end{aligned}$$

that is, the same value of the exponent that is obtained under a general $\Sigma$. The estimates
were derived under the assumptions that $\Sigma = \Psi + \Lambda\Lambda'$, $\Lambda'\Psi^{-1}\Lambda$ is diagonal with the
diagonal elements $\delta_j'\delta_j$, $j = 1, \ldots, r$, and $\Psi^{-1} = \Theta^2$.


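Example 11.5.1. Using the data set provided in Example 11.4.1, estimate the factor
loadings and the diagonal elements of $\mathrm{Cov}(\epsilon) = \Psi = \mathrm{diag}(\psi_{11}, \ldots, \psi_{pp})$. In this example,
$p = 3$, $n = 6$, $r = 2$.


Solution 11.5.1. We will adopt the same notations and make use of some of the
computational results already obtained in the previous solution. First, we need to compute the
eigenvalues of the sample correlation matrix $R$. The sample sum of products matrix $S$ is
given by

$$S = \begin{bmatrix} 6 & 0 & -2\\ 0 & 6 & -2\\ -2 & -2 & 6\end{bmatrix} \Rightarrow R = \begin{bmatrix} 1 & 0 & -\frac13\\ 0 & 1 & -\frac13\\ -\frac13 & -\frac13 & 1\end{bmatrix}.$$

Hence, the eigenvalues of $R$ are $\frac16$ times the eigenvalues of $S$, that is,
$\nu_1 = \frac16(6 + \sqrt8) = 1 + \frac{\sqrt2}{3}$, $\nu_2 = \frac16(6) = 1$ and $\nu_3 = 1 - \frac{\sqrt2}{3}$. Since $\frac16$ will be
canceled when determining the eigenvectors, the eigenvectors of $S$ will coincide with those of $R$.
They are the following, denoted again by $\delta_j$, $j = 1, 2, 3$:

$$\delta_1 = \begin{bmatrix}-\frac{1}{\sqrt2}\\ -\frac{1}{\sqrt2}\\ 1\end{bmatrix},\quad \delta_2 = \begin{bmatrix}1\\ -1\\ 0\end{bmatrix},\quad \delta_3 = \begin{bmatrix}\frac{1}{\sqrt2}\\ \frac{1}{\sqrt2}\\ 1\end{bmatrix}.$$

Accordingly, $\delta_1'\delta_1 = 2$, $\delta_2'\delta_2 = 2$, $\delta_3'\delta_3 = 2$ and $c$ as defined in (11.5.5) is
$c = \frac13(2 + 2) = \frac43 \Rightarrow n(1 + c) = 6\big(1 + \frac43\big) = 14$. Then, in light of (11.5.10), the
estimates of $\psi_{jj}$, $j = 1, 2, 3$, are available as
$\hat\psi_{jj} = \frac{s_{jj}}{n(1+c)} = \frac{6}{14} = \frac37$, that is, $\hat\theta_j^2 = \frac73$, so that
$\hat\psi_{11} = \hat\psi_{22} = \hat\psi_{33} = \frac37$, denoting the estimates with a hat. Therefore, the
diagonal matrix $\hat\Psi = \mathrm{diag}\big(\frac37, \frac37, \frac37\big)$.

Hence, the matrix $\Theta^{-1} = \Psi^{\frac12}$ is estimated by
$\hat\Theta^{-1} = \mathrm{diag}\big(\sqrt{\tfrac37}, \sqrt{\tfrac37}, \sqrt{\tfrac37}\big)$. From (11.5.11),
$\hat\Lambda_j = \hat\Theta^{-1}\delta_j = \mathrm{diag}\big(\sqrt{\tfrac37}, \sqrt{\tfrac37}, \sqrt{\tfrac37}\big)\delta_j$, that is, $\delta_j$
is pre-multiplied by $\sqrt{\tfrac37}$. Therefore, the estimates of the factor loadings are:
$\hat\lambda_{11} = \sqrt{\tfrac37}\big(-\frac{1}{\sqrt2}\big) = -\sqrt{\tfrac{3}{14}} = \hat\lambda_{21}$;
$\hat\lambda_{31} = \sqrt{\tfrac37} = \hat\lambda_{12} = -\hat\lambda_{22}$, and $\hat\lambda_{32} = 0$.

11.6. Tests of Hypotheses


The usual test in connection with the current topic consists of assessing identifiability,
that is, testing the hypothesis $H_o$ that the population covariance matrix $\Sigma > O$ can be
represented as $\Sigma = \Lambda\Phi\Lambda' + \Psi$ when $\Phi = I$, $\Lambda'\Psi^{-1}\Lambda$ is a diagonal matrix with
positive diagonal elements, $\Psi > O$ is a diagonal matrix and $\Lambda = (\lambda_{ij})$ is a $p \times r$, $r \le p$,
matrix of full rank $r$, whose elements are the factor loadings. That is,

$$H_o: \Sigma = \Lambda\Lambda' + \Psi. \tag{11.6.1}$$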


In this instance, a crucial aspect of the hypothesis $H_o$, consisting of determining whether
"the model fits", is that the number $r$ be designated since the other quantities $n$, the sample
size, and $p$, the order of the observation vector, are preassigned. Thus, the phrase "model
fits" means that for a given $r$, $\Sigma$ can be expressed in the form $\Sigma = \Psi + \Lambda\Lambda'$, in addition
to satisfying the identification conditions. The assumed model has the representation:
$X = M + \Lambda F + \epsilon$ where $X' = (x_1, \ldots, x_p)$ stands for the $p \times 1$ vector of observed scores on
$p$ tests or $p$ batteries of tests, $M$ is a $p \times 1$ vector of general effect, $F$ is an $r \times 1$ vector of
unknown factors, $\Lambda = (\lambda_{ij})$ is the unknown $p \times r$ matrix of factor loadings and $\epsilon$ is the
$p \times 1$ error vector. When $\epsilon$ and $F$ are uncorrelated, the covariance matrix of $X$ is given by

$$\Sigma = \Lambda\Phi\Lambda' + \Psi$$

where $\Phi = \mathrm{Cov}(F) > O$ and $\Psi = \mathrm{Cov}(\epsilon) > O$ with $\Phi$ being $r \times r$ and $\Psi$ being
$p \times p$ and diagonal. A simple random sample from $X$ will be taken to mean a sample of
independently and identically distributed (iid) $p \times 1$ vectors
$X_j' = (x_{1j}, x_{2j}, \ldots, x_{pj})$, $j = 1, \ldots, n$, with $n$ denoting the sample size. The sample sum of
products matrix or "corrected" sample sum of squares and cross products matrix is
$S = (s_{ij})$, $s_{ij} = \sum_{k=1}^n (x_{ik} - \bar x_i)(x_{jk} - \bar x_j)$, where, for example, the average of the
$x_i$'s comprising the $i$-th row of $\mathbf{X} = [X_1, \ldots, X_n]$, namely, $\bar x_i$, is
$\bar x_i = \sum_{k=1}^n x_{ik}/n$. If $\epsilon$ and $F$ are independently normally distributed, then the likelihood
ratio criterion or $\lambda$-criterion is

$$\lambda = \frac{\max_{H_o} L}{\max L} = \frac{|\hat\Sigma|^{\frac n2}}{|\hat\Lambda\hat\Lambda' + \hat\Psi|^{\frac n2}} \Rightarrow w = \lambda^{\frac2n} = \frac{|\hat\Sigma|}{|\hat\Lambda\hat\Lambda' + \hat\Psi|} \tag{11.6.2}$$

where $\hat\Sigma = \frac1n S$ and the covariance matrix $\Sigma = \Lambda\Lambda' + \Psi$ under $H_o$, with $\Phi = \mathrm{Cov}(F)$
assumed to be an identity matrix and the $r \times r$ matrix
$\Lambda'\Psi^{-1}\Lambda = \mathrm{diag}(\delta_1'\delta_1, \ldots, \delta_r'\delta_r)$ having positive diagonal elements $\delta_j'\delta_j$,
$j = 1, \ldots, r$. Referring to Sect. 11.4.2, we have

$$|\Lambda\Lambda' + \Psi| = |\Psi|\,|\Lambda'\Psi^{-1}\Lambda + I| \tag{11.6.3}$$

and $1 + \delta_j'\delta_j = 1 + \Lambda_j'\Psi^{-1}\Lambda_j = 1 + \Lambda_j'\Theta^2\Lambda_j$ where
$\delta_j = \Psi^{-\frac12}\Lambda_j = \Theta\Lambda_j$ and $\Lambda_j$ is the $j$-th column of $\Lambda$ for $j = 1, \ldots, r$. It was
shown in (11.5.8) that $\delta_j$ is an eigenvector of the sample correlation matrix $R$ and

$$\prod_{j=1}^r (1 + \delta_j'\delta_j) = |\Lambda'\Psi^{-1}\Lambda + I|.$$

However, in view of the discussion following (11.5.8), an eigenvalue of $R$ is of the form
$\nu_j = \frac{1 + \delta_j'\delta_j}{1 + c}$, $j = 1, \ldots, r$. Let $\nu_1, \ldots, \nu_p$ be the eigenvalues of $R$ and let the
largest $r$ of them be $\nu_1, \ldots, \nu_r$. It also follows from (11.5.8) that
$\hat\Theta^2 = n(1 + c)\,\mathrm{diag}\big(\frac{1}{s_{11}}, \ldots, \frac{1}{s_{pp}}\big)$ with $\hat\Theta = \hat\Psi^{-\frac12}$. Thus,

$$\begin{aligned}
|\hat\Psi| &= \frac{\prod_{j=1}^p s_{jj}}{[n(1 + c)]^p},\\
w = \lambda^{\frac2n} &= \frac{|\hat\Sigma|}{|\hat\Psi|\,|\hat\Lambda'\hat\Psi^{-1}\hat\Lambda + I|} = \frac{(1 + c)^p\,|R|}{\prod_{j=1}^r (1 + \hat\delta_j'\hat\delta_j)}\\
&= \frac{(1 + c)^p\,\nu_1\cdots\nu_p}{(1 + c)^r\,\nu_1\cdots\nu_r} = (1 + c)^{p-r}\,\nu_{r+1}\cdots\nu_p.
\end{aligned}\tag{11.6.4}$$

Hence, we reject the null hypothesis for small values of the product $\nu_{r+1}\cdots\nu_p$, that is,
the product of the smallest $p - r$ eigenvalues of the sample correlation matrix $R$. In order
to evaluate critical points, one would require the null distribution of the product of the
eigenvalues, $\nu_{r+1}\cdots\nu_p$, which is difficult to determine for a general $p$. How can rejecting
the null hypothesis that the "model fits" be interpreted? Since, in the whole structure, the
decisive quantity is $r$, we are actually rejecting the hypothesis that a given $r$ is the number
of main factors contributing to the observations. Hence, we may seek a larger or smaller $r$,
keeping the structure unchanged and testing the same hypothesis again until the hypothesis
is not rejected. We may then assume that the $r$ specified at the last stage is the number of
main factors contributing to the observation or we may assert that, with that particular $r$,
there is evidence that the model fits.
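
For the data of Example 11.5.1, the quantities entering (11.6.4) can be evaluated as sketched below. The value $c = 4/3$ is taken from Solution 11.5.1, and the sketch only checks the algebraic step $|\hat\Sigma|/|\hat\Psi| = (1 + c)^p|R|$ before evaluating $w = (1 + c)^{p-r}\nu_{r+1}\cdots\nu_p$. It is an illustration, not a test procedure.

```python
import numpy as np

S = np.array([[6.0, 0.0, -2.0],
              [0.0, 6.0, -2.0],
              [-2.0, -2.0, 6.0]])
n, p, r = 6, 3, 2
c = 4.0 / 3.0                                   # value of (11.5.5) found in Solution 11.5.1

d = 1.0 / np.sqrt(np.diag(S))
R = S * np.outer(d, d)                          # sample correlation matrix
nu = np.sort(np.linalg.eigvalsh(R))[::-1]       # eigenvalues of R, largest first

Sigma_hat = S / n
Psi_hat = np.diag(np.diag(S) / (n * (1.0 + c))) # from (11.5.6)

ratio = np.linalg.det(Sigma_hat) / np.linalg.det(Psi_hat)
w = (1.0 + c) ** (p - r) * np.prod(nu[r:])      # last member of (11.6.4)
```

Here `w` equals $\frac73\big(1 - \frac{\sqrt2}{3}\big)$, which exceeds one. This illustrates why the factor involving $c$ has to be taken into account when interpreting the criterion.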


We will now determine conditions ensuring that the likelihood ratio criterion $\lambda$ be less
than or equal to one. While, assuming that $\Lambda'\Psi^{-1}\Lambda$ is diagonal, the left-hand side of the
deciding equation, $\Sigma = \Lambda\Lambda' + \Psi$, has $p(p+1)/2$ parameters, there are $pr + p - r(r-1)/2$
conditions on the right-hand side where $r(r-1)/2$ arises from the diagonality condition.
The difference is then

$$\rho = \frac{p(p+1)}{2} - \Big[pr + p - \frac{r(r-1)}{2}\Big] = \frac12[(p - r)^2 - (p + r)]. \tag{11.6.5}$$

This $\rho$ depends upon the parameters $p$ and $r$, whereas $\lambda$ depends upon $p$, $r$ and $c$. Thus, $\lambda$
may not be $\le 1$. In order to make $\lambda \le 1$, we can make $c$ close to 0 by multiplying the $\delta_j$'s
by a constant, observing that this is always possible because the $\delta_j$'s are the eigenvectors
of $R$. By selecting a constant $m$ and taking the new $\delta_j$ as $\frac1m\delta_j$, $c$ can be made close to
0 and $\lambda$ will be $\le 1$, so that rejecting the null hypothesis for small values of $\lambda$ will make
sense. It may so happen that there will not be any parameter left to be restricted by the
hypothesis $H_o$ that "model fits". The quantity $\rho$ appearing in (11.6.5) could then be $\le 0$,
and in such an instance, the hypothesis would not make sense and could not be tested.

The density of the sample correlation matrix $R$ is provided in Example 1.25 of Mathai
(1997, p. 58). Denoting this density by $f(R)$, it is the following for the population covariance
matrix $\Sigma$ in a parent $N_p(\mu, \Sigma)$ population with $\Sigma$ being a positive definite diagonal
matrix, as was assumed in Sect. 11.6:

$$f(R) = \frac{[\Gamma(\frac m2)]^p}{\Gamma_p(\frac m2)}\,|R|^{\frac m2 - \frac{p+1}{2}},\quad R > O,\ m = n - 1,\ n > p, \tag{11.6.6}$$

and zero elsewhere, where $n$ is the sample size.
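
A direct evaluation of (11.6.6) is sketched below, taking $\Gamma_p(\alpha) = \pi^{p(p-1)/4}\Gamma(\alpha)\Gamma(\alpha - \frac12)\cdots\Gamma(\alpha - \frac{p-1}{2})$ as the real matrix-variate gamma function. The function names `log_gamma_p` and `corr_density` are hypothetical. As a sanity check under these assumptions, the density is integrated numerically for $p = 2$, where $R$ depends on a single correlation $t$ and $|R| = 1 - t^2$.

```python
import math
import numpy as np

def log_gamma_p(a, p):
    """Logarithm of the real matrix-variate gamma function Gamma_p(a), a > (p - 1)/2."""
    return (p * (p - 1) / 4.0) * math.log(math.pi) + sum(
        math.lgamma(a - j / 2.0) for j in range(p)
    )

def corr_density(R, n):
    """Evaluate (11.6.6) at a correlation matrix R for sample size n."""
    p = R.shape[0]
    m = n - 1
    sign, logdet = np.linalg.slogdet(R)
    if sign <= 0:
        return 0.0                               # outside the support R > O
    log_f = (p * math.lgamma(m / 2.0) - log_gamma_p(m / 2.0, p)
             + (m / 2.0 - (p + 1) / 2.0) * logdet)
    return math.exp(log_f)

# Trapezoidal integration over the single correlation t when p = 2.
n = 10
grid = np.linspace(-1.0, 1.0, 20001)[1:-1]
vals = np.array([corr_density(np.array([[1.0, t], [t, 1.0]]), n) for t in grid])
total = np.sum((vals[1:] + vals[:-1]) / 2.0 * np.diff(grid))
```

Working on the logarithmic scale avoids overflow in the gamma functions when $m$ is large.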


11.6.1. Asymptotic distribution of the likelihood ratio statistic


For a large sample size $n$, $-2\ln\lambda$ is approximately distributed as a chisquare random
variable having $k$ degrees of freedom where $\lambda$ is the likelihood ratio criterion and $k$ is the
number of parameters restricted by the hypothesis $H_o$. This approximation holds whenever
the sample size $n$ is large and $k \ge 1$. With $\rho$ as defined in (11.6.5), we have

$$k = \rho = \frac12[(p - r)^2 - (p + r)]. \tag{11.6.7}$$

However, $p - r = 1$ and $p + r = 5$ in the illustrative example, so that $k = -2$. Accordingly,
even if the sample size $n$ were large, this asymptotic result would not be applicable.

11.6.2. How to decide on the number r of main factors?


The structure of the population covariance matrix $\Sigma$ under the model
$X_j = M + \Lambda F + E_j$, $j = 1, \ldots, n$, is

$$\Sigma = \Lambda\Phi\Lambda' + \Psi \Rightarrow \Sigma = \Lambda\Lambda' + \Psi\ \text{ for } \Phi = I, \tag{11.6.8}$$

where it is assumed that $E_j$ and $F$ are uncorrelated, $\Sigma = \mathrm{Cov}(X_j) > O$ is $p \times p$,
$\Phi = \mathrm{Cov}(F) = I$, the $r \times r$ identity matrix, $\Psi = \mathrm{Cov}(E_j)$ is a $p \times p$ diagonal matrix
and $\Lambda = (\lambda_{ij})$ is a full rank $p \times r$, $r \le p$, matrix whose elements are the factor loadings.
Under the orthogonal factor model, $\Phi = I$. Moreover, to ensure the identification of the
model, we assume that $\Lambda'\Psi^{-1}\Lambda$ is a diagonal matrix. Before initiating any data analysis,
we have to assign a value to $r$ on the basis of the data set at hand in order to set up the
model. Thus, the matter of initially setting the number of main factors has to be addressed.
Given

$$R = \Lambda\Lambda' + \Psi \tag{11.6.9}$$

where $\Lambda = (\lambda_{ij})$ is a $p \times r$ matrix and $\Psi$ is a $p \times p$ diagonal matrix, does a solution that
is expressible in terms of the elements of $R$ (or those of $S$ if $S$ is used), exist for all
$\lambda_{ij}$'s and $\psi_{jj}$'s? In general

$$R = \lambda_1U_1U_1' + \cdots + \lambda_pU_pU_p' \tag{11.6.10}$$

where the $\lambda_j$'s are the eigenvalues of $R$ and the $U_j$'s are the corresponding normalized
eigenvectors. Observe that $U_jU_j'$ is $p \times p$ whereas $U_j'U_j = 1$, $j = 1, \ldots, p$. If $r = p$, then a
solution always exists for (11.6.9). When taking $\Psi = O$, we can always let
$R = BB'$ for some $p \times p$ matrix $B$, which can be achieved for example via a triangular
decomposition. Accordingly, the relevant aspects are $r < p$ and the diagonal elements in
$\Psi$, namely, the $\psi_{jj}$'s being positive. Can we then solve for all the $\lambda_{ij}$'s and $\psi_{jj}$'s involved
in (11.6.9) in terms of the elements in $R$? The answer is that a solution exists, but only
when certain conditions are satisfied. Our objective is to select a value of $r$ that is as small
as possible and then, to obtain a solution to (11.6.9) in terms of the elements in $R$.
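
The spectral representation (11.6.10) and the triangular decomposition mentioned for the case $r = p$, $\Psi = O$ can be illustrated on the correlation matrix of Example 11.5.1. The sketch below uses `numpy.linalg.eigh` and `numpy.linalg.cholesky`. Choosing the Cholesky factor as the matrix $B$ is one possible option, assumed here for illustration.

```python
import numpy as np

R = np.array([[1.0, 0.0, -1.0 / 3.0],
              [0.0, 1.0, -1.0 / 3.0],
              [-1.0 / 3.0, -1.0 / 3.0, 1.0]])    # sample correlation matrix of Example 11.5.1
p = R.shape[0]

lam, U = np.linalg.eigh(R)                       # eigenvalues and normalized eigenvectors

# Spectral representation (11.6.10): sum of lambda_j U_j U_j'
R_spectral = sum(lam[j] * np.outer(U[:, j], U[:, j]) for j in range(p))

# Case r = p with Psi = O: R = B B' with B lower triangular
B = np.linalg.cholesky(R)
```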


The analysis is to be carried out by utilizing either the sample sum of products matrix S 
or the sample correlation matrix R. The following are some of the guidelines for selecting 
r in order to set up the model. 


(i): Compute all the eigenvalues of R (or S). Let r be the number of eigenvalues > 1 if
the sample correlation matrix R is used. If S is used, then determine all the eigenvalues,
calculate the average of these eigenvalues, and count the number of eigenvalues that are
greater than or equal to this average. Take that number to be r.

(ii): Carry out a Principal Component Analysis on $R$ (or $S$). If $S$ is used, ensure that
the units of measurement are not creating discrepancies. Compute the variances of these
Principal Components, which are the eigenvalues of $R$ (or $S$). Let $\lambda_j$, $j = 1,\ldots,p$,
denote these eigenvalues. Compute the ratios

$$\frac{\lambda_1 + \cdots + \lambda_m}{\lambda_1 + \cdots + \lambda_p}, \quad m = 1, 2, \ldots,$$

and stop with that $m$ for which the desired fraction of the total variation in the data is
accounted for. Take that $m$ as $r$. When implementing the principal component approach,
the factor loadings $\lambda_{ij}$'s and the $\psi_{ii}$'s can be estimated as follows: From (11.6.10), write

$$R = A + B \ \text{ with } \ A = \sum_{j=1}^{r} \lambda_j U_jU_j' \ \text{ and } \ B = \sum_{j=r+1}^{p} \lambda_j U_jU_j',$$

where $A$ can be expressed as $VV'$ with $V = [\sqrt{\lambda_1}\,U_1, \ldots, \sqrt{\lambda_r}\,U_r]$. Then, $V$ is taken as
an approximate estimate of $\Lambda$ or as $\hat{\Lambda}$. Observe that $\lambda_j \ge 0$, $j = 1,\ldots,p$. The sum over
$j$ of the $i$-th diagonal elements of $\lambda_j U_jU_j'$, $j = r+1,\ldots,p$, will provide an estimate for
$\psi_{ii}$, $i = 1,\ldots,p$. These estimates can also be obtained as follows: Consider the estimate
of $\sigma_{ii}$ denoted by $\hat{\sigma}_{ii}$, which is equal to the sum of all the $i$-th diagonal elements in $A + B$;
it will be 1 if $R$ is used and $\hat{\sigma}_{ii}$ if $S$ is utilized in the analysis; then,
$\hat{\psi}_{ii} = \hat{\sigma}_{ii} - \sum_{j=1}^{r} \hat{\lambda}_{ij}^2$,
and $\hat{\psi}_{ii}$ is now the sum of the $i$-th diagonal elements in $B$.


(iii): Consider the individual correlations in the sample correlation matrix $R$. Identify the
largest ones in absolute value. If the largest ones occur at the (1,3)-th and (2,3)-th positions,
then the factor $f_3$ will be deemed influential. Start with $r = 1$ (factor $f_3$) and carry out the
analysis. Then, assess the proportion of the total variation accounted for by $\hat{\sigma}_{33}$. Should the
proportion not be satisfactory, we may continue with $r = 2$. If the (2,3)-th position value
is larger in absolute value than the value at the (1,3)-th position, then $f_2$ may be the next
significant factor. Compute $\hat{\sigma}_{33} + \hat{\sigma}_{22}$ and determine its proportion of the total variation.
If the resulting model is rejected, then take $r = 3$, and continue in this fashion until an
acceptable proportion of the total variation is accounted for.


(iv): The maximum likelihood method. With this approach, we begin with a preselected r
and test the hypothesis that, when comprising r factors, the model fits. If the hypothesis is
rejected, then we let the number of influential factors be r − 1 or r + 1 and continue the process
of testing and deciding until the hypothesis is not rejected. That final r is to be taken as the
number of main factors contributing towards the observations. The initial value of r may
be determined by employing one of the methods described in (i) or (ii) or (iii).
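
Guidelines (i) and (ii) reduce to simple counting once the eigenvalues are available. The following Python sketch illustrates both rules; the function names, the 80% target and the eigenvalues are hypothetical choices made only for illustration, and the eigenvalues are assumed to have been computed beforehand from R or S.

```python
def choose_r_by_eigenvalues(eigenvalues, used_correlation=True):
    """Guideline (i): count eigenvalues > 1 when R is used, or
    eigenvalues >= their average when S is used."""
    if used_correlation:
        return sum(1 for lam in eigenvalues if lam > 1.0)
    average = sum(eigenvalues) / len(eigenvalues)
    return sum(1 for lam in eigenvalues if lam >= average)


def choose_r_by_variation(eigenvalues, target=0.8):
    """Guideline (ii): smallest m such that the m largest eigenvalues
    account for at least the fraction `target` of the total variation."""
    ordered = sorted(eigenvalues, reverse=True)
    total = sum(ordered)
    running = 0.0
    for m, lam in enumerate(ordered, start=1):
        running += lam
        if running / total >= target:
            return m
    return len(ordered)


# Hypothetical eigenvalues of a 4 x 4 sample correlation matrix.
eigs = [2.4, 1.1, 0.3, 0.2]
print(choose_r_by_eigenvalues(eigs))        # two eigenvalues exceed 1
print(choose_r_by_variation(eigs, 0.8))     # first two explain 3.5/4 of the total
```

Both rules point to r = 2 for these illustrative values; in practice the two guidelines need not agree, in which case the maximum likelihood test in (iv) can arbitrate.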


Exercises 


11.1. For the following data, where the 6 columns of the matrix represent the 6 observation
vectors, verify whether $r = 2$ provides a good fit to the data. The proposed model is
the Factor Analysis model $X = M + \Lambda F + \epsilon$, $F' = (f_1, f_2)$, $\Lambda$ is $3 \times 2$ and of rank 2,
${\rm Cov}(\epsilon) = \Psi$ is diagonal, ${\rm Cov}(F) = \Phi = I$, and ${\rm Cov}(X) = \Sigma > O$. The data set is
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Оо 1—1 0 1—1 
1 1 0 2 2 0 
—1 =. 0 =. -I —2 


11.2. For the model $X = M + \Lambda F + \epsilon$ with the conditions as specified in Exercise
11.1, verify whether the model with $r = 2$ or $r = 3$ gives a good fit on the basis of the
following data, where the columns in the matrix represent five observation vectors:

$$\begin{bmatrix} 1 & 0 & -1 & 1 & 0\\ -1 & 1 & 1 & 0 & -1\\ 1 & 0 & 1 & 2 & 1\\ 1 & 1 & 2 & 1 & 0 \end{bmatrix}$$


11.3. Do a Principal Component Analysis in Exercise 11.1 to assess what percentage of
the total variation in the data is accounted for by r = 2.


11.4. Do a Principal Component Analysis in Exercise 11.2 to determine what percentages
of the total variation in the data are accounted for by r = 2 and r = 3.


11.5. Even though the sample sizes are not large, perform tests based on the asymptotic
chi-square to assess whether the two tests agree with the findings in Exercises 11.1
and 11.2.


11.6. Four model identification conditions are stated at the end of Section 11.3.1. Develop
$\lambda$-criteria under the conditions stated in (i): case (2); (ii): case (3), selecting your own $B_1$;
(iii): case (4).


Chapter 12
Classification Problems


12.1. Introduction 


We will use the same notations as in the previous chapters. Lower-case letters $x, y, \ldots$
will denote real scalar variables, whether mathematical or random. Capital letters $X, Y, \ldots$
will be used to denote real matrix-variate mathematical or random variables, whether
square or rectangular matrices are involved. A tilde will be placed on top of letters such as
$\tilde{x}, \tilde{y}, \tilde{X}, \tilde{Y}$ to denote variables in the complex domain. Constant matrices will for instance
be denoted by $A, B, C$. A tilde will not be used on constant matrices unless the point is to
be stressed that the matrix is in the complex domain. The determinant of a square matrix $A$
will be denoted by $|A|$ or ${\rm det}(A)$ and, in the complex case, the absolute value or modulus of
the determinant of $A$ will be denoted as $|{\rm det}(A)|$. When matrices are square, their order will
be taken as $p \times p$, unless specified otherwise. When $A$ is a full rank matrix in the complex
domain, then $AA^*$ is Hermitian positive definite where an asterisk designates the complex
conjugate transpose of a matrix. Additionally, ${\rm d}X$ will indicate the wedge product of all
the distinct differentials of the elements of the matrix $X$. Thus, letting the $p \times q$ matrix
$X = (x_{ij})$ where the $x_{ij}$'s are distinct real scalar variables, ${\rm d}X = \wedge_{i=1}^{p}\wedge_{j=1}^{q}{\rm d}x_{ij}$. For the
complex matrix $\tilde{X} = X_1 + iX_2$, $i = \sqrt{-1}$, where $X_1$ and $X_2$ are real, ${\rm d}\tilde{X} = {\rm d}X_1 \wedge {\rm d}X_2$.


Historically, classification problems arose in anthropological studies. By taking a set of 
measurements on skeletal remains, anthropologists wanted to classify them as belonging 
to a certain racial group such as being of African or European origin. The measurements 
might have been of the following type: $x_1$ = width of the skull, $x_2$ = volume of the skull,
$x_3$ = length of the thigh bone, $x_4$ = width of the pelvis, and so on. Let the measurements
be represented by a $p \times 1$ vector $X$, with $X' = (x_1, \ldots, x_p)$ where a prime denotes
the transpose. Nowadays, classification procedures are employed in all types of problems
occurring in various contexts. For example, consider the situation of a battery of tests in
an entrance examination to admit students into a professional program such as medical
sciences, law studies, engineering science or management studies. Based on the $p \times 1$
vector of test scores, a statistician would like to classify an applicant as to whether or not
he/she belongs to the group of applicants who will successfully complete a given program.
This is a 2-group situation. If a third category is added such as those who are expected to 
complete the program with flying colors, this will become a 3-group situation. In general, 
one will have a k-group situation when an individual is classified into one of k classes. 


Let us begin with the 2-group situation. The problem consists of classifying the $p \times 1$
vector $X$ into one of two groups, classes or categories. Let the categories be denoted by
population $\pi_1$ and population $\pi_2$. This means $X$ will either belong to $\pi_1$ or to $\pi_2$, no other
options being considered. The $p \times 1$ vector $X$ may be taken as a point in the $p$-space $R_p$,
that is, the $p$-dimensional Euclidean space. In a two-group situation, when it is decided that the
candidate either belongs to the population $\pi_1$ or the population $\pi_2$, two subspaces $A_1$ and
$A_2$ within the $p$-space $R_p$ are determined: $A_1 \subset R_p$ and $A_2 \subset R_p$, with $A_1 \cap A_2 = \emptyset$ (the
empty set); a decision rule can then be symbolically written as $A = (A_1, A_2)$. If $X$ falls in
$A_1$, the candidate is classified into $\pi_1$ and if $X$ falls in $A_2$, then the candidate is classified
into $\pi_2$. In other words, $X \in A_1$ means the individual is classified into population $\pi_1$ and
$X \in A_2$ means that the individual is classified into population $\pi_2$. The regions $A_1$ and
$A_2$ or the rule $A = (A_1, A_2)$ are not known beforehand. These are to be determined by
employing certain decision rules. Criteria for determining $A_1$ and $A_2$ will be subsequently
put forward. Let us now consider the consequences. When a decision is made to classify
$X$ as coming from $\pi_1$, either the decision is correct or the decision is erroneous. If the
population is actually $\pi_1$ and the decision rule classifies $X$ into $\pi_1$, then the decision is
correct. If $X$ is classified into $\pi_2$ when in reality the population is $\pi_1$, then a mistake has
been committed or a misclassification has occurred. Misclassification will involve penalties,
costs or losses. Let such a penalty, cost or loss of classifying an individual into group $i$
when he/she actually belongs to group $j$ be denoted by $C(i|j)$. In a 2-group situation,
$i$ and $j$ can only equal 1 or 2. That is, $C(1|2) > 0$ and $C(2|1) > 0$ are the costs of
misclassifying, whereas $C(1|1) = 0$ and $C(2|2) = 0$ since there is no cost or penalty
associated with correct decisions. The following table summarizes this discussion:


Table 12.1: Cost of misclassification $C(i|j)$

                              Statistician's decision to classify into
                                   $\pi_1$          $\pi_2$
Population in reality  $\pi_1$       0             $C(2|1)$
                       $\pi_2$     $C(1|2)$          0

12.2. Probabilities of Classification


The vector random variable corresponding to the observation vector $X$ may have its
own probability/density function. The real scalar variables as well as the observations on
them will be denoted by the lower-case letters $x_1, \ldots, x_p$. When dealing with the
probability/density function of $X$, $X$ is taken as a vector random variable, whereas when looked
upon as a point in the $p$-space $R_p$, $X$ is deemed to be an observation vector. The $p \times 1$
vector $X$ may have a probability/density function $P(X)$. In a 2-group or two-class situation,
$P(X)$ is either $P_1(X)$, the population density of $\pi_1$, or $P_2(X)$, the population density
of $\pi_2$. For convenience, it will be assumed that $X$ is of the continuous type, the derivations
in the discrete case being analogous. What is then the probability of achieving a correct
classification under the rule $A = (A_1, A_2)$? If the sample point $X$ falls in $A_1$, we classify
the candidate as belonging to $\pi_1$, and if the true population is also $\pi_1$, then a correct
decision is made. In that instance, the corresponding probability is

$$\Pr\{1|1, A\} = \int_{A_1} P_1(X)\,{\rm d}X \tag{12.2.1}$$

where ${\rm d}X = {\rm d}x_1 \wedge {\rm d}x_2 \wedge \cdots \wedge {\rm d}x_p$, $A = (A_1, A_2)$ denoting one decision rule or one given
set of subspaces of the $p$-space $R_p$. The probability of misclassification in this case is

$$\Pr\{2|1, A\} = \int_{A_2} P_1(X)\,{\rm d}X. \tag{12.2.2}$$

Similarly, the probabilities of correctly selecting and misclassifying $P_2(X)$ are respectively
given by

$$\Pr\{2|2, A\} = \int_{A_2} P_2(X)\,{\rm d}X \tag{12.2.3}$$

and

$$\Pr\{1|2, A\} = \int_{A_1} P_2(X)\,{\rm d}X. \tag{12.2.4}$$


In a Bayesian setting, there is a prior probability $q_1$ of selecting the population $\pi_1$ and $q_2$ of
selecting the population $\pi_2$, with $q_1 + q_2 = 1$. Then, what will be the probability of drawing
an observation from $\pi_1$ and misclassifying it as belonging to $\pi_2$? It is
$q_1 \times \Pr\{2|1, A\} = q_1\int_{A_2} P_1(X)\,{\rm d}X$ and, similarly, the probability of drawing an observation from $\pi_2$ and
misclassifying it as coming from $\pi_1$ is $q_2 \times \Pr\{1|2, A\} = q_2\int_{A_1} P_2(X)\,{\rm d}X$, with the respective
costs of misclassification being $C(2|1) = C(2|1, A)$ and $C(1|2) = C(1|2, A)$. What is
then the expected cost of misclassification? It is the sum of the costs multiplied by the
corresponding probabilities. Thus,

$$\text{the expected cost} = q_1\, C(2|1) \Pr\{2|1, A\} + q_2\, C(1|2) \Pr\{1|2, A\}. \tag{12.2.5}$$


So, an advantageous criterion to rely on when setting up $A_1$ and $A_2$ would consist in
minimizing the expected cost as given in (12.2.5). A rule could be devised for determining $A_1$
and $A_2$ accordingly; such a rule actually corresponds to Bayes' rule. How can one
interpret this expected cost? For example, in the case of admitting students to a particular
program of study based on a vector $X$ of test scores, it is the cost of admitting and training
potentially incompetent students, that is, students who would not have successfully completed
the program of study, plus the projected cost of losing good students who would have
successfully completed the program of study.


If prior probabilities $q_1$ and $q_2$ are not involved, then the expected cost of misclassifying
an observation from $\pi_1$ as coming from $\pi_2$ is

$$C(2|1) \Pr\{2|1, A\} = E_1(A), \tag{12.2.6}$$

and the expected cost of misclassifying an observation from $\pi_2$ as coming from $\pi_1$ is

$$C(1|2) \Pr\{1|2, A\} = E_2(A). \tag{12.2.7}$$

We would like to have $E_1(A)$ and $E_2(A)$ as small as possible. In this case, a procedure, rule
or criterion $A = (A_1, A_2)$ corresponds to determining suitable subspaces $A_1$ and $A_2$ in the
$p$-space $R_p$. If there is another procedure $A^{(j)} = (A_1^{(j)}, A_2^{(j)})$ such that $E_1(A) \le E_1(A^{(j)})$
and $E_2(A) \le E_2(A^{(j)})$, then procedure $A$ is said to be as good as $A^{(j)}$, and if at least one
of the inequalities above is a strict inequality, that is $<$, then $A$ is preferable to $A^{(j)}$. If
procedure $A$ is preferable to all other available procedures $A^{(j)}$, $j = 1, 2, \ldots$, $A$ is said to
be admissible. We are seeking an admissible class $\{A\}$ of procedures.


12.3. Two Populations with Known Distributions 


Let $\pi_1$ and $\pi_2$ be the two populations. Let $P_1(X)$ and $P_2(X)$ be the known $p$-variate
probability/density functions associated with $\pi_1$ and $\pi_2$, respectively. That is, $P_1(X)$ and
$P_2(X)$ are two $p$-variate probability/density functions which are fully known in the sense
that all their parameters are known in addition to their functional forms. Consider the
Bayesian situation where it is assumed that the prior probabilities $q_1$ and $q_2$ of selecting
$\pi_1$ and $\pi_2$, respectively, are known. Suppose that a particular $p$-vector $X$ is at hand. What
is the probability that this given $X$ is an observation from $\pi_1$? This probability is $q_1P_1(X)$
if $X$ is discrete or $q_1P_1(X)\,{\rm d}X$ if $X$ is continuous. What is the probability that the given
vector $X$ is an observation vector either from $\pi_1$ or from $\pi_2$? This probability is $q_1P_1(X) + q_2P_2(X)$
or $[q_1P_1(X) + q_2P_2(X)]\,{\rm d}X$. What is then the probability that the vector $X$ at hand
is from $P_1(X)$, given that it is an observation vector from $\pi_1$ or $\pi_2$? As this is a conditional
statement, it is given by the following in the discrete or continuous case:

$$\frac{q_1P_1(X)}{q_1P_1(X) + q_2P_2(X)} \ \text{ or } \ \frac{q_1P_1(X)\,{\rm d}X}{[q_1P_1(X) + q_2P_2(X)]\,{\rm d}X} = \frac{q_1P_1(X)}{q_1P_1(X) + q_2P_2(X)} \tag{12.3.1}$$


where ${\rm d}X$, which is the wedge product of differentials and positive in this case, cancels
out. If the conditional probability that a given $X$ is an observation from $\pi_1$ is larger than or
equal to the conditional probability that the given vector $X$ is an observation from $\pi_2$ and
if we assign $X$ to $\pi_1$, then the chance of misclassification is reduced. Our main objective
is to minimize the probability of misclassification and then come up with a decision rule.
This statement is equivalent to the following: If

$$\frac{q_1P_1(X)}{q_1P_1(X) + q_2P_2(X)} \ge \frac{q_2P_2(X)}{q_1P_1(X) + q_2P_2(X)} \ \Rightarrow\ q_1P_1(X) \ge q_2P_2(X) \tag{12.3.2}$$


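then we assign $X$ to $\pi_1$, meaning that our subspace $A_1$ is specified by the following rule:

$$A_1:\ q_1P_1(X) \ge q_2P_2(X) \ \Rightarrow\ \frac{P_1(X)}{P_2(X)} \ge \frac{q_2}{q_1},$$

$$A_2:\ q_1P_1(X) < q_2P_2(X) \ \Rightarrow\ \frac{P_1(X)}{P_2(X)} < \frac{q_2}{q_1}. \tag{12.3.3}$$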


Note that if $q_1P_1(X) = q_2P_2(X)$, then $X$ can be assigned to either $\pi_1$ or $\pi_2$; however,
we have assigned it to $\pi_1$ for convenience. Observe that it is assumed that
$q_1P_1(X) + q_2P_2(X) \ne 0$, $q_1 > 0$, $q_2 > 0$ and $q_1 + q_2 = 1$ in (12.3.2). The conditional statement
made in (12.3.2), which can also be written as

$$\frac{q_1P_1(X)}{q_1P_1(X) + q_2P_2(X)} = \frac{\eta_1P_1(X)}{\eta_1P_1(X) + \eta_2P_2(X)},\quad \eta_1 > 0,\ \eta_2 > 0,\ \frac{\eta_i}{\eta_1 + \eta_2} = q_i,\ i = 1, 2,$$

holds for some weight functions $\eta_i$, $i = 1, 2$.

If the observation is from $\pi_1 : P_1(X)$, then the expected cost of misclassification is
$q_1P_1(X)C(2|1) + q_2P_2(X)C(2|2) = q_1P_1(X)C(2|1)$ since $C(i|i) = 0$, $i = 1, 2$. Similarly,
the expected cost of misclassifying the observation $X$ from $\pi_2 : P_2(X)$ is
$q_2P_2(X)C(1|2)$. If $P_1(X)$ is our preferred distribution, then we would like the associated
expected cost of misclassification to be the lesser one, that is,

$$q_1P_1(X)C(2|1) < q_2P_2(X)C(1|2) \ \text{ in } A_2 \ \Rightarrow\ \frac{P_1(X)}{P_2(X)} < \frac{q_2C(1|2)}{q_1C(2|1)} \ \text{ in } A_2 \ \text{ or } \ \frac{P_1(X)}{P_2(X)} \ge \frac{q_2C(1|2)}{q_1C(2|1)} \ \text{ in } A_1, \tag{12.3.4}$$

which is the same rule as in (12.3.3) where $q_1$ is replaced by $q_1C(2|1)$ and $q_2$ by $q_2C(1|2)$.
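
The rule (12.3.4) and the expected cost (12.2.5) are straightforward to evaluate once the two densities are specified. The Python sketch below is only an illustration under stated assumptions: the function names are hypothetical, and two exponential densities with means 10 and 5 are assumed purely as convenient examples of $P_1$ and $P_2$.

```python
import math


def exp_density(theta):
    """Exponential density with mean theta (an assumed illustrative population)."""
    return lambda x: math.exp(-x / theta) / theta if x >= 0 else 0.0


def bayes_classify(p1, p2, x, q1=0.5, q2=0.5, c21=1.0, c12=1.0):
    """Return 1 if x lies in A1 of (12.3.4), otherwise 2.

    c21 stands for C(2|1) and c12 for C(1|2). The ratio test
    P1/P2 >= q2*C(1|2) / (q1*C(2|1)) is cross-multiplied so that points
    where P2(x) = 0 do not cause a division by zero.
    """
    return 1 if q1 * c21 * p1(x) >= q2 * c12 * p2(x) else 2


def expected_cost(q1, q2, c21, c12, pr21, pr12):
    """Expected cost of misclassification as in (12.2.5)."""
    return q1 * c21 * pr21 + q2 * c12 * pr12


p1, p2 = exp_density(10.0), exp_density(5.0)
print(bayes_classify(p1, p2, 6.0))             # falls in A2
print(bayes_classify(p1, p2, 8.0))             # falls in A1
print(bayes_classify(p1, p2, 6.0, c21=3.0))    # a larger C(2|1) enlarges A1
print(expected_cost(0.5, 0.5, 1.0, 1.0, 0.5, 0.25))
```

Raising $C(2|1)$ makes it costlier to miss an observation from $\pi_1$, so the same point x = 6 moves from $A_2$ to $A_1$, which is the replacement of $q_1$ by $q_1C(2|1)$ described above.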
12.3.1. Best procedure 


It can be established that the procedure $A = (A_1, A_2)$ in (12.3.3) is the best one for
minimizing the probability of misclassification. To this end, consider any other procedure
$A^{(j)} = (A_1^{(j)}, A_2^{(j)})$, $j = 1, 2, \ldots$. The probability of misclassification under the
procedure $A^{(j)}$ is the following:

$$q_1\int_{A_2^{(j)}} P_1(X)\,{\rm d}X + q_2\int_{A_1^{(j)}} P_2(X)\,{\rm d}X = \int_{A_2^{(j)}} [q_1P_1(X) - q_2P_2(X)]\,{\rm d}X + q_2\int_{A_1^{(j)} \cup A_2^{(j)}} P_2(X)\,{\rm d}X. \tag{12.3.5}$$

If $A_1^{(j)} \cup A_2^{(j)} = R_p$, then $\int_{A_1^{(j)} \cup A_2^{(j)}} P_2(X)\,{\rm d}X = 1$; it is otherwise a given positive constant.
However, $q_1P_1(X) - q_2P_2(X)$ can be negative, zero or positive, whereas the left-hand side
of (12.3.5) is a positive probability. Accordingly, the left-hand side is minimum if

$$q_1P_1(X) - q_2P_2(X) < 0 \ \Rightarrow\ \frac{P_1(X)}{P_2(X)} < \frac{q_2}{q_1}, \tag{i}$$

which actually is the rejection region $A_2$ of the procedure $A = (A_1, A_2)$. Hence, the
procedure $A = (A_1, A_2)$ minimizes the probabilities of misclassification; in other words,
it is the best procedure. If cost functions are also involved, then (i) becomes the following:

$$\frac{P_1(X)}{P_2(X)} < \frac{C(1|2)\,q_2}{C(2|1)\,q_1}. \tag{ii}$$

The region where $q_1P_1(X) - q_2P_2(X) = 0$ or $q_1C(2|1)P_1(X) - q_2C(1|2)P_2(X) = 0$ need
not be empty and the probability over this set need not be zero. If

$$\Pr\Big\{\frac{P_1(X)}{P_2(X)} = \frac{q_2\,C(1|2)}{q_1\,C(2|1)}\,\Big|\,\pi_i\Big\} = 0,\quad i = 1, 2, \tag{12.3.6}$$

it can also be shown that the above Bayes procedure $A = (A_1, A_2)$ is unique. This is stated
as a theorem:


Theorem 12.3.1. Let $q_1$ be the prior probability of drawing an observation $X$ from the
population $\pi_1$ with probability/density function $P_1(X)$ and let $q_2$ be the prior probability
of selecting an observation $X$ from the population $\pi_2$ with probability/density function
$P_2(X)$. Let the cost or loss associated with misclassifying an observation from $\pi_1$ as coming
from $\pi_2$ be $C(2|1)$ and the cost of misclassifying an observation from $\pi_2$ as originating
from $\pi_1$ be $C(1|2)$. Letting

$$\Pr\Big\{\frac{P_1(X)}{P_2(X)} = \frac{C(1|2)\,q_2}{C(2|1)\,q_1}\,\Big|\,\pi_i\Big\} = 0,\quad i = 1, 2,$$

the classification rule given by $A = (A_1, A_2)$ of (12.3.4) is unique and best in the sense
that it minimizes the probabilities of misclassification.
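
The optimality claim can be checked numerically in a simple univariate setting. The Python sketch below assumes, purely for illustration, two exponential populations with means 10 and 5, equal priors and equal costs, and restricts attention to threshold rules of the form $A_1: x \ge t$; it scans a grid of thresholds and compares the minimizer with the threshold at which $q_1P_1(x) = q_2P_2(x)$. The function name and the grid are hypothetical choices.

```python
import math


def total_misclassification(t, theta1=10.0, theta2=5.0, q1=0.5, q2=0.5):
    """q1*Pr{2|1} + q2*Pr{1|2} for the rule A1: x >= t, A2: x < t."""
    pr21 = 1.0 - math.exp(-t / theta1)   # observation from pi_1 falls below t
    pr12 = math.exp(-t / theta2)         # observation from pi_2 falls at or above t
    return q1 * pr21 + q2 * pr12


# Candidate thresholds 0.00, 0.01, ..., 30.00.
grid = [i / 100.0 for i in range(0, 3001)]
best_t = min(grid, key=total_misclassification)

# Threshold at which the two weighted densities coincide.
bayes_t = 10.0 * 5.0 / (10.0 - 5.0) * math.log(10.0 / 5.0)
print(best_t, round(bayes_t, 4))
```

The grid minimizer agrees with the Bayes threshold up to the grid spacing, which is what the theorem predicts for this family of rules.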


Example 12.3.1. Let $\pi_1$ and $\pi_2$ be two univariate exponential populations whose parameters
are $\theta_1$ and $\theta_2$ with $\theta_1 \ne \theta_2$. Let the prior probability of drawing an observation from
$\pi_1$ be $q_1 = \frac{1}{2}$ and that of selecting an observation from $\pi_2$ be $q_2 = \frac{1}{2}$. Let the costs or loss
associated with misclassifications be $C(2|1) = C(1|2)$. Compute the regions and probabilities
of misclassification if (1): a single observation $x$ is drawn; (2): iid observations
$x_1, \ldots, x_n$ are drawn.


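Solution 12.3.1.(1). In this case, one observation is drawn and the populations are

$$P_i(x) = \frac{1}{\theta_i}e^{-\frac{x}{\theta_i}},\quad 0 \le x < \infty,\ \theta_i > 0,\ i = 1, 2.$$

Consider the following inequality on the support of the density:

$$\frac{P_1(x)}{P_2(x)} \ge \frac{C(1|2)\,q_2}{C(2|1)\,q_1} = 1,$$

or equivalently,

$$\frac{1}{\theta_1}e^{-\frac{x}{\theta_1}} \ge \frac{1}{\theta_2}e^{-\frac{x}{\theta_2}}.$$

On taking logarithms, we have

$$-x\Big(\frac{1}{\theta_1} - \frac{1}{\theta_2}\Big) \ge \ln\frac{\theta_1}{\theta_2} \ \Rightarrow\ x\Big(\frac{1}{\theta_2} - \frac{1}{\theta_1}\Big) \ge \ln\frac{\theta_1}{\theta_2} \ \Rightarrow\ x \ge \frac{\theta_1\theta_2}{\theta_1 - \theta_2}\ln\frac{\theta_1}{\theta_2} \ \text{ for } \theta_1 > \theta_2.$$

Letting $\theta_1 > \theta_2$, the steps in the case $\theta_1 < \theta_2$ being parallel, we have

$$k = \frac{\theta_1\theta_2}{\theta_1 - \theta_2}\ln\frac{\theta_1}{\theta_2}.$$

Accordingly,

$$A_1: x \ge k \ \text{ and } \ A_2: x < k.$$

The probabilities of misclassification are:

$$P(2|1) = \int_{A_2} P_1(x)\,{\rm d}x = \int_{x=0}^{k} \frac{1}{\theta_1}e^{-\frac{x}{\theta_1}}\,{\rm d}x = 1 - e^{-\frac{k}{\theta_1}},$$

$$P(1|2) = \int_{A_1} P_2(x)\,{\rm d}x = \int_{x=k}^{\infty} \frac{1}{\theta_2}e^{-\frac{x}{\theta_2}}\,{\rm d}x = e^{-\frac{k}{\theta_2}}.$$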


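Solution 12.3.1(2). In this case, $X' = (x_1, \ldots, x_n)$ and

$$P_i(X) = \prod_{j=1}^{n} \frac{1}{\theta_i}e^{-\frac{x_j}{\theta_i}} = \frac{1}{\theta_i^{n}}e^{-\frac{u}{\theta_i}},\quad i = 1, 2,$$

where $u = \sum_{j=1}^{n} x_j$ is gamma distributed with the parameters $(n, \theta_i)$, $i = 1, 2$. The
density of $u$ is then given by

$$g_i(u) = \frac{1}{\theta_i^{n}\,\Gamma(n)}\,u^{n-1}e^{-\frac{u}{\theta_i}},\quad 0 \le u < \infty,\ i = 1, 2.$$

Proceeding as above, for $\theta_1 > \theta_2$, $A_1: u \ge k_1$ and $A_2: u < k_1$, with
$k_1 = \frac{\theta_1\theta_2}{\theta_1 - \theta_2}\ln\big(\frac{\theta_1}{\theta_2}\big)^{n} = nk$,
where $k$ is as given in Solution 12.3.1(1). Consequently, the probabilities of misclassification
are as follows:

$$P(2|1) = \int_{u=0}^{k_1} \frac{u^{n-1}}{\theta_1^{n}\,\Gamma(n)}e^{-\frac{u}{\theta_1}}\,{\rm d}u = \int_{0}^{k_1/\theta_1} \frac{v^{n-1}}{\Gamma(n)}e^{-v}\,{\rm d}v,$$

$$P(1|2) = \int_{k_1}^{\infty} \frac{u^{n-1}}{\theta_2^{n}\,\Gamma(n)}e^{-\frac{u}{\theta_2}}\,{\rm d}u = \int_{k_1/\theta_2}^{\infty} \frac{v^{n-1}}{\Gamma(n)}e^{-v}\,{\rm d}v,$$

where the integrals can be expressed in terms of incomplete gamma functions or determined
by using integration by parts.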


Example 12.3.2. Assume that no prior probabilities or costs are involved. Suppose that
in a certain clinic, the waiting time before a customer is attended to depends upon the
manager on duty. If manager $M_1$ is on duty, the expected waiting time is 10 minutes, and
if manager $M_2$ is on duty, the expected waiting time is 5 minutes. Assume that the waiting
times are exponentially distributed with expected waiting time equal to $\theta_i$, $i = 1, 2$. On
a particular day (1): a customer had to wait 6 minutes before she was attended to; (2):
three customers had to wait 6, 6 and 8 minutes, respectively. Who between $M_1$ and $M_2$
was likely to be on duty on that day?


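Solution 12.3.2.(1). In this case, $\theta_1 = 10$, $\theta_2 = 5$ and the populations are exponential
with parameters $\theta_1$ and $\theta_2$, respectively. Thus,

$$k = \frac{\theta_1\theta_2}{\theta_1 - \theta_2}\ln\frac{\theta_1}{\theta_2} = \frac{(10)(5)}{10 - 5}\ln\frac{10}{5} = 10\ln 2,$$

so that $\frac{k}{\theta_1} = \ln 2$, $\frac{k}{\theta_2} = 2\ln 2$, $e^{-k/\theta_1} = \frac{1}{2} = 0.5$ and $e^{-k/\theta_2} = \frac{1}{4} = 0.25$.
In (1): the observed value of $x = 6 < 10\ln 2 = 10(0.69314718056) \approx 6.9315$.
Accordingly, we classify $x$ to $M_2$, that is, the manager $M_2$ was likely to be on duty. Thus,

$P(2|2, A)$ = The probability of making a correct decision

$$= \int_{x<k} P_2(x)\,{\rm d}x = \int_{0}^{k} P_2(x)\,{\rm d}x = \int_{0}^{k} \frac{1}{5}e^{-\frac{x}{5}}\,{\rm d}x = 1 - e^{-\frac{k}{5}} = 1 - \frac{1}{4} = 0.75;$$

$P(2|1, A)$ = Probability of misclassification or making an incorrect decision

$$= \int_{0}^{k} P_1(x)\,{\rm d}x = \int_{0}^{k} \frac{1}{10}e^{-\frac{x}{10}}\,{\rm d}x = 1 - e^{-\frac{k}{10}} = 1 - \frac{1}{2} = 0.5.$$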

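Solution 12.3.2.(2). Here, $u = 6 + 6 + 8 = 20$, $n = 3$ and
$k_1 = nk = \frac{(10)(5)}{10 - 5}\,3\ln\frac{10}{5} = 30\ln 2$. Since $30\ln 2 \approx 20.794$ and the observed value of $u$ is 20, $u < k_1$,
and we assign the sample to $\pi_2$ or to $P_2(X)$ or $M_2$, with $\frac{k_1}{\theta_2} = \frac{30\ln 2}{5} = 6\ln 2$ and
$\frac{k_1}{\theta_1} = \frac{30\ln 2}{10} = 3\ln 2$. Thus,

$P(2|2, A)$ = Probability of making a correct classification decision

$$= \Pr\{u < k_1\,|\,P_2(X)\} = \int_{0}^{k_1} \frac{u^{n-1}}{\theta_2^{n}\,\Gamma(n)}e^{-\frac{u}{\theta_2}}\,{\rm d}u = \int_{0}^{6\ln 2} \frac{v^{2}e^{-v}}{\Gamma(3)}\,{\rm d}v,\ \text{ with } \Gamma(3) = 2! = 2.$$

Integrating by parts,

$$\int v^{2}e^{-v}\,{\rm d}v = -[v^{2} + 2v + 2]e^{-v}.$$

Then,

$$\frac{1}{2}\int_{0}^{6\ln 2} v^{2}e^{-v}\,{\rm d}v = -\Big[\Big(\frac{v^{2}}{2} + v + 1\Big)e^{-v}\Big]_{0}^{6\ln 2} = 1 - \frac{1}{64}\Big[\frac{(6\ln 2)^{2}}{2} + (6\ln 2) + 1\Big] \approx 0.784,$$

and

$P(2|1, A)$ = Probability of misclassification

$$= \int_{0}^{k_1} P_1(X)\,{\rm d}X = \frac{1}{2}\int_{0}^{3\ln 2} v^{2}e^{-v}\,{\rm d}v = -\Big[\Big(\frac{v^{2}}{2} + v + 1\Big)e^{-v}\Big]_{0}^{3\ln 2} = 1 - \frac{1}{8}\Big[\frac{(3\ln 2)^{2}}{2} + (3\ln 2) + 1\Big] \approx 0.345.$$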
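
The arithmetic in Solution 12.3.2 can be reproduced with a few lines of Python. This is a sketch with hypothetical function names; it relies on the closed form of the gamma distribution function for an integer shape parameter $n$, namely $1 - e^{-v}\sum_{j=0}^{n-1} v^{j}/j!$, which is the expression obtained above by integration by parts when $n = 3$.

```python
import math


def exp_threshold(theta1, theta2, n=1):
    """k1 = n * theta1*theta2/(theta1 - theta2) * ln(theta1/theta2), for theta1 > theta2."""
    return n * theta1 * theta2 / (theta1 - theta2) * math.log(theta1 / theta2)


def gamma_cdf_int_shape(n, v):
    """Pr{V <= v} for a gamma variable with integer shape n and unit scale."""
    return 1.0 - math.exp(-v) * sum(v ** j / math.factorial(j) for j in range(n))


theta1, theta2 = 10.0, 5.0
for n, u in [(1, 6.0), (3, 20.0)]:          # cases (1) and (2) of Example 12.3.2
    k1 = exp_threshold(theta1, theta2, n)
    decision = 1 if u >= k1 else 2           # A1: u >= k1, A2: u < k1
    p22 = gamma_cdf_int_shape(n, k1 / theta2)    # correct decision under pi_2
    p21 = gamma_cdf_int_shape(n, k1 / theta1)    # misclassification under pi_1
    print(n, round(k1, 4), decision, round(p22, 3), round(p21, 3))
```

In both cases the observed total falls below the threshold, so the sample is assigned to $M_2$.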


Example 12.3.3. Let the two populations $\pi_1$ and $\pi_2$ be univariate normal with mean
values $\mu_1$ and $\mu_2$, respectively, and the same variance $\sigma^2$, that is, $P_1(x) : N_1(\mu_1, \sigma^2)$
and $P_2(x) : N_1(\mu_2, \sigma^2)$. Let the prior probabilities of drawing an observation from these
populations be $q_1 = \frac{1}{2}$ and $q_2 = \frac{1}{2}$, respectively, and the costs or loss involved with
misclassification be $C(1|2) = C(2|1)$. Determine the regions of misclassification and the
corresponding probabilities of misclassification if (1): a single observation $x$ is available;
(2): iid observations $x_1, \ldots, x_n$ are available, from $\pi_1$ or $\pi_2$.


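Solution 12.3.3(1). If one observation is available,

$$P_i(x) = \frac{1}{\sigma\sqrt{2\pi}}e^{-\frac{1}{2\sigma^2}(x - \mu_i)^2},\quad -\infty < x < \infty,\ -\infty < \mu_i < \infty,\ \sigma > 0.$$

Consider the regions

$$\frac{P_1(x)}{P_2(x)} \ge \frac{C(1|2)\,q_2}{C(2|1)\,q_1} = 1 \ \Rightarrow\ e^{-\frac{1}{2\sigma^2}[(x - \mu_1)^2 - (x - \mu_2)^2]} \ge 1 \ \Rightarrow\ -\frac{1}{2\sigma^2}[(x - \mu_1)^2 - (x - \mu_2)^2] \ge 0.$$

Now, note that

$$-[(x - \mu_1)^2 - (x - \mu_2)^2] = 2x(\mu_1 - \mu_2) - (\mu_1^2 - \mu_2^2) \ge 0 \ \Rightarrow\ x \ge \frac{\mu_1^2 - \mu_2^2}{2(\mu_1 - \mu_2)} = \frac{1}{2}(\mu_1 + \mu_2) \ \text{ for } \mu_1 > \mu_2 \ \Rightarrow$$

$$A_1: x \ge \frac{1}{2}(\mu_1 + \mu_2) \ \text{ and } \ A_2: x < \frac{1}{2}(\mu_1 + \mu_2).$$

The probabilities of misclassification are the following for $k = \frac{1}{2}(\mu_1 + \mu_2)$:

$$P(2|1) = \int_{-\infty}^{k} \frac{1}{\sigma\sqrt{2\pi}}e^{-\frac{1}{2\sigma^2}(x - \mu_1)^2}\,{\rm d}x = \Phi\Big(\frac{k - \mu_1}{\sigma}\Big),$$

$$P(1|2) = \int_{k}^{\infty} \frac{1}{\sigma\sqrt{2\pi}}e^{-\frac{1}{2\sigma^2}(x - \mu_2)^2}\,{\rm d}x = 1 - \Phi\Big(\frac{k - \mu_2}{\sigma}\Big),$$

where $\Phi(\cdot)$ is the distribution function of a univariate standard normal density and
$k = \frac{1}{2}(\mu_1 + \mu_2)$.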


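Solution 12.3.3(2). In this case, $x_1, \ldots, x_n$ are iid and $X' = (x_1, \ldots, x_n)$. The multivariate
densities are

$$P_i(X) = \frac{1}{\sigma^{n}(\sqrt{2\pi})^{n}}e^{-\frac{1}{2\sigma^2}\sum_{j=1}^{n}(x_j - \mu_i)^2} = \frac{1}{\sigma^{n}(\sqrt{2\pi})^{n}}e^{-\frac{1}{2\sigma^2}[\sum_{j=1}^{n}(x_j - \bar{x})^2 + n(\bar{x} - \mu_i)^2]},\quad i = 1, 2,$$

where $\bar{x} = \frac{1}{n}\sum_{j=1}^{n} x_j$. Hence for $\mu_1 > \mu_2$,

$$\frac{P_1(X)}{P_2(X)} = e^{-\frac{n}{2\sigma^2}[(\bar{x} - \mu_1)^2 - (\bar{x} - \mu_2)^2]} \ge 1.$$

Taking logarithms and simplifying, we have

$$-\frac{n}{2\sigma^2}[(\bar{x} - \mu_1)^2 - (\bar{x} - \mu_2)^2] \ge 0 \ \Rightarrow\ \bar{x} \ge \frac{\mu_1^2 - \mu_2^2}{2(\mu_1 - \mu_2)} = \frac{1}{2}(\mu_1 + \mu_2) \ \text{ for } \mu_1 > \mu_2,$$

where

$$\bar{x} \sim N_1\Big(\mu_i, \frac{\sigma^2}{n}\Big),\quad i = 1, 2.$$

Therefore the probabilities of misclassification are the following:

$$P(2|1) = \int_{-\infty}^{k} \frac{\sqrt{n}}{\sigma\sqrt{2\pi}}e^{-\frac{n}{2\sigma^2}(\bar{x} - \mu_1)^2}\,{\rm d}\bar{x} = \Phi\Big(\frac{\sqrt{n}(k - \mu_1)}{\sigma}\Big),$$

$$P(1|2) = \int_{k}^{\infty} \frac{\sqrt{n}}{\sigma\sqrt{2\pi}}e^{-\frac{n}{2\sigma^2}(\bar{x} - \mu_2)^2}\,{\rm d}\bar{x} = 1 - \Phi\Big(\frac{\sqrt{n}(k - \mu_2)}{\sigma}\Big),$$

where $k = \frac{1}{2}(\mu_1 + \mu_2)$ and $\Phi(\cdot)$ is the distribution function of a univariate standard normal
random variable.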

Example 12.3.4. Assume that no prior probabilities or costs are involved. A tuber crop
called tapioca is planted by farmers. While farmer $F_1$ applies a standard fertilizer to the
soil to enhance the growth of the tapioca plants, farmer $F_2$ does not apply any fertilizer
and lets the plants grow naturally. At harvest time, a tapioca plant is pulled up with all
its tubers attached to the bottom of the stem. The upper part of the stem is cut off and
the lower part with its tubers is put out for sale. Tuber yield per plant, $x$, is measured
by weighing the lower part of the stem with the tubers attached. It is known from past
experience that $x$ is normally distributed with mean value $\mu_1 = 5$ and variance $\sigma^2 = 1$
for $F_1$ type farms, that is, $x \sim N_1(\mu_1 = 5, \sigma^2 = 1)|F_1$ and that for $F_2$ type farms,
$x \sim N_1(\mu_2 = 3, \sigma^2 = 1)|F_2$, the weights being measured in kilograms. A road-side
vendor is selling tapioca and his collection is either from $F_1$ type farms or $F_2$ type farms,
but not both. A customer picked (1): one stem with its tubers attached weighing 4.2 kg; (2):
a random sample of four stems respectively weighing 6, 4, 3 and 5 kg. To which type of
farms will you classify the observations in (1) and (2)?


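Solution 12.3.4. (1). The decision is based on $k = \frac{1}{2}(\mu_1 + \mu_2) = \frac{1}{2}(5 + 3) = 4$. In this
case, the decision rule $A = (A_1, A_2)$ is such that $A_1 : x \ge k$ and $A_2 : x < k$ for $\mu_1 > \mu_2$.
Note that $\frac{k - \mu_1}{\sigma} = 4 - 5 = -1$ and $\frac{k - \mu_2}{\sigma} = 4 - 3 = 1$. As the observed $x$ is
$4.2 > 4 = k$, we classify $x$ into $P_1(x) : N_1(\mu_1, 1)$. Moreover,

$P(1|1, A)$ = Probability of making a correct classification decision

$$= \Pr\{x \ge k\,|\,P_1(x)\} = \int_{k}^{\infty} \frac{e^{-\frac{1}{2}(x - \mu_1)^2}}{\sqrt{2\pi}}\,{\rm d}x = \int_{-1}^{\infty} \frac{e^{-\frac{1}{2}u^2}}{\sqrt{2\pi}}\,{\rm d}u = 0.5 + \int_{0}^{1} \frac{e^{-\frac{1}{2}u^2}}{\sqrt{2\pi}}\,{\rm d}u \approx 0.84,$$

and

$P(1|2, A)$ = Probability of misclassification

$$= \Pr\{x \ge k\,|\,P_2(x)\} = \int_{k}^{\infty} \frac{e^{-\frac{1}{2}(x - \mu_2)^2}}{\sqrt{2\pi}}\,{\rm d}x = \int_{1}^{\infty} \frac{e^{-\frac{1}{2}u^2}}{\sqrt{2\pi}}\,{\rm d}u \approx 0.16.$$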


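Solution 12.3.4. (2). In this case, $\bar{x} = \frac{1}{4}(6 + 4 + 3 + 5) = 4.5$, $n = 4$, $\bar{x} \sim N_1(\mu_i, \frac{1}{n})$, $i = 1, 2$,
$\frac{\sqrt{n}(k - \mu_1)}{\sigma} = 2(4 - 5) = -2$ and $\frac{\sqrt{n}(k - \mu_2)}{\sigma} = 2(4 - 3) = 2$. Since the observed $\bar{x}$ is
$4.5 > 4 = k$, we assign the sample to $P_1(X) : N_1(\mu_1, 1)$, the criterion being $A_1 : \bar{x} \ge k$
and $A_2 : \bar{x} < k$. Additionally,

$P(1|1, A)$ = Probability of a correct classification

$$= \Pr\{\bar{x} \ge k\,|\,P_1(X)\} = \int_{k}^{\infty} \frac{\sqrt{n}\,e^{-\frac{n}{2}(\bar{x} - \mu_1)^2}}{\sqrt{2\pi}}\,{\rm d}\bar{x} = \int_{-2}^{\infty} \frac{e^{-\frac{1}{2}u^2}}{\sqrt{2\pi}}\,{\rm d}u = 0.5 + \int_{0}^{2} \frac{e^{-\frac{1}{2}u^2}}{\sqrt{2\pi}}\,{\rm d}u \approx 0.977,$$

and

$P(1|2, A)$ = Probability of misclassification

$$= \Pr\{\bar{x} \ge k\,|\,P_2(X)\} = \int_{k}^{\infty} \frac{\sqrt{n}\,e^{-\frac{n}{2}(\bar{x} - \mu_2)^2}}{\sqrt{2\pi}}\,{\rm d}\bar{x} = \int_{2}^{\infty} \frac{e^{-\frac{1}{2}u^2}}{\sqrt{2\pi}}\,{\rm d}u \approx 0.023.$$
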
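
The numbers in Solution 12.3.4 follow directly from the standard normal distribution function. The Python sketch below evaluates them with the error function from the standard library; the function names are hypothetical, and the rule assumes $\mu_1 > \mu_2$ with equal priors and costs, as in Example 12.3.3.

```python
import math


def std_normal_cdf(z):
    """Distribution function of a standard normal variable."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def classify_normal(xbar, mu1, mu2, sigma, n=1):
    """Classify a sample mean xbar of n observations (assumes mu1 > mu2).

    Returns (decision, P(2|1), P(1|2)) for the rule A1: xbar >= k,
    A2: xbar < k with k = (mu1 + mu2)/2.
    """
    k = 0.5 * (mu1 + mu2)
    decision = 1 if xbar >= k else 2
    scale = math.sqrt(n) / sigma
    p21 = std_normal_cdf(scale * (k - mu1))          # pi_1 sample falls in A2
    p12 = 1.0 - std_normal_cdf(scale * (k - mu2))    # pi_2 sample falls in A1
    return decision, p21, p12


# Case (1): one stem weighing 4.2 kg; case (2): four stems averaging 4.5 kg.
for xbar, n in [(4.2, 1), (4.5, 4)]:
    decision, p21, p12 = classify_normal(xbar, mu1=5.0, mu2=3.0, sigma=1.0, n=n)
    print(n, decision, round(1.0 - p21, 3), round(p12, 3))
```

Each printed line gives the sample size, the decision, the probability $P(1|1, A)$ of a correct classification and the probability $P(1|2, A)$ of misclassification.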
Example 12.3.5. Let $\pi_1$ and $\pi_2$ be two $p$-variate real nonsingular normal populations
sharing the same covariance matrix, $\pi_1 : N_p(\mu^{(1)}, \Sigma)$, $\Sigma > O$, and
$\pi_2 : N_p(\mu^{(2)}, \Sigma)$, $\Sigma > O$, whose mean values are such that $\mu^{(1)} \ne \mu^{(2)}$. Let the prior
probabilities be $q_1 = q_2$ and the cost functions be $C(1|2) = C(2|1)$. Consider a single
$p$-vector $X$ to be classified into $\pi_1$ or $\pi_2$. Determine the regions of misclassification and
the corresponding probabilities.


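Solution 12.3.5. The $p$-variate real normal densities are the following:

$$P_i(X) = \frac{1}{(2\pi)^{\frac{p}{2}}|\Sigma|^{\frac{1}{2}}}e^{-\frac{1}{2}(X - \mu^{(i)})'\Sigma^{-1}(X - \mu^{(i)})}$$

for $i = 1, 2$, $\Sigma > O$, $\mu^{(1)} \ne \mu^{(2)}$. Consider the inequality

$$\frac{P_1(X)}{P_2(X)} \ge \frac{C(1|2)\,q_2}{C(2|1)\,q_1} = 1 \ \Rightarrow\ e^{-\frac{1}{2}[(X - \mu^{(1)})'\Sigma^{-1}(X - \mu^{(1)}) - (X - \mu^{(2)})'\Sigma^{-1}(X - \mu^{(2)})]} \ge 1.$$

Taking logarithms, we have

$$-\frac{1}{2}[(X - \mu^{(1)})'\Sigma^{-1}(X - \mu^{(1)}) - (X - \mu^{(2)})'\Sigma^{-1}(X - \mu^{(2)})] \ge 0 \ \Rightarrow$$

$$(\mu^{(1)} - \mu^{(2)})'\Sigma^{-1}X - \frac{1}{2}(\mu^{(1)} - \mu^{(2)})'\Sigma^{-1}(\mu^{(1)} + \mu^{(2)}) \ge 0.$$

Let

$$u = (\mu^{(1)} - \mu^{(2)})'\Sigma^{-1}X - \frac{1}{2}(\mu^{(1)} - \mu^{(2)})'\Sigma^{-1}(\mu^{(1)} + \mu^{(2)}). \tag{12.3.7}$$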
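Then, $u$ has a univariate normal distribution since it is a linear function of the components
of $X$, which is a $p$-variate normal. Thus,

$${\rm Var}(u) = {\rm Var}[(\mu^{(1)} - \mu^{(2)})'\Sigma^{-1}X] = (\mu^{(1)} - \mu^{(2)})'\Sigma^{-1}{\rm Cov}(X)\Sigma^{-1}(\mu^{(1)} - \mu^{(2)}) = (\mu^{(1)} - \mu^{(2)})'\Sigma^{-1}(\mu^{(1)} - \mu^{(2)}) = \Delta^2 \tag{12.3.8}$$

where $\Delta^2$ is Mahalanobis' distance. The mean values of $u$ under $\pi_1$ and $\pi_2$ are respectively,

$$E(u)|\pi_1 = (\mu^{(1)} - \mu^{(2)})'\Sigma^{-1}\mu^{(1)} - \frac{1}{2}(\mu^{(1)} - \mu^{(2)})'\Sigma^{-1}(\mu^{(1)} + \mu^{(2)}) = \frac{1}{2}(\mu^{(1)} - \mu^{(2)})'\Sigma^{-1}(\mu^{(1)} - \mu^{(2)}) = \frac{1}{2}\Delta^2, \tag{12.3.9}$$

$$E(u)|\pi_2 = (\mu^{(1)} - \mu^{(2)})'\Sigma^{-1}\mu^{(2)} - \frac{1}{2}(\mu^{(1)} - \mu^{(2)})'\Sigma^{-1}(\mu^{(1)} + \mu^{(2)}) = -\frac{1}{2}(\mu^{(1)} - \mu^{(2)})'\Sigma^{-1}(\mu^{(1)} - \mu^{(2)}) = -\frac{1}{2}\Delta^2, \tag{12.3.10}$$

so that

$$u \sim N_1\Big(\frac{1}{2}\Delta^2, \Delta^2\Big) \ \text{ under } \pi_1,\qquad u \sim N_1\Big(-\frac{1}{2}\Delta^2, \Delta^2\Big) \ \text{ under } \pi_2. \tag{12.3.11}$$

Accordingly, the regions of misclassification are

$$A_2 : u < 0\,|\,\pi_1 : u \sim N_1\Big(\frac{1}{2}\Delta^2, \Delta^2\Big) \ \text{ and } \ A_1 : u \ge 0\,|\,\pi_2 : u \sim N_1\Big(-\frac{1}{2}\Delta^2, \Delta^2\Big), \tag{12.3.12}$$

and the probabilities of misclassification are as follows:

$$P(2|1) = \int_{-\infty}^{0} \frac{1}{\Delta\sqrt{2\pi}}e^{-\frac{1}{2\Delta^2}(u - \frac{1}{2}\Delta^2)^2}\,{\rm d}u = \int_{-\infty}^{-\frac{1}{2}\Delta} \frac{1}{\sqrt{2\pi}}e^{-\frac{1}{2}t^2}\,{\rm d}t = \Phi\Big(-\frac{1}{2}\Delta\Big) \tag{ii}$$

$$P(1|2) = \int_{0}^{\infty} \frac{1}{\Delta\sqrt{2\pi}}e^{-\frac{1}{2\Delta^2}(u + \frac{1}{2}\Delta^2)^2}\,{\rm d}u = \int_{\frac{1}{2}\Delta}^{\infty} \frac{1}{\sqrt{2\pi}}e^{-\frac{1}{2}t^2}\,{\rm d}t = 1 - \Phi\Big(\frac{1}{2}\Delta\Big) \tag{iii}$$

where $\Phi(\cdot)$ denotes the distribution function of a univariate standard normal variable.
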
Note 12.3.1. If no conditions are imposed on the prior probabilities, $q_1$ and $q_2$, or on the
costs of misclassification, $C(2|1)$ and $C(1|2)$, then the regions are determined as $A_1 : u \ge k$,
$k = \ln\frac{C(1|2)\,q_2}{C(2|1)\,q_1}$, and $A_2 : u < k$. In this case, the probabilities of misclassification
$P(2|1)$ and $P(1|2)$ will be $\Phi\big(\frac{k - \frac{1}{2}\Delta^2}{\Delta}\big)$ and $1 - \Phi\big(\frac{k + \frac{1}{2}\Delta^2}{\Delta}\big)$, respectively.

Note 12.3.2. If the prior probabilities $q_1$ and $q_2$ are not known, we may assume that the
two populations $\pi_1$ and $\pi_2$ are equally likely to be chosen or equivalently that $q_1 = q_2 = \frac{1}{2}$,
in which instance $k = \ln\frac{C(1|2)}{C(2|1)}$. Then, the correct decisions are to assign the vector
$X$ at hand to $\pi_1$ in the region $A_1$ and to $\pi_2$ in the region $A_2$, where $A_1 : u \ge k$ and
$A_2 : u < k$, $k = \ln\frac{C(1|2)\,q_2}{C(2|1)\,q_1}$ with $q_1$, $q_2$, $C(2|1)$ and $C(1|2)$ assumed to be known and

$$u = (\mu^{(1)} - \mu^{(2)})'\Sigma^{-1}X - \frac{1}{2}(\mu^{(1)} - \mu^{(2)})'\Sigma^{-1}(\mu^{(1)} + \mu^{(2)})$$

whose first term, namely $(\mu^{(1)} - \mu^{(2)})'\Sigma^{-1}X$, is known as the linear discriminant function,
which is utilized to discriminate or to separate two $p$-variate populations, not necessarily
normally distributed, having mean value vectors $\mu^{(1)}$ and $\mu^{(2)}$ and sharing the same
covariance matrix $\Sigma > O$.
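
For $p = 2$, the criterion $u$ of (12.3.7), the quantity $\Delta^2$ of (12.3.8) and the misclassification probabilities can be computed without any matrix library. The following Python sketch does so; the function names are hypothetical, and the mean vectors, covariance matrix and observation are assumed values chosen only for illustration.

```python
import math


def inv2(S):
    """Inverse of a 2 x 2 matrix given as nested lists."""
    (a, b), (c, d) = S
    det = a * d - b * c
    return [[d / det, -b / det], [-c / det, a / det]]


def discriminant(mu1, mu2, S):
    """Return (w, c, delta2) such that u(X) = w'X - c as in (12.3.7),
    with delta2 = (mu1 - mu2)' S^{-1} (mu1 - mu2)."""
    Si = inv2(S)
    d = [mu1[0] - mu2[0], mu1[1] - mu2[1]]
    # w' = (mu1 - mu2)' S^{-1}
    w = [d[0] * Si[0][0] + d[1] * Si[1][0], d[0] * Si[0][1] + d[1] * Si[1][1]]
    s = [mu1[0] + mu2[0], mu1[1] + mu2[1]]
    c = 0.5 * (w[0] * s[0] + w[1] * s[1])
    delta2 = w[0] * d[0] + w[1] * d[1]
    return w, c, delta2


def std_normal_cdf(z):
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


# Assumed illustrative parameters (not taken from any data set).
mu1, mu2 = [4.0, 1.0], [2.0, 1.0]
S = [[2.0, 1.0], [1.0, 1.0]]
w, c, delta2 = discriminant(mu1, mu2, S)

x0 = [4.0, 1.0]                              # observation to classify
u = w[0] * x0[0] + w[1] * x0[1] - c
decision = 1 if u >= 0 else 2                # k = 0 for equal priors and costs
p_misclass = std_normal_cdf(-0.5 * math.sqrt(delta2))   # P(2|1) = P(1|2)
print(w, c, delta2, u, decision, round(p_misclass, 3))
```

With equal priors and costs the two misclassification probabilities coincide, $\Phi(-\frac{1}{2}\Delta) = 1 - \Phi(\frac{1}{2}\Delta)$, which is why a single value is reported.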


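Example 12.3.6. Assume that no prior probabilities or costs are involved. Applicants
to a certain training program are given tests to evaluate their aptitude for languages and
aptitude for science. Let the test scores be denoted by $x_1$ and $x_2$, respectively. Let $X$ be
the bivariate vector $X = \begin{bmatrix} x_1\\ x_2 \end{bmatrix}$. After completing the training program, their aptitudes
are tested again. Let $X^{(1)} = \begin{bmatrix} x_1^{(1)}\\ x_2^{(1)} \end{bmatrix}$ be the score vector in the group of successful
trainees and let $X^{(2)} = \begin{bmatrix} x_1^{(2)}\\ x_2^{(2)} \end{bmatrix}$ be the score vector in the group of unsuccessful
trainees. From previous experience of conducting such tests over the years, it is known
that $X^{(1)} \sim N_2(\mu^{(1)}, \Sigma)$, $\Sigma > O$, and $X^{(2)} \sim N_2(\mu^{(2)}, \Sigma)$, $\Sigma > O$, where

$$\mu^{(1)} = \begin{bmatrix} 4\\ 1 \end{bmatrix},\quad \mu^{(2)} = \begin{bmatrix} 2\\ 1 \end{bmatrix},\quad \Sigma = \begin{bmatrix} 2 & 1\\ 1 & 1 \end{bmatrix}.$$

Then (1): one applicant taken at random before the training program started obtained the
test scores $X_0 = \begin{bmatrix} 4\\ 1 \end{bmatrix}$; (2): three applicants chosen at random before the training program
started had the following scores:

$$\begin{bmatrix} 4\\ 2 \end{bmatrix},\quad \begin{bmatrix} 3\\ 1 \end{bmatrix},\quad \begin{bmatrix} 5\\ 1 \end{bmatrix}.$$

In (1), classify $X_0$ to $\pi_1$ or $\pi_2$ and in (2), classify the entire sample of three vectors into $\pi_1$
or $\pi_2$.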


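Solution 12.3.6. Let us compute certain quantities which are needed to answer the questions:

$$\Sigma^{-1} = \begin{bmatrix} 1 & -1 \\ -1 & 2 \end{bmatrix},\quad \mu^{(1)}-\mu^{(2)} = \begin{bmatrix} 2 \\ 0 \end{bmatrix},\quad (\mu^{(1)}-\mu^{(2)})'\Sigma^{-1}X = [2, -2]X = 2x_1 - 2x_2;$$

$$\Delta^2 = (\mu^{(1)}-\mu^{(2)})'\Sigma^{-1}(\mu^{(1)}-\mu^{(2)}) = 4;$$

$$(\mu^{(1)}-\mu^{(2)})'\Sigma^{-1}(\mu^{(1)}+\mu^{(2)}) = [2, 0]\begin{bmatrix} 1 & -1 \\ -1 & 2 \end{bmatrix}\begin{bmatrix} 6 \\ 2 \end{bmatrix} = 8.$$

Hence,

$$u = (\mu^{(1)}-\mu^{(2)})'\Sigma^{-1}X - \tfrac{1}{2}(\mu^{(1)}-\mu^{(2)})'\Sigma^{-1}(\mu^{(1)}+\mu^{(2)}) = 2x_1 - 2x_2 - 4;$$

$$u|\pi_1 \sim N_1(\tfrac{1}{2}\Delta^2, \Delta^2),\quad u|\pi_2 \sim N_1(-\tfrac{1}{2}\Delta^2, \Delta^2);$$

$$A_1: u \ge 0,\quad A_2: u < 0.$$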


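Since, in (1), the observed $X_0 = \begin{bmatrix} 4 \\ 1 \end{bmatrix}$, the observed $u$ is $u = 2x_1 - 2x_2 - 4 = 8 - 2 - 4 = 2 > 0$
and we classify the observed $X_0$ into $\pi_1: N_1(\frac{1}{2}\Delta^2, \Delta^2)$, the criterion being
$A_1: u \ge 0$ and $A_2: u < 0$. Thus,

$P(1|1, A)$ = Probability of making a correct classification decision

$$= \Pr\{u \ge 0|\pi_1\} = \int_0^\infty \frac{e^{-\frac{1}{2\Delta^2}(u-\frac{1}{2}\Delta^2)^2}}{\Delta\sqrt{2\pi}}\,du = \int_0^\infty \frac{e^{-\frac{1}{8}(u-2)^2}}{2\sqrt{2\pi}}\,du$$

$$= \int_{-1}^\infty \frac{e^{-\frac{1}{2}v^2}}{\sqrt{2\pi}}\,dv = 0.5 + \int_0^1 \frac{e^{-\frac{1}{2}v^2}}{\sqrt{2\pi}}\,dv \approx 0.841;$$

$P(1|2, A)$ = Probability of misclassification

$$= \Pr\{u \ge 0|\pi_2\} = \int_0^\infty \frac{e^{-\frac{1}{8}(u+2)^2}}{2\sqrt{2\pi}}\,du = \int_1^\infty \frac{e^{-\frac{1}{2}v^2}}{\sqrt{2\pi}}\,dv \approx 0.159.$$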


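When solving (2), the entire sample is to be classified. Proceeding as in the derivation of
the criterion $u$ in case (1), it is seen that for the problem at hand, $X_0$ will be replaced by $\bar{X}$,
the average of the sample vectors or the sample mean value vector, and then $u$ will become
$u_1 = 2\bar{x}_1 - 2\bar{x}_2 - 4$ where $\bar{X}' = [\bar{x}_1, \bar{x}_2]$. Thus, we require the sample average:

$$\begin{bmatrix} 4 \\ 2 \end{bmatrix} + \begin{bmatrix} 3 \\ 1 \end{bmatrix} + \begin{bmatrix} 5 \\ 1 \end{bmatrix} = \begin{bmatrix} 12 \\ 4 \end{bmatrix} \Rightarrow \text{observed } \bar{X} = \frac{1}{3}\begin{bmatrix} 12 \\ 4 \end{bmatrix}.$$

This means that $\bar{x}_1 = \frac{12}{3} = 4$, $\bar{x}_2 = \frac{4}{3}$, and the observed $u_1 = 2\bar{x}_1 - 2\bar{x}_2 - 4 = 8 - \frac{8}{3} - 4 > 0$.
Hence, we classify the whole sample to $\pi_1$ as the criterion is $A_1: u_1 \ge 0$ and $A_2: u_1 < 0$.
Since $\bar{X}$ is normally distributed with $E[\bar{X}] = \mu^{(i)}$ and $\mathrm{Cov}(\bar{X}) = \frac{1}{n}\Sigma$, $i = 1, 2$, where
$n$ is the sample size, the densities of $u_1$ under $\pi_1$ and $\pi_2$ are the following:

$$u_1|\pi_1 \sim N_1(\tfrac{1}{2}\Delta^2, \tfrac{1}{n}\Delta^2),\ n = 3,$$

$$u_1|\pi_2 \sim N_1(-\tfrac{1}{2}\Delta^2, \tfrac{1}{3}\Delta^2).$$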
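Moreover,

$P(1|1, A)$ = Probability of making a correct classification decision

$$= \Pr\{u_1 \ge 0|\pi_1\} = \int_0^\infty \frac{\sqrt{3}}{2\sqrt{2\pi}}\,e^{-\frac{3}{8}(u-2)^2}\,du$$

$$= \int_{-\sqrt{3}}^\infty \frac{e^{-\frac{1}{2}v^2}}{\sqrt{2\pi}}\,dv = 0.5 + \int_0^{\sqrt{3}} \frac{e^{-\frac{1}{2}v^2}}{\sqrt{2\pi}}\,dv \approx 0.958$$

and

$P(1|2, A)$ = Probability of misclassification

$$= \Pr\{u_1 \ge 0|\pi_2\} = \int_0^\infty \frac{\sqrt{3}}{2\sqrt{2\pi}}\,e^{-\frac{3}{8}(u+2)^2}\,du = \int_{\sqrt{3}}^\infty \frac{e^{-\frac{1}{2}v^2}}{\sqrt{2\pi}}\,dv \approx 0.042.$$
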
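The four probabilities obtained in Solution 12.3.6 can be checked with a few lines of Python. This is a sketch only: the helper names `phi` and `prob_assign_pi1` are assumptions introduced here, and the standard normal distribution function is evaluated through `math.erf`. The value $\Delta^2 = 4$ is the one used in the solution.

```python
import math

def phi(z):
    """Standard normal distribution function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

delta2 = 4.0  # Mahalanobis' distance used in Solution 12.3.6

def prob_assign_pi1(mean, n):
    """Pr{u >= 0} when u ~ N_1(mean, delta2 / n)."""
    sd = math.sqrt(delta2 / n)
    return 1.0 - phi((0.0 - mean) / sd)

single_correct = prob_assign_pi1(+delta2 / 2.0, n=1)   # case (1), under pi_1
single_wrong = prob_assign_pi1(-delta2 / 2.0, n=1)     # case (1), under pi_2
sample_correct = prob_assign_pi1(+delta2 / 2.0, n=3)   # case (2), under pi_1
sample_wrong = prob_assign_pi1(-delta2 / 2.0, n=3)     # case (2), under pi_2
print(single_correct, single_wrong, sample_correct, sample_wrong)
```

Averaging three observations divides the variance of the criterion by three, which is why the probability of a correct decision increases from about 0.841 to about 0.958.
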
12.4. Linear Discriminant Function 


Let $X$ be a $p \times 1$ vector and $B$ a $p \times 1$ arbitrary constant vector, $B' = (b_1, \ldots, b_p)$.
Consider the arbitrary linear function $w = B'X$. Then, the mean value and variance of $w$
are the following: $E(w) = B'E(X)$ and $\mathrm{Var}(w) = \mathrm{Var}(B'X) = B'\mathrm{Cov}(X)B = B'\Sigma B$
where $\Sigma > O$ is the covariance matrix of $X$. Suppose that $X$ could be from a p-variate
real population $\pi_1$ with mean value vector $\mu^{(1)}$ or from the p-variate real population $\pi_2$
with mean value vector $\mu^{(2)}$. Suppose that both the populations $\pi_1$ and $\pi_2$ have the same
covariance matrix $\Sigma > O$. Then, a measure of discrimination or separation between $\pi_1$
and $\pi_2$ is $|B'\mu^{(1)} - B'\mu^{(2)}|$ as measured in terms of the standard deviation $\sqrt{\mathrm{Var}(w)}$ for
determining the best choice of $B$. Taking the squared distance, let

$$\delta = \frac{[B'\mu^{(1)} - B'\mu^{(2)}]^2}{B'\Sigma B} = \frac{[B'(\mu^{(1)}-\mu^{(2)})]^2}{B'\Sigma B} = \frac{B'(\mu^{(1)}-\mu^{(2)})(\mu^{(1)}-\mu^{(2)})'B}{B'\Sigma B} \qquad (12.4.1)$$

since the square of a scalar quantity is the scalar quantity times its transpose,
$B'(\mu^{(1)}-\mu^{(2)})$ being a scalar quantity. Accordingly, we will maximize $\delta$ as specified in (12.4.1).
This will be achieved by selecting a particular $B$ in such a way that $\delta$ attains a maximum
which corresponds to the maximum distance between $\pi_1$ and $\pi_2$. Without any loss of
generality, we may assume that $B'\Sigma B = 1$, so that only the numerator in (12.4.1) need be
maximized, subject to the condition $B'\Sigma B = 1$. Let $\lambda$ denote a Lagrangian multiplier and

$$\eta = B'(\mu^{(1)}-\mu^{(2)})(\mu^{(1)}-\mu^{(2)})'B - \lambda(B'\Sigma B - 1).$$


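Let us take the partial derivative of $\eta$ with respect to the vector $B$ and equate the result to
a null vector (the reader may refer to Chap. 1 for the derivative of a scalar variable with
respect to a vector variable):

$$\frac{\partial \eta}{\partial B} = O \Rightarrow 2(\mu^{(1)}-\mu^{(2)})(\mu^{(1)}-\mu^{(2)})'B - 2\lambda\Sigma B = O$$

$$\Rightarrow \Sigma^{-1}(\mu^{(1)}-\mu^{(2)})(\mu^{(1)}-\mu^{(2)})'B = \lambda B. \qquad (i)$$

Note that $(\mu^{(1)}-\mu^{(2)})'B \equiv \alpha$ is a scalar quantity and $B$ is a specific vector coming from
(i) and hence we may write (i) as

$$B = \frac{\alpha}{\lambda}\,\Sigma^{-1}(\mu^{(1)}-\mu^{(2)}) \equiv c\,\Sigma^{-1}(\mu^{(1)}-\mu^{(2)}) \qquad (ii)$$

where $c$ is a real scalar quantity. Observe that $\delta$ as given in (12.4.1) will remain the same
if $B$ is multiplied by any scalar quantity. Thus, we may take $c = 1$ in (ii) without any loss
of generality. The linear discriminant function then becomes

$$B'X = (\mu^{(1)}-\mu^{(2)})'\Sigma^{-1}X, \qquad (12.4.2)$$

and when $B'X$ is as given in (12.4.2), $\delta$ as defined in (12.4.1) can be expressed as follows:

$$\delta = \frac{(\mu^{(1)}-\mu^{(2)})'\Sigma^{-1}(\mu^{(1)}-\mu^{(2)})(\mu^{(1)}-\mu^{(2)})'\Sigma^{-1}(\mu^{(1)}-\mu^{(2)})}{(\mu^{(1)}-\mu^{(2)})'\Sigma^{-1}(\mu^{(1)}-\mu^{(2)})}$$

$$= (\mu^{(1)}-\mu^{(2)})'\Sigma^{-1}(\mu^{(1)}-\mu^{(2)}) = \Delta^2 = \text{Mahalanobis' distance}$$

$$= \mathrm{Var}(w) = \text{Variance of the discriminant function}. \qquad (12.4.3)$$
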
This $\delta$ is also the generalized squared distance between the vectors $\mu^{(1)}$ and $\mu^{(2)}$ or the
squared distance between the vectors $\Sigma^{-\frac{1}{2}}\mu^{(1)}$ and $\Sigma^{-\frac{1}{2}}\mu^{(2)}$ in the mathematical sense
(Euclidean distance). Hence Mahalanobis' distance between two p-variate populations
with different mean value vectors and the same covariance matrix is a measure of
discrimination or separation between the populations, and the linear discriminant function is
given in (12.4.2). Hence for an observed value $X$, if $u = (\mu^{(1)}-\mu^{(2)})'\Sigma^{-1}X > 0$ when
$\mu^{(1)}$, $\mu^{(2)}$ and $\Sigma$ are known, then we choose population $\pi_1$ with mean value $\mu^{(1)}$, and if
$u < 0$, then we select population $\pi_2$ with mean value $\mu^{(2)}$. When $u = 0$, both $\pi_1$ and $\pi_2$
are equally favored.
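A small numerical check may help to see why $B = \Sigma^{-1}(\mu^{(1)}-\mu^{(2)})$ is the maximizing choice. In the Python sketch below, the covariance matrix, the difference of the mean value vectors and the competing vectors $B$ are hypothetical values chosen for illustration, and the helper names `solve` and `delta` are assumptions of this sketch.

```python
def solve(A, b):
    """Solve A x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(A)
    M = [[float(v) for v in row] + [float(bi)] for row, bi in zip(A, b)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[p] = M[p], M[c]
        for r in range(n):
            if r != c:
                f = M[r][c] / M[c][c]
                M[r] = [x - f * y for x, y in zip(M[r], M[c])]
    return [M[i][n] / M[i][i] for i in range(n)]

def delta(B, d, sigma):
    """delta = [B'd]^2 / (B' Sigma B), with d = mu1 - mu2, as in (12.4.1)."""
    numerator = sum(b * di for b, di in zip(B, d)) ** 2
    sigma_B = [sum(s * b for s, b in zip(row, B)) for row in sigma]
    return numerator / sum(b * sb for b, sb in zip(B, sigma_B))

# Hypothetical covariance matrix and mean difference (illustration only)
sigma = [[2.0, 1.0], [1.0, 1.0]]
d = [2.0, 0.0]

B_star = solve(sigma, d)                  # Sigma^{-1}(mu1 - mu2)
delta_star = delta(B_star, d, sigma)      # equals Delta^2 = d' Sigma^{-1} d
delta_scaled = delta([5.0 * b for b in B_star], d, sigma)
others = [delta(B, d, sigma)
          for B in ([1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [1.0, -2.0], [3.0, -1.0])]
print(B_star, delta_star, delta_scaled, max(others))
```

None of the competing vectors yields a value of $\delta$ exceeding the one attained at $B = \Sigma^{-1}(\mu^{(1)}-\mu^{(2)})$, and rescaling $B$ leaves $\delta$ unchanged, in agreement with the remark that $c$ may be taken to be 1.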


Example 12.4.1. In a small township, there is only one grocery store. The town is laid
out on the East and West sides of the sole main road. We will refer to the villagers as
East-enders and West-enders. These townspeople shop only once a week for groceries. The
grocery store owner found that the East-enders and West-enders have somewhat different
buying habits. Consider the following items: $x_1$ = grain items in kilograms, $x_2$ = vegetable
items in kilograms, $x_3$ = dairy products in kilograms, and let $[x_1, x_2, x_3] = X'$ where $X$ is
the vector of weekly purchases. Then, the expected quantities bought by the East-enders
and West-enders are $E(X) = \mu^{(1)}$ and $E(X) = \mu^{(2)}$, respectively, with the common
covariance matrix $\Sigma > O$. From past history, the grocery store owner determined that

$$X = \begin{bmatrix} x_1 \\ x_2 \\ x_3 \end{bmatrix},\quad \mu^{(1)} = \begin{bmatrix} 2 \\ 3 \\ 1 \end{bmatrix},\quad \mu^{(2)} = \begin{bmatrix} 1 \\ 3 \\ 2 \end{bmatrix},\quad \Sigma = \begin{bmatrix} 3 & 0 & 0 \\ 0 & 2 & -1 \\ 0 & -1 & 1 \end{bmatrix}.$$

Consider the following situations: (1): A customer walked in and bought $x_1$ = 1 kg of grain
items, $x_2$ = 2 kg of vegetable items, and $x_3$ = 1 kg of dairy products. Is she likely to be
an East-ender or West-ender? (2): Another customer bought the three types of items in
the quantities (10, 1, 1), respectively. Is she more likely to be an East-ender than a
West-ender?


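Solution 12.4.1. The inverse of the covariance matrix, $\mu^{(1)}-\mu^{(2)}$, as well as other
relevant quantities are the following:

$$\Sigma^{-1} = \begin{bmatrix} \frac{1}{3} & 0 & 0 \\ 0 & 1 & 1 \\ 0 & 1 & 2 \end{bmatrix},\quad \mu^{(1)}-\mu^{(2)} = \begin{bmatrix} 2 \\ 3 \\ 1 \end{bmatrix} - \begin{bmatrix} 1 \\ 3 \\ 2 \end{bmatrix} = \begin{bmatrix} 1 \\ 0 \\ -1 \end{bmatrix},$$

$$(\mu^{(1)}-\mu^{(2)})'\Sigma^{-1} = [1, 0, -1]\begin{bmatrix} \frac{1}{3} & 0 & 0 \\ 0 & 1 & 1 \\ 0 & 1 & 2 \end{bmatrix} = [\tfrac{1}{3}, -1, -2].$$

In (1), $X' = (1, 2, 1)$ and since

$$(\mu^{(1)}-\mu^{(2)})'\Sigma^{-1}X = [\tfrac{1}{3}, -1, -2]\begin{bmatrix} 1 \\ 2 \\ 1 \end{bmatrix} = -\tfrac{11}{3} < 0,$$

we classify this customer as a West-ender from her buying pattern. In (2),

$$(\mu^{(1)}-\mu^{(2)})'\Sigma^{-1}X = [\tfrac{1}{3}, -1, -2]\begin{bmatrix} 10 \\ 1 \\ 1 \end{bmatrix} = \tfrac{1}{3} > 0,$$

so that, given her purchases, this customer is classified as an East-ender.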
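The computations of Solution 12.4.1 can be reproduced as follows. This Python sketch uses the parameter values of Example 12.4.1; the helper names `solve` and `discriminant` are assumptions introduced for the illustration.

```python
def solve(A, b):
    """Solve A x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(A)
    M = [[float(v) for v in row] + [float(bi)] for row, bi in zip(A, b)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[p] = M[p], M[c]
        for r in range(n):
            if r != c:
                f = M[r][c] / M[c][c]
                M[r] = [x - f * y for x, y in zip(M[r], M[c])]
    return [M[i][n] / M[i][i] for i in range(n)]

mu1 = [2.0, 3.0, 1.0]             # East-enders
mu2 = [1.0, 3.0, 2.0]             # West-enders
sigma = [[3.0, 0.0, 0.0],
         [0.0, 2.0, -1.0],
         [0.0, -1.0, 1.0]]

d = [a - b for a, b in zip(mu1, mu2)]
coef = solve(sigma, d)            # Sigma^{-1}(mu1 - mu2); Sigma is symmetric

def discriminant(x):
    """Linear discriminant function (mu1 - mu2)' Sigma^{-1} x of (12.4.2)."""
    return sum(c * xi for c, xi in zip(coef, x))

u_first = discriminant([1.0, 2.0, 1.0])     # customer in (1)
u_second = discriminant([10.0, 1.0, 1.0])   # customer in (2)
labels = ["East-ender" if u > 0 else "West-ender" for u in (u_first, u_second)]
print(coef, u_first, u_second, labels)
```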
12.5. Classification When the Population Parameters are Unknown 


We now consider the classification problem involving two populations $\pi_1$ and $\pi_2$ for
which the parameters of the corresponding densities are unknown. Since the structure of
the parameters in these general densities $P_1(X)$ and $P_2(X)$ is not known, we will present
a specific example: Consider the two p-variate normal populations of Example 12.3.3.
Let $\pi_1: N_p(\mu^{(1)}, \Sigma)$ and $\pi_2: N_p(\mu^{(2)}, \Sigma)$, which share the same positive definite
covariance matrix $\Sigma$. Suppose that we have a single observation vector $X$ to be classified
into $\pi_1$ or $\pi_2$. When the parameters $\mu^{(1)}$, $\mu^{(2)}$ and $\Sigma$ are unknown, we will have to
estimate them from some training samples. But, for a problem such as classifying skeletal
remains, one does not have samples from the respective ancestral groups. Nevertheless,
one can obtain training samples from living racial groups, and so, secure estimates of the
parameters involved. Assume that we have simple random samples of sizes $n_1$ and $n_2$ from
$N_p(\mu^{(1)}, \Sigma)$ and $N_p(\mu^{(2)}, \Sigma)$, respectively. Denote the sample values by $X_1^{(1)}, \ldots, X_{n_1}^{(1)}$
and $X_1^{(2)}, \ldots, X_{n_2}^{(2)}$, and let $\bar{X}^{(1)}$ and $\bar{X}^{(2)}$ be the sample averages. That is,


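$$X_j^{(1)} = \begin{bmatrix} x_{1j}^{(1)} \\ \vdots \\ x_{pj}^{(1)} \end{bmatrix},\ j = 1, \ldots, n_1;\quad X_j^{(2)} = \begin{bmatrix} x_{1j}^{(2)} \\ \vdots \\ x_{pj}^{(2)} \end{bmatrix},\ j = 1, \ldots, n_2;$$

$$\bar{X}^{(1)} = \frac{1}{n_1}\sum_{j=1}^{n_1} X_j^{(1)} = \begin{bmatrix} \bar{x}_1^{(1)} \\ \vdots \\ \bar{x}_p^{(1)} \end{bmatrix},\quad \bar{X}^{(2)} = \frac{1}{n_2}\sum_{j=1}^{n_2} X_j^{(2)} = \begin{bmatrix} \bar{x}_1^{(2)} \\ \vdots \\ \bar{x}_p^{(2)} \end{bmatrix}. \qquad (12.5.1)$$

Let the sample matrices be denoted by bold-faced letters where the $p \times n_1$ matrix $\mathbf{X}^{(1)}$ and
the $p \times n_2$ matrix $\mathbf{X}^{(2)}$ are the sample matrices and let $\bar{\mathbf{X}}^{(1)}$ and $\bar{\mathbf{X}}^{(2)}$ be the matrices of
sample means. Thus, we have

$$\mathbf{X}^{(1)} = [X_1^{(1)}, \ldots, X_{n_1}^{(1)}] = \begin{bmatrix} x_{11}^{(1)} & \cdots & x_{1n_1}^{(1)} \\ \vdots & \ddots & \vdots \\ x_{p1}^{(1)} & \cdots & x_{pn_1}^{(1)} \end{bmatrix},\quad \bar{\mathbf{X}}^{(1)} = [\bar{X}^{(1)}, \ldots, \bar{X}^{(1)}] = \begin{bmatrix} \bar{x}_1^{(1)} & \cdots & \bar{x}_1^{(1)} \\ \vdots & & \vdots \\ \bar{x}_p^{(1)} & \cdots & \bar{x}_p^{(1)} \end{bmatrix},$$

$$\mathbf{X}^{(2)} = [X_1^{(2)}, \ldots, X_{n_2}^{(2)}] = \begin{bmatrix} x_{11}^{(2)} & \cdots & x_{1n_2}^{(2)} \\ \vdots & \ddots & \vdots \\ x_{p1}^{(2)} & \cdots & x_{pn_2}^{(2)} \end{bmatrix},\quad \bar{\mathbf{X}}^{(2)} = [\bar{X}^{(2)}, \ldots, \bar{X}^{(2)}] = \begin{bmatrix} \bar{x}_1^{(2)} & \cdots & \bar{x}_1^{(2)} \\ \vdots & & \vdots \\ \bar{x}_p^{(2)} & \cdots & \bar{x}_p^{(2)} \end{bmatrix}. \qquad (12.5.2)$$

Then, the sample sum of products matrices are

$$S_i = (\mathbf{X}^{(i)} - \bar{\mathbf{X}}^{(i)})(\mathbf{X}^{(i)} - \bar{\mathbf{X}}^{(i)})',\ i = 1, 2;$$

$$S_m = (s_{ij}^{(m)}),\quad s_{ij}^{(m)} = \sum_{k=1}^{n_m}(x_{ik}^{(m)} - \bar{x}_i^{(m)})(x_{jk}^{(m)} - \bar{x}_j^{(m)}),\ m = 1, 2,\quad S = S_1 + S_2. \qquad (12.5.3)$$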


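The unbiased estimators of $\mu^{(1)}$, $\mu^{(2)}$ and $\Sigma$ are respectively $\bar{X}^{(1)}$, $\bar{X}^{(2)}$ and
$\hat{\Sigma} = \frac{S}{n_{(2)}}$, $n_{(2)} = n_1 + n_2 - 2$. The criteria for classification, the regions, the statistic, and so
on, are available from Example 12.3.3. That is,

$$A_1: u \ge k,\quad A_2: u < k,\quad k = \ln\frac{C(1|2)\,q_2}{C(2|1)\,q_1},$$

where

$$u = X'\Sigma^{-1}(\mu^{(1)}-\mu^{(2)}) - \tfrac{1}{2}(\mu^{(1)}-\mu^{(2)})'\Sigma^{-1}(\mu^{(1)}+\mu^{(2)}).$$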
Note that $q_1$ and $q_2$ are the prior probabilities of selecting the populations $\pi_1$ and $\pi_2$ and
$C(1|2)$ and $C(2|1)$ are the costs or loss associated with misclassification. We will assume
that $q_1$, $q_2$, $C(1|2)$ and $C(2|1)$ are all known but the parameters $\mu^{(1)}$, $\mu^{(2)}$ and $\Sigma$ are
estimated by their unbiased estimators. Denoting the estimator of $u$ as $v$, we obtain the
following criterion, assuming that we have one p-vector $X$ to be classified into $\pi_1$ or $\pi_2$:

$$A_1: v \ge k,\quad A_2: v < k,\quad k = \ln\frac{q_2\,C(1|2)}{q_1\,C(2|1)},$$

$$v = n_{(2)}X'S^{-1}(\bar{X}^{(1)}-\bar{X}^{(2)}) - \tfrac{1}{2}\,n_{(2)}(\bar{X}^{(1)}-\bar{X}^{(2)})'S^{-1}(\bar{X}^{(1)}+\bar{X}^{(2)})$$

$$= n_{(2)}[X - \tfrac{1}{2}(\bar{X}^{(1)}+\bar{X}^{(2)})]'S^{-1}(\bar{X}^{(1)}-\bar{X}^{(2)}). \qquad (12.5.4)$$

As it turns out, it already proves quite challenging to obtain the exact distribution of $v$ as
given in (12.5.4) where $X$ is a single p-vector either from $\pi_1$ or from $\pi_2$.
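The plug-in criterion $v$ of (12.5.4) can be computed directly from two training samples. The Python sketch below is illustrative only: the two tiny training samples and the observation to be classified are hypothetical, and the function names (`solve`, `mean`, `sum_of_products`, `plug_in_criterion`) are assumptions of this sketch.

```python
def solve(A, b):
    """Solve A x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(A)
    M = [[float(v) for v in row] + [float(bi)] for row, bi in zip(A, b)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[p] = M[p], M[c]
        for r in range(n):
            if r != c:
                f = M[r][c] / M[c][c]
                M[r] = [x - f * y for x, y in zip(M[r], M[c])]
    return [M[i][n] / M[i][i] for i in range(n)]

def mean(sample):
    """Sample mean vector; each element of `sample` is one observation vector."""
    n = len(sample)
    return [sum(obs[i] for obs in sample) / n for i in range(len(sample[0]))]

def sum_of_products(sample):
    """Sample sum of products matrix about the sample mean, as in (12.5.3)."""
    m = mean(sample)
    p = len(m)
    return [[sum((o[i] - m[i]) * (o[j] - m[j]) for o in sample)
             for j in range(p)] for i in range(p)]

def plug_in_criterion(x, sample1, sample2):
    """v = n_(2) [x - (mean1 + mean2)/2]' S^{-1} (mean1 - mean2), S = S1 + S2."""
    m1, m2 = mean(sample1), mean(sample2)
    S1, S2 = sum_of_products(sample1), sum_of_products(sample2)
    p = len(x)
    S = [[S1[i][j] + S2[i][j] for j in range(p)] for i in range(p)]
    n_2 = len(sample1) + len(sample2) - 2
    d = [a - b for a, b in zip(m1, m2)]
    coef = [n_2 * c for c in solve(S, d)]
    return sum(c * (xi - (a + b) / 2.0) for c, xi, a, b in zip(coef, x, m1, m2))

# Hypothetical training samples and observation (illustration only)
sample1 = [[3.0, 1.0], [4.0, 2.0], [5.0, 0.0]]
sample2 = [[1.0, 1.0], [2.0, 2.0], [3.0, 0.0]]
v = plug_in_criterion([4.0, 1.0], sample1, sample2)
print(v, "assign to pi_1" if v >= 0 else "assign to pi_2")  # threshold k = 0 assumed
```

Here the threshold $k = 0$ corresponds to equal prior probabilities and equal costs; any other known $k$ can be substituted in the final comparison.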


12.5.1. Some asymptotic results 


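Before considering asymptotic properties of $u$ and $v$ as defined in Sect. 12.4, let us
recall certain results obtained in earlier chapters. Let the $p \times 1$ vectors $Y_j$, $j = 1, \ldots, n$,
be iid vectors from some population for which $E[Y_j] = \mu$ and $\mathrm{Cov}(Y_j) = \Sigma > O$, $j = 1, \ldots, n$.
Let the sample matrix, the matrix of sample means wherein the sample mean
$\bar{Y} = \frac{1}{n}\sum_{j=1}^{n} Y_j$, and the sample sum of products matrix $S$ be as follows:

$$\mathbf{Y} = [Y_1, \ldots, Y_n],\quad \bar{\mathbf{Y}} = [\bar{Y}, \ldots, \bar{Y}],\quad S = (s_{ij}),\quad s_{ij} = \sum_{k=1}^{n}(y_{ik} - \bar{y}_i)(y_{jk} - \bar{y}_j),$$

$$S = [\mathbf{Y} - \bar{\mathbf{Y}}][\mathbf{Y} - \bar{\mathbf{Y}}]' = \mathbf{Y}[I - \tfrac{1}{n}JJ']\mathbf{Y}',\quad Y_j' = [y_{1j}, y_{2j}, \ldots, y_{pj}], \qquad (i)$$

where $J$ is an $n \times 1$ vector of unities. Since a matrix of the form $\mathbf{Y} - \bar{\mathbf{Y}}$ is present, we
may let $\mu = O$ without any loss of generality in the following computations since
$Y_j - \bar{Y} = (Y_j - \mu) - (\bar{Y} - \mu)$. Note that $B = B' = I_n - \frac{1}{n}JJ' = B^2$ and hence, $B$ is
idempotent and of rank $n - 1$. Since $B = B'$, there exists an orthonormal matrix $Q$ such
that $Q'BQ = \mathrm{diag}(1, \ldots, 1, 0) \equiv D$, $QQ' = I$, $Q'Q = I$, the diagonal elements being
1's and 0 since $B = B^2$ and of rank $n - 1$. Then,

$$S = \mathbf{Y}Q\,\mathrm{diag}(1, \ldots, 1, 0)\,Q'\mathbf{Y}' = \mathbf{Y}QDD'Q'\mathbf{Y}',\quad D = \mathrm{diag}(1, \ldots, 1, 0). \qquad (ii)$$

Consider $\Sigma^{-\frac{1}{2}}S\Sigma^{-\frac{1}{2}}$. Let $U_j = \Sigma^{-\frac{1}{2}}Y_j$, $j = 1, \ldots, n$, where $Y_j$ is the j-th column of $\mathbf{Y}$
and it is assumed that $\mu = O$. Observe that $E[U_j] = O$, $\mathrm{Cov}(U_j) = I_p$, $j = 1, \ldots, n$,
and the $U_j$'s are uncorrelated. Letting $\mathbf{U} = [U_1, \ldots, U_n]$, (ii) implies that

$$\Sigma^{-\frac{1}{2}}S\Sigma^{-\frac{1}{2}} = \mathbf{U}QDD'Q'\mathbf{U}'. \qquad (iii)$$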
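Denoting by $U_{(j)}$ the j-th row of $\mathbf{U}$, it follows that the elements of $U_{(j)}$ are iid uncorrelated
real scalar variables with mean value zero and variance 1. Consider the transformation
$V_{(j)} = U_{(j)}Q$; then $E[V_{(j)}] = O$ and $\mathrm{Cov}[V_{(j)}] = I_n$, $j = 1, \ldots, p$, the $V_{(j)}$'s being
uncorrelated. Let $\mathbf{V}$ be the $p \times n$ matrix whose rows are $V_{(j)}$, $j = 1, \ldots, p$. Let the
columns of $\mathbf{V}$ be $V_j$, $j = 1, \ldots, n$, that is, $\mathbf{V} = [V_1, \ldots, V_n]$. Then, (iii) implies the
following:

$$\Sigma^{-\frac{1}{2}}S\Sigma^{-\frac{1}{2}} = \mathbf{V}DD'\mathbf{V}' = [V_1, \ldots, V_n]D\,D'[V_1, \ldots, V_n]'$$

$$= [V_1, \ldots, V_{n-1}, O][V_1, \ldots, V_{n-1}, O]' = V_1V_1' + \cdots + V_{n-1}V_{n-1}' \Rightarrow$$

$$E[\Sigma^{-\frac{1}{2}}S\Sigma^{-\frac{1}{2}}] = I_p + \cdots + I_p = (n-1)I_p \Rightarrow$$

$$E[S] = (n-1)\Sigma \ \text{ or } \ E\Big[\frac{S}{n-1}\Big] = \Sigma. \qquad (iv)$$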


Additionally,

$$\mathrm{Cov}(\bar{Y}) = \frac{1}{n^2}\mathrm{Cov}[Y_1 + \cdots + Y_n] = \frac{1}{n^2}[\mathrm{Cov}(Y_1) + \cdots + \mathrm{Cov}(Y_n)]$$

$$= \frac{1}{n^2}[\Sigma + \cdots + \Sigma] = \frac{n\Sigma}{n^2} = \frac{\Sigma}{n} \to O \ \text{ as } n \to \infty, \qquad (v)$$

when $\Sigma$ is finite with respect to any norm of $\Sigma$, namely $\|\Sigma\| < \infty$. Appealing to the
extended Chebyshev inequality, this shows that the unbiased estimator of $\mu$, namely $\bar{Y}$,
converges to $\mu$ in probability, that is,

$$\Pr(\bar{Y} \to \mu) \to 1 \ \text{ when } n \to \infty \ \text{ or } \ \lim_{n\to\infty}\Pr(\bar{Y} \to \mu) = 1. \qquad (vi)$$

An unbiased estimator of $\Sigma$ is $\hat{\Sigma} = \frac{S}{n-1}$ with $E[\hat{\Sigma}] = \Sigma$. Will $\hat{\Sigma}$ also converge to $\Sigma$ in
probability when $n \to \infty$? In order to establish this, we require the covariance structure
of the elements in $S$. For arbitrary populations, it is somewhat difficult to verify this result;
however, it is rather straightforward for normal populations. We will examine this aspect
next.
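The matrix identities used above can be verified numerically on a small data set. The following Python sketch is an illustration only: the $2 \times 3$ sample matrix is hypothetical and the helper names `matmul` and `transpose` are assumptions. It checks that $S = \mathbf{Y}[I - \frac{1}{n}JJ']\mathbf{Y}'$ agrees with the elementwise definition of the sum of products matrix, that $B = I_n - \frac{1}{n}JJ'$ is idempotent, and that its trace, which equals the rank of an idempotent matrix, is $n - 1$.

```python
def matmul(A, B):
    """Product of two matrices stored as lists of rows."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]

def transpose(A):
    return [list(col) for col in zip(*A)]

# Hypothetical p x n sample matrix (p = 2 variables, n = 3 observations)
Y = [[3.0, 4.0, 5.0],
     [1.0, 2.0, 0.0]]
p, n = len(Y), len(Y[0])

# Centering matrix B = I_n - (1/n) J J'
B = [[(1.0 if i == j else 0.0) - 1.0 / n for j in range(n)] for i in range(n)]

S_matrix_form = matmul(matmul(Y, B), transpose(Y))

means = [sum(row) / n for row in Y]
S_direct = [[sum((Y[i][k] - means[i]) * (Y[j][k] - means[j]) for k in range(n))
             for j in range(p)] for i in range(p)]

BB = matmul(B, B)
trace_B = sum(B[i][i] for i in range(n))
print(S_matrix_form, S_direct, trace_B)
```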


12.5.2. Another method 


Let the $p \times 1$ vectors $X_j$, $j = 1, \ldots, n$, be a simple random sample of size $n$ from
a population having a real $N_p(\mu, \Sigma)$, $\Sigma > O$, distribution. Letting $S$ denote the sample
sum of products matrix, $S$ will be distributed as a Wishart matrix with $m = n - 1$ degrees
of freedom and $\Sigma > O$ as its parameter matrix, whose density is

$$f(S) = \frac{|S|^{\frac{m}{2}-\frac{p+1}{2}}\,e^{-\frac{1}{2}\mathrm{tr}(\Sigma^{-1}S)}}{2^{\frac{mp}{2}}\,|\Sigma|^{\frac{m}{2}}\,\Gamma_p(\frac{m}{2})},\quad S > O,\ m \ge p; \qquad (i)$$

the reader may also refer to the real matrix-variate gamma density discussed in Chap. 5. This is
usually written as $S \sim W_p(m, \Sigma)$, $\Sigma > O$. Letting $S_{(1)} = \Sigma^{-\frac{1}{2}}S\Sigma^{-\frac{1}{2}}$, $S_{(1)} \sim W_p(m, I)$.
Consider the transformation $S_{(1)} = TT'$ where $T = (t_{ij})$ is a lower triangular matrix
whose diagonal elements are positive, that is, $t_{ij} = 0$, $i < j$, and $t_{jj} > 0$, $j = 1, \ldots, p$.
It was explained in Chaps. 1 and 3 that the $t_{ij}$'s are mutually independently distributed with
the $t_{ij}$'s such that $i > j$ distributed as standard normal variables and $t_{jj}^2$ as a chisquare
variable having $m - (j - 1)$ degrees of freedom. The j-th diagonal element of $TT'$ is of
the form $t_{j1}^2 + \cdots + t_{j\,j-1}^2 + t_{jj}^2$ where $t_{jk}^2 \sim \chi_1^2$ for $k = 1, \ldots, j - 1$, that is, the square
of a real standard normal variable. Thus, the j-th diagonal element is distributed as
$\chi_1^2 + \cdots + \chi_1^2 + \chi_{m-(j-1)}^2 \sim \chi_m^2$ since all the individual chisquare variables are independently
distributed, in which case the resulting number of degrees of freedom is the sum of the
degrees of freedom of the chisquares. Now, noting that for a $\chi_\nu^2$,

$$E[\chi_\nu^2] = \nu \ \text{ and } \ \mathrm{Var}(\chi_\nu^2) = 2\nu, \qquad (ii)$$


the expected value of each of the diagonal elements in $TT'$, which are the diagonal
elements in $S_{(1)}$, will be $m = n - 1$. The non-diagonal elements in $TT'$ result from a sum of
terms of the form $t_{ik}t_{jk}$, $k < i$, whose expected value is $E[t_{ik}t_{jk}] = E[t_{ik}]E[t_{jk}]$; but since
$E[t_{ik}] = 0$, $i > k$, all the non-diagonal elements will have zero as their expected values.
Accordingly,

$$E[S_{(1)}] = \mathrm{diag}(m, \ldots, m) \Rightarrow E\Big[\frac{S_{(1)}}{m}\Big] = I_p \Rightarrow E\Big[\frac{S}{m}\Big] = \Sigma,\quad m = n - 1, \qquad (iii)$$

and the estimator $\hat{\Sigma} = \frac{S}{m}$ is unbiased for $\Sigma$, $m$ being equal to $n - 1$. Now, let us examine
the covariance structure of $S_{(1)}$. Let $W$ denote a single vector comprising all the distinct
elements of $S_{(1)} = TT'$ and consider its covariance structure. In this vector of order
$\frac{p(p+1)}{2} \times 1$, convert all the original $t_{ij}$'s and $t_{jj}$'s in terms of standard normal and chisquare
variables. Let $z_1, \ldots, z_{p(p-1)/2}$ be the standard normal variables and $y_1, \ldots, y_p$ denote the
chisquare variables. Then, each element of $\mathrm{Cov}(W) = E\{[W - E(W)][W - E(W)]'\}$ will be
a sum of terms of the type

$$[\mathrm{Var}(y_k)][\mathrm{Var}(z_j)] = \mathrm{Var}(y_k) = [\text{twice the number of degrees of freedom of } y_k], \qquad (iv)$$

which happens to be a linear function of $m$. Our estimator being $\hat{\Sigma} = \frac{S}{m} = \Sigma^{\frac{1}{2}}\frac{S_{(1)}}{m}\Sigma^{\frac{1}{2}}$,
the covariance structure of $\frac{S_{(1)}}{m}$, which is $\frac{1}{m^2}\mathrm{Cov}(W)$, tends to $O$ when $m \to \infty$, since
each element of $\mathrm{Cov}(W)$ is of the form $am + b$ where $a$ and $b$ are real scalars, so that
$\frac{am + b}{m^2} \to 0$ as $m \to \infty$, or equivalently, as $n \to \infty$ since $m = n - 1$. Thus, it follows
from an extended version of Chebyshev's inequality that

$$\Pr\Big(\frac{S}{m} \to \Sigma\Big) \to 1 \ \text{ as } m \to \infty \ \text{ or as } n \to \infty \ \text{ since } m = n - 1. \qquad (v)$$


These last two results are stated next as a theorem. 


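Theorem 12.5.1. Let the $p \times 1$ vectors $X_j$, $j = 1, \ldots, n$, be iid with $E[X_j] = \mu$ and
$\mathrm{Cov}(X_j) = \Sigma$, $j = 1, \ldots, n$. Assume that $\Sigma$ is finite in the sense that $\|\Sigma\| < \infty$. Then,
letting $\bar{X} = \frac{1}{n}\sum_{j=1}^{n} X_j$ denote the sample mean,

$$\Pr(\bar{X} \to \mu) \to 1 \ \text{ as } n \to \infty. \qquad (12.5.5)$$

Further, letting $X_j \sim N_p(\mu, \Sigma)$, $\Sigma > O$,

$$\Pr\Big(\hat{\Sigma} = \frac{S}{m} \to \Sigma\Big) \to 1 \ \text{ as } m \to \infty \ \text{ or as } n \to \infty \ \text{ since } m = n - 1. \qquad (12.5.6)$$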
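
The triangular representation $S_{(1)} = TT'$ lends itself to a simple simulation, which may help to visualize the second part of Theorem 12.5.1 for $\Sigma = I$. The Python sketch below is illustrative only: the function names (`bartlett_draw`, `simulate`), the dimension, the numbers of degrees of freedom, the number of draws and the seed are all assumptions made for this example. Chisquare variables with $\nu$ degrees of freedom are generated as gamma variables with shape $\nu/2$ and scale 2.

```python
import math
import random

def bartlett_draw(p, m, rng):
    """One draw of S_(1) = T T' ~ W_p(m, I), with T lower triangular."""
    T = [[0.0] * p for _ in range(p)]
    for j in range(p):
        # 0-based j: t_jj^2 is chisquare with m - j degrees of freedom
        T[j][j] = math.sqrt(rng.gammavariate((m - j) / 2.0, 2.0))
        for i in range(j + 1, p):
            T[i][j] = rng.gauss(0.0, 1.0)      # standard normal below the diagonal
    return [[sum(T[i][k] * T[j][k] for k in range(p)) for j in range(p)]
            for i in range(p)]

def simulate(p, m, draws, seed=0):
    """Average of S_(1)/m over the draws and variance of its (1,1) element."""
    rng = random.Random(seed)
    mean = [[0.0] * p for _ in range(p)]
    second_moment = 0.0
    for _ in range(draws):
        S1 = bartlett_draw(p, m, rng)
        for i in range(p):
            for j in range(p):
                mean[i][j] += S1[i][j] / m / draws
        second_moment += (S1[0][0] / m) ** 2 / draws
    return mean, second_moment - mean[0][0] ** 2

mean_small, var_small = simulate(p=3, m=10, draws=4000)
mean_large, var_large = simulate(p=3, m=100, draws=4000)
print(mean_small, var_small)
print(mean_large, var_large)
```

In both cases the average of $S_{(1)}/m$ should be close to the identity matrix, while the variance of a diagonal element, which is $2/m$ in theory, decreases as $m$ increases.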


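Let us now examine the criterion in (12.5.4). In this case, we can obtain an asymptotic
distribution of the criterion $v$ for large $n_{(2)}$ or when $n_{(2)} \to \infty$ in the sense that $n_1 \to \infty$
and $n_2 \to \infty$. When $n_{(2)} \to \infty$, we have $\bar{X}^{(1)} \to \mu^{(1)}$, $\bar{X}^{(2)} \to \mu^{(2)}$ and $\frac{S}{n_{(2)}} \to \Sigma$, so
that the criterion $v$ in (12.5.4) becomes

$$u = (\mu^{(1)}-\mu^{(2)})'\Sigma^{-1}X - \tfrac{1}{2}(\mu^{(1)}-\mu^{(2)})'\Sigma^{-1}(\mu^{(1)}+\mu^{(2)})$$

$$= [X - \tfrac{1}{2}(\mu^{(1)}+\mu^{(2)})]'\Sigma^{-1}(\mu^{(1)}-\mu^{(2)}), \qquad (12.5.7)$$

which is nothing but $u$ as specified in (12.3.7) with the densities $N_1(\frac{1}{2}\Delta^2, \Delta^2)$ in $\pi_1$ and
$N_1(-\frac{1}{2}\Delta^2, \Delta^2)$ in $\pi_2$. Hence, the following result: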


Theorem 12.5.2. When $n_1 \to \infty$ and $n_2 \to \infty$, the criterion $v$ provided in (12.5.4)
becomes $u$ as specified in (12.5.7) with the univariate normal densities $N_1(\frac{1}{2}\Delta^2, \Delta^2)$ in
$\pi_1$ and $N_1(-\frac{1}{2}\Delta^2, \Delta^2)$ in $\pi_2$, where $\Delta^2$ is Mahalanobis' distance given in (12.3.8). We
classify $X$, the observation vector at hand, to $\pi_1$ when $X \in A_1$ and, to $\pi_2$ when $X \in A_2$
where $A_1: u \ge k$ and $A_2: u < k$ with $k = \ln\frac{C(1|2)\,q_2}{C(2|1)\,q_1}$, $q_1$ and $q_2$ being the prior
probabilities of selecting the populations $\pi_1$ and $\pi_2$, respectively, and $C(2|1)$ and $C(1|2)$
denoting the costs or loss associated with misclassification.


In a practical situation, when $n_1$ and $n_2$ are large, we may replace $\Delta^2$ in Theorem 12.5.2
by the corresponding sample value $n_{(2)}(\bar{X}^{(1)}-\bar{X}^{(2)})'S^{-1}(\bar{X}^{(1)}-\bar{X}^{(2)})$ where $S = S_1 + S_2$
and $n_{(2)} = n_1 + n_2 - 2$ and utilize the criterion $u$ as specified in (12.5.7) to classify the
given vector $X$ into $\pi_1$ or $\pi_2$. It is assumed that $q_1$, $q_2$, $C(2|1)$ and $C(1|2)$ are available.
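12.5.3. A new sample from $\pi_1$ or $\pi_2$


As in Examples 12.3.1 and 12.3.2, suppose that a simple random sample of size $n_3$
is available either from $\pi_1: N_p(\mu^{(1)}, \Sigma)$ or from $\pi_2: N_p(\mu^{(2)}, \Sigma)$, $\Sigma > O$. Letting
the new sample be $X_1^{(3)}, \ldots, X_{n_3}^{(3)}$, the $p \times n_3$ sample matrix, the sample mean
$\bar{X}^{(3)} = \frac{1}{n_3}\sum_{j=1}^{n_3} X_j^{(3)}$, the $p \times n_3$ matrix of sample means and the sample sum of products matrix
are the following:

$$\mathbf{X}^{(3)} = [X_1^{(3)}, \ldots, X_{n_3}^{(3)}],\quad \bar{\mathbf{X}}^{(3)} = [\bar{X}^{(3)}, \ldots, \bar{X}^{(3)}],$$

$$S_3 = [\mathbf{X}^{(3)} - \bar{\mathbf{X}}^{(3)}][\mathbf{X}^{(3)} - \bar{\mathbf{X}}^{(3)}]' = (s_{ij}^{(3)}),\quad s_{ij}^{(3)} = \sum_{k=1}^{n_3}(x_{ik}^{(3)} - \bar{x}_i^{(3)})(x_{jk}^{(3)} - \bar{x}_j^{(3)}). \qquad (12.5.8)$$

An unbiased estimate of $\Sigma$ from this third sample is $\hat{\Sigma} = \frac{S_3}{n_3 - 1}$ with $E[\hat{\Sigma}] = \Sigma$. A pooled
estimate of $\Sigma$ obtained from the three samples is

$$\frac{S_1 + S_2 + S_3}{n_1 + n_2 + n_3 - 3} \equiv \frac{S}{n_{(3)}},\quad S = S_1 + S_2 + S_3,\quad n_{(3)} = n_1 + n_2 + n_3 - 3. \qquad (12.5.9)$$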


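Then, the criterion corresponding to (12.3.4) changes to:

$$w \ge \ln\frac{C(1|2)\,q_2}{C(2|1)\,q_1} \qquad (12.5.10)$$

where

$$w = n_{(3)}[\bar{X}^{(3)} - \tfrac{1}{2}(\bar{X}^{(1)}+\bar{X}^{(2)})]'S^{-1}(\bar{X}^{(1)}-\bar{X}^{(2)}) \qquad (12.5.11)$$

with $S = S_1 + S_2 + S_3$, $n_{(3)} = n_1 + n_2 + n_3 - 3$ and $\bar{X}^{(3)}$ being the sample average from the
third sample, which either comes from $\pi_1: N_p(\mu^{(1)}, \Sigma)$ or $\pi_2: N_p(\mu^{(2)}, \Sigma)$, $\Sigma > O$.
Thus, the classification rule is the following:

$$A_1: w \ge k \ \text{ and } \ A_2: w < k,\quad k = \ln\frac{C(1|2)\,q_2}{C(2|1)\,q_1}, \qquad (12.5.12)$$

$w$ being as defined in (12.5.11). That is, classify the new sample into $\pi_1$ if $w \ge k$ and, into
$\pi_2$ if $w < k$.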


As was explained in Sect. 12.5.2, as $n_j \to \infty$, $j = 1, 2$, $\bar{X}^{(i)} \to \mu^{(i)}$, $i = 1, 2$, and
although $n_3$ usually remains finite, as $n_1 \to \infty$ and $n_2 \to \infty$, we have $n_{(3)} \to \infty$ and
$\frac{S}{n_{(3)}} \to \Sigma$. Accordingly, the criterion $w$ as given in (12.5.11) converges to $w_1$ for large
values of $n_1$ and $n_2$, where

$$w_1 = [\bar{X}^{(3)} - \tfrac{1}{2}(\mu^{(1)}+\mu^{(2)})]'\Sigma^{-1}(\mu^{(1)}-\mu^{(2)}). \qquad (12.5.13)$$

Compared to $u$ as specified in (12.3.7), the only difference is that $X$ associated with $u$
is replaced by $\bar{X}^{(3)}$ in $w_1$. Hence, the variance in $u$ will be multiplied by $\frac{1}{n_3}$ and the
asymptotic distributions will be as follows:

$$w_1|\pi_1 \sim N_1\Big(\tfrac{1}{2}\Delta^2, \tfrac{1}{n_3}\Delta^2\Big) \ \text{ and } \ w_1|\pi_2 \sim N_1\Big(-\tfrac{1}{2}\Delta^2, \tfrac{1}{n_3}\Delta^2\Big), \qquad (12.5.14)$$

as $n_1 \to \infty$ and $n_2 \to \infty$.


Theorem 12.5.3. Consider two populations $\pi_1: N_p(\mu^{(1)}, \Sigma)$ and $\pi_2: N_p(\mu^{(2)}, \Sigma)$,
$\Sigma > O$, and simple random samples of respective sizes $n_1$ and $n_2$ from these two
populations. Suppose that a simple random sample of size $n_3$ is available, either from $\pi_1$ or $\pi_2$.
For classifying the third sample into $\pi_1$ or $\pi_2$, the criterion to be utilized is $w$ as given in
(12.5.11). Then, the asymptotic distribution of $w$, when $n_i \to \infty$, $i = 1, 2$, is that of $w_1$
specified in (12.5.13) and the regions of classification are as given in (12.5.12).


In a practical situation, when the sample sizes $n_1$ and $n_2$ are large, one may replace
$\Delta^2$ by its sample analogue, and then use (12.5.14) to reach a decision. As it turns out, it
proves quite difficult to derive the exact density of $w$.


Example 12.5.1. A certain milk collection and distribution center collects and sells the
milk supplied by local farmers to the community, the balance, if any, being dispatched to
a nearby city. In that locality, there are two types of cows. Some farmers only keep Jersey
cows and others, only Holstein cows. Samples of the same quantities of milk are taken
and the following characteristics are evaluated: $x_1$, the fat content, $x_2$, the glucose content,
and $x_3$, the protein content. It is known that $X' = (x_1, x_2, x_3)$ is normally distributed
as $X \sim N_3(\mu^{(1)}, \Sigma)$, $\Sigma > O$, for Jersey cows, and $X \sim N_3(\mu^{(2)}, \Sigma)$, $\Sigma > O$, for
Holstein cows, with $\mu^{(1)} \ne \mu^{(2)}$, the covariance matrices $\Sigma$ being assumed identical.
These parameters, which are not known, are estimated on the basis of 100 milk samples
from Jersey cows and 102 samples from Holstein cows, all the samples being of equal
volume. The following are the summarized data with our standard notations, where $S_1$ and
$S_2$ are the sample sums of products matrices:

$$\bar{X}^{(1)} = \begin{bmatrix} 2 \\ 1 \\ 2 \end{bmatrix},\quad \bar{X}^{(2)} = \begin{bmatrix} 1 \\ 2 \\ 2 \end{bmatrix},\quad S_1 = \begin{bmatrix} 50 & -50 & 50 \\ -50 & 100 & 0 \\ 50 & 0 & 150 \end{bmatrix},\quad S_2 = \begin{bmatrix} 150 & -150 & 150 \\ -150 & 300 & 0 \\ 150 & 0 & 450 \end{bmatrix}.$$

Three farmers just brought in their supply of milk and (1): a sample denoted by $X_1$ is
collected from the first farmer's supply and evaluated; (2): a sample, $X_2$, is taken from a
second farmer's supply and evaluated; (3): a set of 5 random samples are collected from a
third farmer's supply, the sample average being $\bar{X}$. The data is

$$X_1 = \begin{bmatrix} 2 \\ 1 \\ 1 \end{bmatrix},\quad X_2 = \begin{bmatrix} 1 \\ 1 \\ 2 \end{bmatrix} \ \text{ and } \ \bar{X} = \begin{bmatrix} 2 \\ 2 \\ 1 \end{bmatrix},\ n = 5.$$

Classify $X_1$, $X_2$ and the sample of size 5 as coming from either Jersey or Holstein cows.


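Solution 12.5.1. The following preliminary calculations are needed:

$$S = S_1 + S_2 \Rightarrow \frac{S}{n_1 + n_2 - 2} = \frac{S_1 + S_2}{200} = \begin{bmatrix} 1 & -1 & 1 \\ -1 & 2 & 0 \\ 1 & 0 & 3 \end{bmatrix},$$

$$\Big(\frac{S}{200}\Big)^{-1} = \begin{bmatrix} 6 & 3 & -2 \\ 3 & 2 & -1 \\ -2 & -1 & 1 \end{bmatrix},\quad \bar{X}^{(1)}-\bar{X}^{(2)} = \begin{bmatrix} 1 \\ -1 \\ 0 \end{bmatrix},\quad \bar{X}^{(1)}+\bar{X}^{(2)} = \begin{bmatrix} 3 \\ 3 \\ 4 \end{bmatrix}.$$

Then,

$$(\bar{X}^{(1)}-\bar{X}^{(2)})'\Big(\frac{S}{200}\Big)^{-1} = [1, -1, 0]\begin{bmatrix} 6 & 3 & -2 \\ 3 & 2 & -1 \\ -2 & -1 & 1 \end{bmatrix} = [3, 1, -1];$$

$$\tfrac{1}{2}(\bar{X}^{(1)}-\bar{X}^{(2)})'\Big(\frac{S}{200}\Big)^{-1}(\bar{X}^{(1)}+\bar{X}^{(2)}) = \tfrac{1}{2}[3, 1, -1]\begin{bmatrix} 3 \\ 3 \\ 4 \end{bmatrix} = 4;$$

$$(\bar{X}^{(1)}-\bar{X}^{(2)})'\Big(\frac{S}{200}\Big)^{-1}X = [3, 1, -1]X = 3x_1 + x_2 - x_3 \Rightarrow w = 3x_1 + x_2 - x_3 - 4$$

where $w$ is given in (12.5.11). For answering (1), we substitute $X_1$ to $X$ in $w$. That
is, $w$ at $X_1$ is $3(2) + (1) - (1) - 4 = 2 > 0$. Hence, we assign $X_1$ to Jersey cows.
For answering (2), we replace $X$ in $w$ by $X_2$, that is, $3(1) + (1) - (2) - 4 = -2 < 0$.
Thus, we assign $X_2$ to Holstein cows. For answering (3), we replace $X$ in $w$ by $\bar{X}$. That
is, $3(2) + (2) - (1) - 4 = 3 > 0$. Accordingly, we classify this sample as coming from
Jersey cows.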
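
The arithmetic of Solution 12.5.1 can be checked from the summarized data. The Python sketch below uses the sample means and sums of products matrices of Example 12.5.1; the helper names `solve` and `w` are assumptions introduced for the illustration.

```python
def solve(A, b):
    """Solve A x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(A)
    M = [[float(v) for v in row] + [float(bi)] for row, bi in zip(A, b)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[p] = M[p], M[c]
        for r in range(n):
            if r != c:
                f = M[r][c] / M[c][c]
                M[r] = [x - f * y for x, y in zip(M[r], M[c])]
    return [M[i][n] / M[i][i] for i in range(n)]

n1, n2 = 100, 102
xbar1, xbar2 = [2.0, 1.0, 2.0], [1.0, 2.0, 2.0]       # Jersey, Holstein
S1 = [[50, -50, 50], [-50, 100, 0], [50, 0, 150]]
S2 = [[150, -150, 150], [-150, 300, 0], [150, 0, 450]]

# Pooled estimate S / (n1 + n2 - 2) of the common covariance matrix
pooled = [[(a + b) / (n1 + n2 - 2) for a, b in zip(r1, r2)] for r1, r2 in zip(S1, S2)]
d = [a - b for a, b in zip(xbar1, xbar2)]
coef = solve(pooled, d)                               # coefficients of x1, x2, x3
const = 0.5 * sum(c * (a + b) for c, a, b in zip(coef, xbar1, xbar2))

def w(x):
    """Criterion w = coef'x - const; assign to Jersey when w >= 0."""
    return sum(c * xi for c, xi in zip(coef, x)) - const

scores = [w([2.0, 1.0, 1.0]), w([1.0, 1.0, 2.0]), w([2.0, 2.0, 1.0])]
delta2_hat = sum(c * di for c, di in zip(coef, d))    # sample analogue of Delta^2
print(coef, const, scores, delta2_hat)
```

The sample analogue of $\Delta^2$ computed in the last step is the quantity needed when the probabilities of misclassification are approximated for this data set.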


12.6. Maximum Likelihood Method of Classification 


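As before, let $\pi_1$ be the p-variate real normal population $N_p(\mu^{(1)}, \Sigma)$, $\Sigma > O$,
with the simple random sample $X_1^{(1)}, \ldots, X_{n_1}^{(1)}$ of size $n_1$ drawn from that population,
and $\pi_2: N_p(\mu^{(2)}, \Sigma)$, $\Sigma > O$, with the simple random sample $X_1^{(2)}, \ldots, X_{n_2}^{(2)}$ of size $n_2$
so distributed. A p-vector $X$ at hand is to be classified into $\pi_1$ or $\pi_2$. Let the sample means
and the sample sums of products matrices be $\bar{X}^{(1)}$, $\bar{X}^{(2)}$, $S_1$ and $S_2$. Then, the problem
of classification of $X$ into $\pi_1$ or $\pi_2$ can be stated in terms of testing a hypothesis of the
following type: $X$ and $X_1^{(1)}, \ldots, X_{n_1}^{(1)}$ are from $N_p(\mu^{(1)}, \Sigma)$ and $X_1^{(2)}, \ldots, X_{n_2}^{(2)}$ are from
$\pi_2$ constitutes the null hypothesis, versus, the alternative $X$ and $X_1^{(2)}, \ldots, X_{n_2}^{(2)}$ are from
$N_p(\mu^{(2)}, \Sigma)$ and $X_1^{(1)}, \ldots, X_{n_1}^{(1)}$ are from $N_p(\mu^{(1)}, \Sigma)$. Let the likelihood functions under
the null and alternative hypotheses be denoted as $L_0$ and $L_1$, respectively, where

$$L_0 = \Big\{\prod_{j=1}^{n_1}\frac{e^{-\frac{1}{2}(X_j^{(1)}-\mu^{(1)})'\Sigma^{-1}(X_j^{(1)}-\mu^{(1)})}}{(2\pi)^{\frac{p}{2}}|\Sigma|^{\frac{1}{2}}}\Big\}\Big\{\prod_{j=1}^{n_2}\frac{e^{-\frac{1}{2}(X_j^{(2)}-\mu^{(2)})'\Sigma^{-1}(X_j^{(2)}-\mu^{(2)})}}{(2\pi)^{\frac{p}{2}}|\Sigma|^{\frac{1}{2}}}\Big\}\frac{e^{-\frac{1}{2}(X-\mu^{(1)})'\Sigma^{-1}(X-\mu^{(1)})}}{(2\pi)^{\frac{p}{2}}|\Sigma|^{\frac{1}{2}}},$$

that is,

$$L_0 = \frac{1}{(2\pi)^{\frac{(n_1+n_2+1)p}{2}}|\Sigma|^{\frac{n_1+n_2+1}{2}}}\,e^{-\frac{1}{2}[V + (X-\mu^{(1)})'\Sigma^{-1}(X-\mu^{(1)})]}, \qquad (i)$$

$$L_1 = \frac{1}{(2\pi)^{\frac{(n_1+n_2+1)p}{2}}|\Sigma|^{\frac{n_1+n_2+1}{2}}}\,e^{-\frac{1}{2}[V + (X-\mu^{(2)})'\Sigma^{-1}(X-\mu^{(2)})]}, \qquad (ii)$$

where

$$V = \mathrm{tr}(\Sigma^{-1}S_1) + n_1(\bar{X}^{(1)}-\mu^{(1)})'\Sigma^{-1}(\bar{X}^{(1)}-\mu^{(1)}) + \mathrm{tr}(\Sigma^{-1}S_2) + n_2(\bar{X}^{(2)}-\mu^{(2)})'\Sigma^{-1}(\bar{X}^{(2)}-\mu^{(2)}) \qquad (iii)$$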
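and $S_1$ and $S_2$ are the sample sums of products matrices from the samples $X_1^{(1)}, \ldots, X_{n_1}^{(1)}$
and $X_1^{(2)}, \ldots, X_{n_2}^{(2)}$, respectively. Referring to Chaps. 1 and 3 for vector/matrix derivatives
and the maximum likelihood estimators (MLE's) of the parameters of normal populations,
the MLE's under $L_0$, obtained from (i), are the following, the estimators/estimates being
denoted with a hat:

$$\hat{\mu}^{(1)} = \frac{n_1\bar{X}^{(1)} + X}{n_1 + 1},\quad \hat{\mu}^{(2)} = \bar{X}^{(2)},\quad \hat{\Sigma} = \frac{S_1 + S_2 + S_3^{(1)}}{n_1 + n_2 + 1};$$

$$S_3^{(1)} = (X - \hat{\mu}^{(1)})(X - \hat{\mu}^{(1)})' = \Big(X - \frac{n_1\bar{X}^{(1)} + X}{n_1 + 1}\Big)\Big(X - \frac{n_1\bar{X}^{(1)} + X}{n_1 + 1}\Big)'$$

$$= \Big(\frac{n_1}{n_1 + 1}\Big)^2(X - \bar{X}^{(1)})(X - \bar{X}^{(1)})', \qquad (12.6.1)$$

observing that the scalar quantity

$$(X - \mu^{(1)})'\Sigma^{-1}(X - \mu^{(1)}) = \mathrm{tr}[(X - \mu^{(1)})'\Sigma^{-1}(X - \mu^{(1)})] = \mathrm{tr}[\Sigma^{-1}(X - \mu^{(1)})(X - \mu^{(1)})'].$$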


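By substituting the MLE's in $L_0$, we obtain the maximum of $L_0$:

$$\max L_0 = \frac{e^{-\frac{(n_1+n_2+1)p}{2}}}{(2\pi)^{\frac{(n_1+n_2+1)p}{2}}|\hat{\Sigma}|^{\frac{n_1+n_2+1}{2}}},\quad \hat{\Sigma} = \frac{S_1 + S_2 + (\frac{n_1}{n_1+1})^2(X - \bar{X}^{(1)})(X - \bar{X}^{(1)})'}{n_1 + n_2 + 1}. \qquad (12.6.2)$$

Under $L_1$, the MLE's are

$$\hat{\mu}^{(2)} = \frac{n_2\bar{X}^{(2)} + X}{n_2 + 1},\quad \hat{\mu}^{(1)} = \bar{X}^{(1)},\quad \hat{\Sigma} = \frac{S_1 + S_2 + S_3^{(2)}}{n_1 + n_2 + 1},$$

$$S_3^{(2)} = \Big(\frac{n_2}{n_2 + 1}\Big)^2(X - \bar{X}^{(2)})(X - \bar{X}^{(2)})'. \qquad (12.6.3)$$

Thus,

$$\max L_1 = \frac{e^{-\frac{(n_1+n_2+1)p}{2}}}{(2\pi)^{\frac{(n_1+n_2+1)p}{2}}|\hat{\Sigma}|^{\frac{n_1+n_2+1}{2}}},\quad \hat{\Sigma} = \frac{S_1 + S_2 + (\frac{n_2}{n_2+1})^2(X - \bar{X}^{(2)})(X - \bar{X}^{(2)})'}{n_1 + n_2 + 1}.$$

Hence,

$$\frac{\max L_0}{\max L_1} = \Bigg[\frac{|S_1 + S_2 + (\frac{n_2}{n_2+1})^2(X - \bar{X}^{(2)})(X - \bar{X}^{(2)})'|}{|S_1 + S_2 + (\frac{n_1}{n_1+1})^2(X - \bar{X}^{(1)})(X - \bar{X}^{(1)})'|}\Bigg]^{\frac{n_1+n_2+1}{2}} \ \text{ so that}$$

$$z_1 = \Big(\frac{\max L_0}{\max L_1}\Big)^{\frac{2}{n_1+n_2+1}} = \frac{|S_1 + S_2 + (\frac{n_2}{n_2+1})^2(X - \bar{X}^{(2)})(X - \bar{X}^{(2)})'|}{|S_1 + S_2 + (\frac{n_1}{n_1+1})^2(X - \bar{X}^{(1)})(X - \bar{X}^{(1)})'|}. \qquad (12.6.4)$$

If $z_1 \ge 1$, then $\max L_0 \ge \max L_1$, which means that the likelihood of $X$ coming from $\pi_1$
is greater than or equal to the likelihood of $X$ originating from $\pi_2$. Hence, we may classify
$X$ into $\pi_1$ if $z_1 \ge 1$ and classify $X$ into $\pi_2$ if $z_1 < 1$. In other words,

$$A_1: z_1 \ge 1 \ \text{ and } \ A_2: z_1 < 1. \qquad (iv)$$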
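If we let $S = S_1 + S_2$, then $z_1 \ge 1 \Rightarrow$

$$\Big|S + \Big(\frac{n_2}{n_2+1}\Big)^2(X - \bar{X}^{(2)})(X - \bar{X}^{(2)})'\Big| \ge \Big|S + \Big(\frac{n_1}{n_1+1}\Big)^2(X - \bar{X}^{(1)})(X - \bar{X}^{(1)})'\Big|. \qquad (v)$$

We can re-express this last inequality in a more convenient form. Expanding the following
partitioned determinant in two different ways, we have the following, where $S$ is $p \times p$
and $Y$ is $p \times 1$:

$$\begin{vmatrix} S & -Y \\ Y' & 1 \end{vmatrix} = |S + YY'| = |S|\,|1 + Y'S^{-1}Y| = |S|\,[1 + Y'S^{-1}Y], \qquad (vi)$$

observing that $1 + Y'S^{-1}Y$ is a scalar quantity. Accordingly, $z_1 \ge 1$ means that

$$1 + \Big(\frac{n_2}{n_2+1}\Big)^2(X - \bar{X}^{(2)})'S^{-1}(X - \bar{X}^{(2)}) \ge 1 + \Big(\frac{n_1}{n_1+1}\Big)^2(X - \bar{X}^{(1)})'S^{-1}(X - \bar{X}^{(1)}).$$

That is,

$$z_2 = \Big(\frac{n_2}{n_2+1}\Big)^2(X - \bar{X}^{(2)})'S^{-1}(X - \bar{X}^{(2)}) - \Big(\frac{n_1}{n_1+1}\Big)^2(X - \bar{X}^{(1)})'S^{-1}(X - \bar{X}^{(1)}) \ge 0 \Rightarrow$$

$$z_3 = \Big(\frac{n_2}{n_2+1}\Big)^2(X - \bar{X}^{(2)})'\Big(\frac{S}{n_1+n_2-2}\Big)^{-1}(X - \bar{X}^{(2)}) - \Big(\frac{n_1}{n_1+1}\Big)^2(X - \bar{X}^{(1)})'\Big(\frac{S}{n_1+n_2-2}\Big)^{-1}(X - \bar{X}^{(1)}) \ge 0. \qquad (12.6.5)$$

Hence, the regions of classification are the following:

$$A_1: z_3 \ge 0 \ \text{ and } \ A_2: z_3 < 0. \qquad (vii)$$

Thus, classify $X$ into $\pi_1$ when $z_3 \ge 0$ and, $X$ into $\pi_2$ when $z_3 < 0$. For large $n_1$ and $n_2$,
some interesting results ensue. When $n_1 \to \infty$ and $n_2 \to \infty$, we have $\frac{n_i}{n_i+1} \to 1$, $i = 1, 2$,
$\bar{X}^{(i)} \to \mu^{(i)}$, $i = 1, 2$, and $\frac{S}{n_1+n_2-2} \to \Sigma$. Then, $z_3$ converges to $z_4$ where

$$z_4 = (X - \mu^{(2)})'\Sigma^{-1}(X - \mu^{(2)}) - (X - \mu^{(1)})'\Sigma^{-1}(X - \mu^{(1)}) \ge 0 \qquad (viii)$$

$$\Rightarrow 2[X - \tfrac{1}{2}(\mu^{(1)}+\mu^{(2)})]'\Sigma^{-1}(\mu^{(1)}-\mu^{(2)}) \ge 0 \Rightarrow u \ge 0$$

where $u$ is the same criterion $u$ as that specified in (12.5.7). Hence, we have the following
result: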
Theorem 12.6.1. Let $X_1^{(1)}, \ldots, X_{n_1}^{(1)}$ be a simple random sample of size $n_1$ from
$\pi_1: N_p(\mu^{(1)}, \Sigma)$, $\Sigma > O$, and $X_1^{(2)}, \ldots, X_{n_2}^{(2)}$ be a simple random sample of size $n_2$ from the
population $\pi_2: N_p(\mu^{(2)}, \Sigma)$, $\Sigma > O$. Letting $X$ be a vector at hand to be classified into
$\pi_1$ or $\pi_2$, when $n_1 \to \infty$ and $n_2 \to \infty$, the likelihood ratio criterion for classification
is the following: Classify $X$ into $\pi_1$ if $u \ge 0$ and, $X$ into $\pi_2$ if $u < 0$ or equivalently,
$A_1: u \ge 0$ and $A_2: u < 0$ where $u = [X - \frac{1}{2}(\mu^{(1)}+\mu^{(2)})]'\Sigma^{-1}(\mu^{(1)}-\mu^{(2)})$ whose
density is $u \sim N_1(\frac{1}{2}\Delta^2, \Delta^2)$ when $X$ is assigned to $\pi_1$ and $u \sim N_1(-\frac{1}{2}\Delta^2, \Delta^2)$ when
$X$ is assigned to $\pi_2$, with $\Delta^2 = (\mu^{(1)}-\mu^{(2)})'\Sigma^{-1}(\mu^{(1)}-\mu^{(2)})$ denoting Mahalanobis'
distance.


The likelihood ratio criterion for classification specified in (12.6.5) can also be given
the following interpretation: For large values of $n_1$ and $n_2$, the criterion reduces to
the following: $(X - \mu^{(2)})'\Sigma^{-1}(X - \mu^{(2)}) - (X - \mu^{(1)})'\Sigma^{-1}(X - \mu^{(1)}) \ge 0$ where
$(X - \mu^{(2)})'\Sigma^{-1}(X - \mu^{(2)})$ is the generalized distance between $X$ and $\mu^{(2)}$, and
$(X - \mu^{(1)})'\Sigma^{-1}(X - \mu^{(1)})$ is the generalized distance between $X$ and $\mu^{(1)}$, which means that
the generalized distance between $X$ and $\mu^{(2)}$ is larger than the generalized distance
between $X$ and $\mu^{(1)}$ when $u > 0$. That is, $X$ is closer to $\mu^{(1)}$ than $\mu^{(2)}$ and accordingly,
we classify $X$ into $\pi_1$, which is the case $u > 0$. Similarly, if $X$ is closer to $\mu^{(2)}$ when
compared to the distance to $\mu^{(1)}$, we assign $X$ to $\pi_2$, which is the case $u < 0$. The case
$u = 0$ is also included in the first inequality, but only for convenience. However, when
$\Pr\{u = 0|\pi_i, i = 1, 2\} = 0$, replacing $u > 0$ by $u \ge 0$ is fully justified.
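
The equivalence between the difference of the two generalized distances and the criterion $u$ can be confirmed numerically: the difference equals $2u$ for every $X$. In the Python sketch below, the parameter values and the test points are hypothetical, and the helper names (`solve`, `quad_form`, `u_criterion`, `distance_difference`) are assumptions introduced for the illustration.

```python
def solve(A, b):
    """Solve A x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(A)
    M = [[float(v) for v in row] + [float(bi)] for row, bi in zip(A, b)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[p] = M[p], M[c]
        for r in range(n):
            if r != c:
                f = M[r][c] / M[c][c]
                M[r] = [x - f * y for x, y in zip(M[r], M[c])]
    return [M[i][n] / M[i][i] for i in range(n)]

def quad_form(y, sigma):
    """Generalized squared distance y' Sigma^{-1} y."""
    return sum(a * b for a, b in zip(y, solve(sigma, y)))

def u_criterion(x, mu1, mu2, sigma):
    """u = [x - (mu1 + mu2)/2]' Sigma^{-1} (mu1 - mu2)."""
    coef = solve(sigma, [a - b for a, b in zip(mu1, mu2)])
    return sum(c * (xi - (a + b) / 2.0) for c, xi, a, b in zip(coef, x, mu1, mu2))

def distance_difference(x, mu1, mu2, sigma):
    """z4 = (x - mu2)' Sigma^{-1} (x - mu2) - (x - mu1)' Sigma^{-1} (x - mu1)."""
    to_mu2 = quad_form([xi - m for xi, m in zip(x, mu2)], sigma)
    to_mu1 = quad_form([xi - m for xi, m in zip(x, mu1)], sigma)
    return to_mu2 - to_mu1

# Hypothetical parameters and test points (illustration only)
mu1, mu2 = [4.0, 1.0], [2.0, 1.0]
sigma = [[2.0, 1.0], [1.0, 1.0]]
points = [[4.0, 1.0], [0.0, 0.0], [3.0, 5.0], [2.5, -1.0]]
pairs = [(distance_difference(x, mu1, mu2, sigma), u_criterion(x, mu1, mu2, sigma))
         for x in points]
print(pairs)
```

Since the difference of distances is exactly twice $u$, the two quantities always have the same sign, so that both formulations lead to the same classification.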


Note 12.6.1. The reader may refer to Example 12.3.3 for an illustration of the
computations involved in connection with the probabilities of misclassification. For large values
of $n_1$ and $n_2$, one has the $z_4$ of (viii) as an approximation to the $u$ appearing in the same
equation as well as the $u$ of (12.5.7) or that of Example 12.3.3. In order to apply
Theorem 12.6.1, one needs to know the parameters $\mu^{(1)}$, $\mu^{(2)}$ and $\Sigma$. When they are not
available, one may substitute to them the corresponding estimates $\bar{X}^{(1)}$, $\bar{X}^{(2)}$ and $\hat{\Sigma} = \frac{S}{n_1+n_2-2}$,
when $n_1$ and $n_2$ are large. Then, the approximate probabilities of misclassification can be
determined.


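Example 12.6.1. Redo the problem considered in Example 12.5.1 by making use of the
maximum likelihood procedure.


Solution 12.6.1. In order to answer the questions, we need to compute

$$z_4 = \Big(\frac{n_2}{n_2+1}\Big)^2(X - \bar{X}^{(2)})'\Big(\frac{S}{n_1+n_2-2}\Big)^{-1}(X - \bar{X}^{(2)}) - \Big(\frac{n_1}{n_1+1}\Big)^2(X - \bar{X}^{(1)})'\Big(\frac{S}{n_1+n_2-2}\Big)^{-1}(X - \bar{X}^{(1)}).$$

In this case, $\frac{n_1}{n_1+1} = \frac{100}{101} \approx 1$ and $\frac{n_2}{n_2+1} = \frac{102}{103} \approx 1$ and hence, the criterion $z_4$ is the same
as $w$ of (12.5.4) and the decisions arrived at in Example 12.5.1 will remain unchanged in
this example. Since $n_1$ and $n_2$ are large, we have reasonably accurate approximations of
the parameters as

$$\mu^{(1)} \to \bar{X}^{(1)},\quad \mu^{(2)} \to \bar{X}^{(2)} \ \text{ and } \ \Sigma \to \frac{S}{n_1+n_2-2},$$

so that the probabilities of misclassification can be evaluated by using their estimates. The
approximate distributions are then given by

$$w|\pi_1 \sim N_1(\tfrac{1}{2}\Delta^2, \Delta^2) \ \text{ and } \ w|\pi_2 \sim N_1(-\tfrac{1}{2}\Delta^2, \Delta^2)$$

where $\Delta^2 = (\bar{X}^{(1)}-\bar{X}^{(2)})'\big(\frac{S}{n_1+n_2-2}\big)^{-1}(\bar{X}^{(1)}-\bar{X}^{(2)})$. From the computations done in
Example 12.5.1, we have

$$(\bar{X}^{(1)}-\bar{X}^{(2)})' = [1, -1, 0],\quad (\bar{X}^{(1)}-\bar{X}^{(2)})'\Big(\frac{S}{n_1+n_2-2}\Big)^{-1} = [3, 1, -1],$$

$$\Delta^2 = [3, 1, -1]\begin{bmatrix} 1 \\ -1 \\ 0 \end{bmatrix} = 2$$

$$\Rightarrow w|\pi_1 \sim N_1(1, 2) \ \text{ and } \ w|\pi_2 \sim N_1(-1, 2), \ \text{approximately}.$$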
As well, $A_1: w \ge 0$ and $A_2: w < 0$. For the data pertaining to (1) of Example 12.5.1,
we have $w > 0$ and $X_1$ is assigned to $\pi_1$. Observing that $w \to u$ of (12.5.7),

$P(1|1, A)$ = Probability of arriving at a correct decision

$$= \Pr\{u > 0|\pi_1\} = \int_0^\infty \frac{1}{\sqrt{2}\sqrt{2\pi}}\,e^{-\frac{1}{4}(u-1)^2}\,du = \int_{-\frac{1}{\sqrt{2}}}^\infty \frac{1}{\sqrt{2\pi}}\,e^{-\frac{1}{2}v^2}\,dv \approx 0.76;$$

$P(1|2, A)$ = Probability of misclassification

$$= \Pr\{u > 0|\pi_2\} = \int_0^\infty \frac{1}{\sqrt{2}\sqrt{2\pi}}\,e^{-\frac{1}{4}(u+1)^2}\,du = \int_{\frac{1}{\sqrt{2}}}^\infty \frac{1}{\sqrt{2\pi}}\,e^{-\frac{1}{2}v^2}\,dv \approx 0.24.$$

In Example 12.5.1, the observed vector provided for (2) is classified into $\pi_2$ since $w < 0$.
Thus, the probability of making the right decision is $P(2|2, A) = \Pr\{u < 0|\pi_2\} \approx 0.76$
and the probability of misclassification is $P(2|1, A) = \Pr\{u < 0|\pi_1\} \approx 0.24$. Given the
data related to (3) of Example 12.5.1, the only difference is that the distributions in $\pi_1$
and $\pi_2$ will be slightly different, the mean values remaining the same but the variance $\Delta^2$
being replaced by $\Delta^2/n$ where $n = 5$. The computations are similar to those provided for
(1), the sample mean being assigned to $\pi_1$ in this case.
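
The approximate probabilities quoted in Solution 12.6.1, as well as those left to the reader for case (3), can be evaluated as follows. This Python sketch is illustrative: the helper names `phi` and `prob_positive` are assumptions, and the normal approximation with $\Delta^2 = 2$ is the one stated in the solution.

```python
import math

def phi(z):
    """Standard normal distribution function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

delta2 = 2.0  # sample analogue of Mahalanobis' distance from Solution 12.6.1

def prob_positive(mean, variance):
    """Pr{u > 0} when u ~ N_1(mean, variance)."""
    return 1.0 - phi(-mean / math.sqrt(variance))

# Cases (1) and (2): a single observation, variance Delta^2
correct_single = prob_positive(+delta2 / 2.0, delta2)
wrong_single = prob_positive(-delta2 / 2.0, delta2)

# Case (3): average of n = 5 observations, variance Delta^2 / n
n = 5
correct_mean = prob_positive(+delta2 / 2.0, delta2 / n)
wrong_mean = prob_positive(-delta2 / 2.0, delta2 / n)
print(correct_single, wrong_single, correct_mean, wrong_mean)
```

Under these approximations, classifying the average of five samples is noticeably more reliable than classifying a single sample, the probability of a correct decision being roughly 0.94 instead of 0.76.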


12.7. Classification Involving k Populations 


Consider the $p$-variate populations $\pi_1, \ldots, \pi_k$ and let $X$ be a $p$-vector at hand to be classified into one of these $k$ populations. Let $q_1, \ldots, q_k$ be the prior probabilities of selecting these populations, $q_j > 0$, $j = 1, \ldots, k$, with $q_1 + \cdots + q_k = 1$. Let the cost of misclassification of a $p$-vector belonging to $\pi_i$ being improperly classified into $\pi_j$ be $C(j|i)$ for $i \ne j$ so that $C(i|i) = 0$, $i = 1, \ldots, k$. A decision rule $A = (A_1, \ldots, A_k)$ determines subspaces $A_j \subset R^p$, $j = 1, \ldots, k$, with $A_i \cap A_j = \phi$ (the empty set) for all $i \ne j$. Let the probability/density functions associated with the $k$ populations be $P_j(X)$, $j = 1, \ldots, k$, respectively. Let $P(j|i, A) = Pr\{X \in A_j\,|\,\pi_i : P_i(X), A\}$ = probability of an observation coming from or belonging to the population $\pi_i$ or originating from the probability/density function $P_i(X)$, being improperly assigned to $\pi_j$ or misclassified as coming from $P_j(X)$, and the cost associated with this misclassification be denoted by $C(j|i)$. Under the rule $A = (A_1, \ldots, A_k)$, the probabilities of correctly classifying and misclassifying an observed vector are the following, assuming that the $P_j(X)$'s, $j = 1, \ldots, k$, are densities:

$$P(i|i, A) = \int_{A_i} P_i(X)\,dX \quad\text{and}\quad P(j|i, A) = \int_{A_j} P_i(X)\,dX, \quad j = 1, \ldots, k,\ j \ne i,$$
where $P(i|i, A)$ is the probability of achieving a correct classification, that is, of assigning an observation $X$ to $\pi_i$ when the population is actually $\pi_i$, and $P(j|i, A)$ is the probability of an observation $X$ coming from $\pi_i$ being misclassified as originating from $\pi_j$. Consider a $p$-vector $X$ at hand. What is then the probability that this $X$ came from $P_i(X)$, given that $X$ is an observation vector from one of the populations $\pi_1, \ldots, \pi_k$? This is in fact a conditional statement involving

$$\frac{q_i P_i(X)}{q_1 P_1(X) + q_2 P_2(X) + \cdots + q_k P_k(X)}. \tag{i}$$

Suppose that for specific $i$ and $j$, the conditional probabilities are such that

$$\frac{q_i P_i(X)}{q_1 P_1(X) + q_2 P_2(X) + \cdots + q_k P_k(X)} \ge \frac{q_j P_j(X)}{q_1 P_1(X) + q_2 P_2(X) + \cdots + q_k P_k(X)}. \tag{ii}$$

This is tantamount to presuming that the likeliness of $X$ originating from $P_i(X)$ is greater than or equal to that of $X$ coming from $P_j(X)$. In this case, we would like to assign $X$ to $\pi_i$ rather than $\pi_j$. If (ii) holds for all $j = 1, \ldots, k$, $j \ne i$, then we classify $X$ into $\pi_i$. Equation (ii) for $j = 1, \ldots, k$, $j \ne i$, implies that

$$q_i P_i(X) \ge q_j P_j(X) \Rightarrow \frac{P_i(X)}{P_j(X)} \ge \frac{q_j}{q_i}, \quad j = 1, \ldots, k,\ j \ne i. \tag{12.7.1}$$


Accordingly, we adopt (12.7.1) as a decision rule $A = (A_1, \ldots, A_k)$. This decision rule corresponds to the following: When $X \in A_1 \subset R^p$ or $X$ falls in $A_1$, then $X$ is classified into $\pi_1$, when $X \in A_2$, then $X$ is assigned to $\pi_2$, and so on. What is the expected cost of an $X$ belonging to $\pi_i$ being misclassified into $\pi_j$ under some decision rule $B = (B_1, \ldots, B_k)$, $B_j \subset R^p$, $j = 1, \ldots, k$, $B_i \cap B_j = \phi$, $i \ne j$, for all $i$ and $j$? This is $q_i P_i(X) C(j|i) \equiv E_i(B)$. The expected cost of an $X$ belonging to $\pi_j$ being misclassified into $\pi_i$ under the same decision rule $B$ is $E_j(B) = q_j P_j(X) C(i|j)$. Since $E_j(B)$ is the expected cost incurred when $X$ is assigned to $\pi_i$, if $E_j(B) < E_i(B)$, then we favor $P_i(X)$ over $P_j(X)$ as it is always desirable to minimize the expected cost in any procedure or decision. If $E_j(B) < E_i(B)$ for all $j = 1, \ldots, k$, $j \ne i$, then $P_i(X)$ or $\pi_i$ is preferred over all other populations to which $X$ could be assigned. Note that

$$E_j(B) < E_i(B) \Rightarrow q_j P_j(X) C(i|j) < q_i P_i(X) C(j|i) \Rightarrow \frac{P_i(X)}{P_j(X)} > \frac{q_j C(i|j)}{q_i C(j|i)}, \tag{iii}$$

for $j = 1, \ldots, k$, $j \ne i$, so that (iii) is the situation resulting from the following classification rule: if

$$\frac{P_i(X)}{P_j(X)} \ge \frac{q_j C(i|j)}{q_i C(j|i)}, \quad j = 1, \ldots, k,\ j \ne i, \tag{12.7.2}$$

we classify $X$ into $\pi_i$ or equivalently, $X \in A_i$, which is the decision rule $A = (A_1, \ldots, A_k)$. Thus, the decision rule $B$ in (iii) is identical to $A$. Observe that when $C(i|j) = C(j|i)$, (12.7.2) reduces to (12.7.1); the decision rule $A = (A_1, \ldots, A_k)$ in (12.7.1) is seen to yield the maximum probability of assigning an observation $X$ at hand to $\pi_i$ compared to the probability of assigning $X$ to any other $\pi_j$, $j = 1, \ldots, k$, $j \ne i$, when the costs of misclassification are equal. As well, it follows from (12.7.2) that the decision rule $A = (A_1, \ldots, A_k)$ gives the minimum expected cost associated with assigning the observation $X$ at hand to $\pi_i$ compared to assigning $X$ to any other population $\pi_j$, $j = 1, \ldots, k$, $j \ne i$.
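
As an illustration of how a rule of the form (12.7.2) may be applied, here is a small sketch. It assumes the rule reads "assign $X$ to $\pi_i$ when $q_i P_i(X) C(j|i) \ge q_j P_j(X) C(i|j)$ for every $j \ne i$"; the function `classify`, the cost table layout `cost[j][i]` for $C(j|i)$ and the three univariate normal densities are hypothetical choices made for the demonstration only.

```python
import math
from typing import Callable, Optional, Sequence


def normal_density(mean: float, sd: float = 1.0) -> Callable[[float], float]:
    """Univariate normal density, used here only as a stand-in for P_j(X)."""
    def f(x: float) -> float:
        z = (x - mean) / sd
        return math.exp(-0.5 * z * z) / (sd * math.sqrt(2.0 * math.pi))
    return f


def classify(x: float,
             densities: Sequence[Callable[[float], float]],
             priors: Sequence[float],
             cost: Sequence[Sequence[float]]) -> Optional[int]:
    """Return the 0-based index i with q_i P_i(x) C(j|i) >= q_j P_j(x) C(i|j)
    for all j != i, where cost[j][i] stores C(j|i); None if no index qualifies."""
    k = len(densities)
    p = [f(x) for f in densities]
    for i in range(k):
        if all(priors[i] * p[i] * cost[j][i] >= priors[j] * p[j] * cost[i][j]
               for j in range(k) if j != i):
            return i
    return None


densities = [normal_density(0.0), normal_density(2.0), normal_density(4.0)]
equal_cost = [[0.0 if r == c else 1.0 for c in range(3)] for r in range(3)]

# Equal priors and equal costs: the rule picks the largest density.
label_equal = classify(0.9, densities, [1 / 3, 1 / 3, 1 / 3], equal_cost)
# A heavy prior on the second population moves the same point to it.
label_skewed = classify(0.9, densities, [0.1, 0.8, 0.1], equal_cost)
print(label_equal, label_skewed)
```

With unequal costs the pairwise inequalities need not single out an index for every $X$, which is why the sketch returns `None` in that case rather than forcing a decision.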
12.7.1. Classification when the populations are real Gaussian 

Let the populations be $p$-variate real normal, that is, $\pi_j \sim N_p(\mu^{(j)}, \Sigma)$, $\Sigma > O$, $j = 1, \ldots, k$, with different mean value vectors but the same covariance matrix $\Sigma > O$. Let the density of $\pi_j$ be denoted by $P_j(X) = N_p(\mu^{(j)}, \Sigma)$, $\Sigma > O$. A vector $X$ at hand is to be assigned to one of the $\pi_i$'s, $i = 1, \ldots, k$. In Sect. 12.3 or Example 12.3.3, the decision rule involves two populations. Letting the two populations be $\pi_i : P_i(X)$ and $\pi_j : P_j(X)$ for specific $i$ and $j$, it was determined that the decision rule consists of classifying $X$ into $\pi_i$ if $\ln\frac{P_i(X)}{P_j(X)} \ge \ln\rho$, $\rho = \frac{q_j C(i|j)}{q_i C(j|i)}$, with $\rho = 1$ so that $\ln\rho = 0$ whenever $C(i|j) = C(j|i)$ and $q_i = q_j$. When $\ln\rho = 0$, we have seen that the decision rule is to classify the $p$-vector $X$ into $\pi_i$ or $P_i(X)$ if $u_{ij}(X) \ge 0$ and to assign $X$ to $P_j(X)$ or $\pi_j$ if $u_{ij}(X) < 0$, where

$$u_{ij}(X) = (\mu^{(i)} - \mu^{(j)})'\Sigma^{-1}X - \tfrac{1}{2}(\mu^{(i)} - \mu^{(j)})'\Sigma^{-1}(\mu^{(i)} + \mu^{(j)})$$
$$= [X - \tfrac{1}{2}(\mu^{(i)} + \mu^{(j)})]'\Sigma^{-1}(\mu^{(i)} - \mu^{(j)}). \tag{iv}$$

Now, on applying the result obtained in (iv) to (12.7.1) and (12.7.2), one arrives at the following decision rule:

$$A_i : u_{ij}(X) \ge 0 \ \text{ or } \ A_i : u_{ij}(X) \ge \ln k_{ij},\quad k_{ij} = \frac{q_j C(i|j)}{q_i C(j|i)},\quad j = 1, \ldots, k,\ j \ne i, \tag{12.7.3}$$

with $\ln k_{ij} = 0$ occurring when $q_i = q_j$ and $C(i|j) = C(j|i)$.


Note 12.7.1. What will interchanging $i$ and $j$ in $u_{ij}(X)$ entail? Note that, as defined, $u_{ij}(X)$ involves the terms $(\mu^{(i)} - \mu^{(j)}) = -(\mu^{(j)} - \mu^{(i)})$ and $(\mu^{(i)} + \mu^{(j)})$, the latter being unaffected by the interchange of $\mu^{(i)}$ and $\mu^{(j)}$. Hence, for all $i$ and $j$,

$$u_{ij}(X) = -u_{ji}(X). \tag{12.7.4}$$

When the underlying population is $X \sim N_p(\mu^{(i)}, \Sigma)$, $E[u_{ij}(X)|\pi_i] = \frac{1}{2}\Delta_{ij}^2$, which implies that $E[u_{ji}(X)|\pi_i] = -\frac{1}{2}\Delta_{ij}^2 = -E[u_{ij}(X)|\pi_i]$ where $\Delta_{ij}^2 = (\mu^{(i)} - \mu^{(j)})'\Sigma^{-1}(\mu^{(i)} - \mu^{(j)})$.


Note 12.7.2. For computing the probabilities of correctly classifying and misclassifying an observed vector, certain assumptions regarding the distributions associated with the populations $\pi_j$, $j = 1, \ldots, k$, are needed, the normality assumption being the most convenient one.


Example 12.7.1. A certain milk collection and distribution center collects and sells the milk supplied by local farmers to the community, the balance, if any, being dispatched to a nearby city. In that locality, there are three dairy cattle breeds, namely, Jersey, Holstein and Guernsey, and each farmer only keeps one type of cow. Samples are taken and the following characteristics are evaluated in grams per liter: $x_1$, the fat content, $x_2$, the glucose content, and $x_3$, the protein content. It has been determined that $X' = (x_1, x_2, x_3)$ is normally distributed as $X \sim N_3(\mu^{(1)}, \Sigma)$ for Jersey cows, $X \sim N_3(\mu^{(2)}, \Sigma)$ for Holstein cows and $X \sim N_3(\mu^{(3)}, \Sigma)$ for Guernsey cows, with a common covariance matrix $\Sigma > O$, where

$$\mu^{(1)} = \begin{bmatrix} 2\\ 3\\ 1\end{bmatrix},\quad \mu^{(2)} = \begin{bmatrix} 1\\ 3\\ 2\end{bmatrix},\quad \mu^{(3)} = \begin{bmatrix} 2\\ 3\\ 3\end{bmatrix} \quad\text{and}\quad \Sigma = \begin{bmatrix} 3 & 0 & 0\\ 0 & 2 & -1\\ 0 & -1 & 1\end{bmatrix}.$$

(1): A farmer brought in his supply of milk from which one liter was collected. The three variables were evaluated, the result being $X_0' = (2, 3, 4)$. (2): Another one liter sample was taken from a second farmer's supply and it was determined that the vector of the resulting measurements was $X_1' = (2, 2, 2)$. No prior probabilities or costs are involved. Which breed of dairy cattle is each of these farmers likely to own?


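Solution 12.7.1. Our criterion is based on $u_{ij}(X)$ where

$$u_{ij}(X) = (\mu^{(i)} - \mu^{(j)})'\Sigma^{-1}X - \tfrac{1}{2}(\mu^{(i)} - \mu^{(j)})'\Sigma^{-1}(\mu^{(i)} + \mu^{(j)}).$$

Let us evaluate the various quantities of interest:

$$\mu^{(1)} - \mu^{(2)} = \begin{bmatrix} 1\\ 0\\ -1\end{bmatrix},\quad \mu^{(1)} - \mu^{(3)} = \begin{bmatrix} 0\\ 0\\ -2\end{bmatrix},\quad \mu^{(2)} - \mu^{(3)} = \begin{bmatrix} -1\\ 0\\ -1\end{bmatrix},$$

$$\mu^{(1)} + \mu^{(2)} = \begin{bmatrix} 3\\ 6\\ 3\end{bmatrix},\quad \mu^{(1)} + \mu^{(3)} = \begin{bmatrix} 4\\ 6\\ 4\end{bmatrix},\quad \mu^{(2)} + \mu^{(3)} = \begin{bmatrix} 3\\ 6\\ 5\end{bmatrix};\quad \Sigma^{-1} = \begin{bmatrix} \frac{1}{3} & 0 & 0\\ 0 & 1 & 1\\ 0 & 1 & 2\end{bmatrix};$$

$$A_1 : \{u_{12}(X) \ge 0,\ u_{13}(X) \ge 0\},\quad A_2 : \{u_{21}(X) \ge 0,\ u_{23}(X) \ge 0\},\quad A_3 : \{u_{31}(X) \ge 0,\ u_{32}(X) \ge 0\};$$

$$(\mu^{(1)} - \mu^{(2)})'\Sigma^{-1}X = (1, 0, -1)\Sigma^{-1}X = (\tfrac{1}{3}, -1, -2)X = \tfrac{1}{3}x_1 - x_2 - 2x_3$$
$$(\mu^{(1)} - \mu^{(3)})'\Sigma^{-1}X = (0, 0, -2)\Sigma^{-1}X = (0, -2, -4)X = -2x_2 - 4x_3$$
$$(\mu^{(2)} - \mu^{(3)})'\Sigma^{-1}X = (-1, 0, -1)\Sigma^{-1}X = (-\tfrac{1}{3}, -1, -2)X = -\tfrac{1}{3}x_1 - x_2 - 2x_3;$$

$$\tfrac{1}{2}(\mu^{(1)} - \mu^{(2)})'\Sigma^{-1}(\mu^{(1)} + \mu^{(2)}) = \tfrac{1}{2}[\tfrac{1}{3}, -1, -2]\begin{bmatrix} 3\\ 6\\ 3\end{bmatrix} = -\tfrac{11}{2}$$
$$\tfrac{1}{2}(\mu^{(1)} - \mu^{(3)})'\Sigma^{-1}(\mu^{(1)} + \mu^{(3)}) = \tfrac{1}{2}[0, -2, -4]\begin{bmatrix} 4\\ 6\\ 4\end{bmatrix} = -14$$
$$\tfrac{1}{2}(\mu^{(2)} - \mu^{(3)})'\Sigma^{-1}(\mu^{(2)} + \mu^{(3)}) = \tfrac{1}{2}[-\tfrac{1}{3}, -1, -2]\begin{bmatrix} 3\\ 6\\ 5\end{bmatrix} = -\tfrac{17}{2}.$$

Hence,

$$u_{12}(X) = \tfrac{1}{3}x_1 - x_2 - 2x_3 + \tfrac{11}{2};\quad u_{13}(X) = -2x_2 - 4x_3 + 14;$$
$$u_{21}(X) = -\tfrac{1}{3}x_1 + x_2 + 2x_3 - \tfrac{11}{2};\quad u_{23}(X) = -\tfrac{1}{3}x_1 - x_2 - 2x_3 + \tfrac{17}{2};$$
$$u_{31}(X) = 2x_2 + 4x_3 - 14;\quad u_{32}(X) = \tfrac{1}{3}x_1 + x_2 + 2x_3 - \tfrac{17}{2}.$$

In order to answer (1), we substitute $X_0$ for $X$ and first evaluate $u_{12}(X_0)$ and $u_{13}(X_0)$ to determine whether they are $\ge 0$. Since $u_{12}(X_0) = \frac{1}{3}(2) - (3) - 2(4) + \frac{11}{2} < 0$, the condition is violated and hence we need not check for $u_{13}(X_0) \ge 0$. Thus, $X_0$ is not in $A_1$. Now, consider $u_{21}(X_0) = -\frac{1}{3}(2) + 3 + 2(4) - \frac{11}{2} > 0$ and $u_{23}(X_0) = -\frac{1}{3}(2) - (3) - 2(4) + \frac{17}{2} < 0$; again the condition is violated and we deduce that $X_0$ is not in $A_2$. Finally, we verify $A_3$: $u_{31}(X_0) = 2(3) + 4(4) - 14 > 0$ and $u_{32}(X_0) = \frac{1}{3}(2) + (3) + 2(4) - \frac{17}{2} > 0$. Thus, $X_0 \in A_3$, that is, we conclude that the sample milk came from Guernsey cows.

For answering (2), we substitute $X_1$ for $X$ in $u_{ij}(X)$. Noting that $u_{12}(X_1) = \frac{1}{3}(2) - (2) - 2(2) + \frac{11}{2} > 0$ and $u_{13}(X_1) = -2(2) - 4(2) + 14 > 0$, we can surmise that $X_1 \in A_1$, that is, the sample milk came from Jersey cows. Let us verify $A_2$ and $A_3$ to ascertain that no mistake has been made in the calculations. Since $u_{21}(X_1) < 0$, $X_1$ is not in $A_2$, and since $u_{31}(X_1) < 0$, $X_1$ is not in $A_3$. This completes the computations.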
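
The arithmetic of this solution can be reproduced with exact fractions. The sketch below assumes the mean vectors, the inverse covariance matrix and the two observed vectors $(2, 3, 4)$ and $(2, 2, 2)$ of Example 12.7.1; the names `u` and `classify` are ours.

```python
from fractions import Fraction as F

# Mean value vectors of the three breeds and the common covariance matrix.
mu = {1: [F(2), F(3), F(1)], 2: [F(1), F(3), F(2)], 3: [F(2), F(3), F(3)]}
sigma = [[F(3), F(0), F(0)], [F(0), F(2), F(-1)], [F(0), F(-1), F(1)]]
sigma_inv = [[F(1, 3), F(0), F(0)], [F(0), F(1), F(1)], [F(0), F(1), F(2)]]

# Sanity check: sigma times sigma_inv must be the identity matrix.
for r in range(3):
    for c in range(3):
        entry = sum(sigma[r][m] * sigma_inv[m][c] for m in range(3))
        assert entry == (1 if r == c else 0)


def u(i, j, x):
    """u_ij(X) = (mu_i - mu_j)' Sigma^{-1} [X - (mu_i + mu_j)/2]."""
    diff = [a - b for a, b in zip(mu[i], mu[j])]
    half_sum = [(a + b) / 2 for a, b in zip(mu[i], mu[j])]
    row = [sum(diff[r] * sigma_inv[r][c] for r in range(3)) for c in range(3)]
    return sum(row[c] * (F(x[c]) - half_sum[c]) for c in range(3))


def classify(x):
    """Return i such that u_ij(x) >= 0 for all j != i (equal priors and costs)."""
    for i in mu:
        if all(u(i, j, x) >= 0 for j in mu if j != i):
            return i
    return None


breeds = {1: "Jersey", 2: "Holstein", 3: "Guernsey"}
print(breeds[classify((2, 3, 4))], breeds[classify((2, 2, 2))])
```

Working with `Fraction` avoids any rounding in the borderline comparison $u_{12}(X_1) = \frac{1}{6} > 0$.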


12.7.2. Some distributional aspects 


For computing the probabilities of correctly classifying and misclassifying an observation, we require the distributions of our criterion $u_{ij}(X)$. Let the populations be normally distributed, that is, $\pi_j \sim N_p(\mu^{(j)}, \Sigma)$, $\Sigma > O$, with the same covariance matrix $\Sigma$ for all $k$ populations, $j = 1, \ldots, k$. Then, the probability of achieving a correct classification when $X$ is assigned to $\pi_i$ is the following under the decision rule $A = (A_1, \ldots, A_k)$:

$$P(i|i, A) = \int_{A_i} P_i(X)\,dX \tag{12.7.5}$$

where $dX = dx_1 \wedge \ldots \wedge dx_p$ and the integral is actually a multiple integral. But $A_i$ is defined by the inequalities $u_{i1}(X) \ge 0$, $u_{i2}(X) \ge 0, \ldots, u_{ik}(X) \ge 0$, where $u_{ii}(X)$ is excluded. This is the case when no prior probabilities and costs are involved or when the prior probabilities are equal and the cost functions are identical. Otherwise, the region is $\{A_i : u_{ij}(X) \ge \ln k_{ij},\ k_{ij} = \frac{q_j C(i|j)}{q_i C(j|i)},\ j = 1, \ldots, k,\ j \ne i\}$. Integrating (12.7.5) is challenging as the region is determined by $k - 1$ inequalities.

When the parameters $\mu^{(j)}$, $j = 1, \ldots, k$, and $\Sigma$ are known, we can evaluate the joint distributions of $u_{ij}(X)$, $j = 1, \ldots, k$, $j \ne i$, under the normality assumption for $\pi_j$, $j = 1, \ldots, k$. Let us examine the distributions of $u_{ij}(X)$ for normally distributed $\pi_i : P_i(X)$, $i = 1, \ldots, k$. In this instance, $E[X]|\pi_i = \mu^{(i)}$, and under $\pi_i$,

$$E[u_{ij}(X)]|\pi_i = (\mu^{(i)} - \mu^{(j)})'\Sigma^{-1}\mu^{(i)} - \tfrac{1}{2}(\mu^{(i)} - \mu^{(j)})'\Sigma^{-1}(\mu^{(i)} + \mu^{(j)})$$
$$= \tfrac{1}{2}(\mu^{(i)} - \mu^{(j)})'\Sigma^{-1}(\mu^{(i)} - \mu^{(j)}) = \tfrac{1}{2}\Delta_{ij}^2;$$
$$\text{Var}(u_{ij}(X))|\pi_i = \text{Var}[(\mu^{(i)} - \mu^{(j)})'\Sigma^{-1}X] = \Delta_{ij}^2.$$

Since $u_{ij}(X)$ is a linear function of the vector normal variable $X$, it is normal and the distribution of $u_{ij}(X)|\pi_i$ is

$$u_{ij}(X) \sim N_1(\tfrac{1}{2}\Delta_{ij}^2, \Delta_{ij}^2),\quad j = 1, \ldots, k,\ j \ne i. \tag{12.7.6}$$
This normality holds for each $j$, $j = 1, \ldots, k$, $j \ne i$, and for a fixed $i$. Then, we can evaluate the joint density of $u_{i1}(X), u_{i2}(X), \ldots, u_{ik}(X)$, excluding $u_{ii}(X)$, and we can evaluate $P(i|i, A)$ from this joint density. Observe that for $j = 1, \ldots, k$, $j \ne i$, the $u_{ij}(X)$'s are linear functions of the same vector normal variable $X$ and hence, they have a joint normal distribution. In that case, the mean value vector is a $(k-1)$-vector, denoted by $\mu_{(ii)}$, whose elements are $\frac{1}{2}\Delta_{ij}^2$, $j = 1, \ldots, k$, $j \ne i$, for a fixed $i$, or equivalently,

$$\mu_{(ii)}' = (\tfrac{1}{2}\Delta_{i1}^2, \ldots, \tfrac{1}{2}\Delta_{ik}^2) = E[U_{ii}'] \ \text{ with } \ U_{ii}' = [u_{i1}(X), \ldots, u_{ik}(X)],$$

excluding the elements $u_{ii}(X)$ and $\Delta_{ii}^2 = 0$. The subscript $ii$ in $U_{ii}$ indicates the region $A_i$ and the original population $P_i(X)$. The covariance matrix of $U_{ii}$, denoted by $\Sigma_{ii}$, will be a $(k-1) \times (k-1)$ matrix of the form $\Sigma_{ii} = [\text{Cov}(u_{ir}, u_{it})] = (c_{rt})$, $c_{rt} = \text{Cov}(u_{ir}(X), u_{it}(X))$. The subscript $ii$ in $\Sigma_{ii}$ indicates the region $A_i$ and the original population $P_i(X)$. Observe that for two linear functions $t_1 = C'X = c_1x_1 + \cdots + c_px_p$ and $t_2 = B'X = b_1x_1 + \cdots + b_px_p$, having a common covariance matrix $\text{Cov}(X) = \Sigma$, we have $\text{Var}(t_1) = C'\Sigma C$, $\text{Var}(t_2) = B'\Sigma B$ and $\text{Cov}(t_1, t_2) = C'\Sigma B = B'\Sigma C$. Therefore,

$$c_{rt} = (\mu^{(i)} - \mu^{(r)})'\Sigma^{-1}(\mu^{(i)} - \mu^{(t)}),\ i \ne r, t;\quad \Sigma_{ii} = (c_{rt}).$$
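
To make the mean value vector and covariance matrix of $U_{ii}$ concrete, the following sketch evaluates them for $i = 1$ with the parameters of Example 12.7.1, assuming the elements $\frac{1}{2}\Delta_{ij}^2$ and $c_{rt} = (\mu^{(i)} - \mu^{(r)})'\Sigma^{-1}(\mu^{(i)} - \mu^{(t)})$ described above. The helper names `quad` and `moments` are hypothetical.

```python
from fractions import Fraction as F

# Parameters taken from Example 12.7.1.
mu = {1: [F(2), F(3), F(1)], 2: [F(1), F(3), F(2)], 3: [F(2), F(3), F(3)]}
sigma_inv = [[F(1, 3), F(0), F(0)], [F(0), F(1), F(1)], [F(0), F(1), F(2)]]


def quad(a, b):
    """Bilinear form a' Sigma^{-1} b."""
    return sum(a[r] * sigma_inv[r][c] * b[c] for r in range(3) for c in range(3))


def moments(i):
    """Mean value vector and covariance matrix of U_ii under population pi_i."""
    others = [j for j in sorted(mu) if j != i]
    diff = {j: [a - b for a, b in zip(mu[i], mu[j])] for j in others}
    mean = [quad(diff[j], diff[j]) / 2 for j in others]        # Delta_ij^2 / 2
    cov = [[quad(diff[r], diff[t]) for t in others] for r in others]
    return mean, cov


mean_11, cov_11 = moments(1)
print(mean_11)   # means of u_12(X) and u_13(X) under pi_1
print(cov_11)    # 2 x 2 covariance matrix Sigma_11
```

Note that the diagonal elements of the covariance matrix are the $\Delta_{ij}^2$'s, so that each mean is half of the corresponding variance, in agreement with (12.7.6).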
Let the vector $U_{ii}$ be such that $U_{ii}' = (u_{i1}(X), \ldots, u_{ik}(X))$, excluding $u_{ii}(X)$. Thus, for a specific $i$,

$$U_{ii} \sim N_{k-1}(\mu_{(ii)}, \Sigma_{ii}),\ \Sigma_{ii} > O,$$

and its density function, denoted by $g_{ii}(U_{ii})$, is

$$g_{ii}(U_{ii}) = \frac{1}{(2\pi)^{\frac{k-1}{2}}|\Sigma_{ii}|^{\frac{1}{2}}}\, e^{-\frac{1}{2}(U_{ii} - \mu_{(ii)})'\Sigma_{ii}^{-1}(U_{ii} - \mu_{(ii)})}.$$

Then,

$$P(i|i, A) = \int_{u_{ij}(X) \ge 0,\ j = 1, \ldots, k,\ j \ne i} g_{ii}(U_{ii})\,dU_{ii}$$
$$= \int_{u_{i1}(X) = 0}^{\infty} \cdots \int_{u_{ik}(X) = 0}^{\infty} g_{ii}(U_{ii})\,du_{i1}(X) \wedge \ldots \wedge du_{ik}(X), \tag{12.7.7}$$

the differential $du_{ii}$ being absent from $dU_{ii}$, which is also the case for $u_{ii}(X) \ge 0$ in the integral. If prior probabilities and cost functions are involved, then replace $u_{ij}(X) \ge 0$ in the integral (12.7.7) by $u_{ij}(X) \ge \ln k_{ij}$, $k_{ij} = \frac{q_j C(i|j)}{q_i C(j|i)}$. Thus, the problem reduces to determining the joint density $g_{ii}(U_{ii})$ and then evaluating the multiple integrals appearing in (12.7.7). In order to compute the probability specified in (12.7.7), we standardize the normal density by letting $V_{ii} = \Sigma_{ii}^{-\frac{1}{2}}(U_{ii} - \mu_{(ii)})$ where $V_{ii} \sim N_{k-1}(O, I)$, and with the help of this standard normal, we may compute this probability through $V_{ii}$. Note that (12.7.7) holds for each $i$, $i = 1, \ldots, k$, and thus, the probabilities of achieving a correct classification, $P(i|i, A)$ for $i = 1, \ldots, k$, are available from (12.7.7).


For computing probabilities of misclassification of the type $P(i|j, A)$, we can proceed as follows: In this context, the basic population is $\pi_j : P_j(X) \sim N_p(\mu^{(j)}, \Sigma)$, the region of integration being $A_i : \{u_{i1}(X) \ge 0, \ldots, u_{ik}(X) \ge 0\}$, excluding the element $u_{ii}(X) \ge 0$. Consider the vector $U_{ij}$ corresponding to the vector $U_{ii}$. In $U_{ij}$, $i$ stands for the region $A_i$ and $j$, for the original population $P_j(X)$. The elements of $U_{ij}$ are the same as those of $U_{ii}$, that is, $U_{ij}' = (u_{i1}(X), \ldots, u_{ik}(X))$, excluding $u_{ii}(X)$. We then proceed as before and compute the covariance matrix $\Sigma_{ij}$ of $U_{ij}$ in the original population $P_j(X)$. The variances of $u_{im}(X)$, $m = 1, \ldots, k$, $m \ne i$, will remain the same but the covariances will be different since they depend on the mean values. Thus, $U_{ij} \sim N_{k-1}(\mu_{(ij)}, \Sigma_{ij})$, and on standardizing, one has $V_{ij} \sim N_{k-1}(O, I)$, so that the required probability $P(i|j, A)$ can be computed from the elements of $V_{ij}$. Note that when the prior probabilities and costs are equal,

$$P(i|j, A) = \int_{u_{i1}(X) \ge 0, \ldots, u_{ik}(X) \ge 0} g_{ij}(U_{ij})\,du_{i1}(X) \wedge \ldots \wedge du_{ik}(X)$$
$$= \int_{u_{i1}(X) = 0}^{\infty} \cdots \int_{u_{ik}(X) = 0}^{\infty} g_{ij}(U_{ij})\,dU_{ij}, \tag{12.7.8}$$

excluding $u_{ii}(X)$ in the integral as well as the differential $du_{ii}(X)$. Thus, $dU_{ij} = du_{i1}(X) \wedge \ldots \wedge du_{ik}(X)$, excluding $du_{ii}(X)$.


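Example 12.7.2. Given the data provided in Example 12.7.1, what is the probability of correctly assigning $X$ to $\pi_1$? That is, compute the probability $P(1|1, A)$.


Solution 12.7.2. Observe that the joint density of $u_{12}(X)$ and $u_{13}(X)$ is that of a bivariate normal distribution since $u_{12}(X)$ and $u_{13}(X)$ are linear functions of the same vector $X$ where $X$ has a multivariate normal distribution. In order to compute the joint bivariate normal density, we need $E[u_{1j}(X)]$, $\text{Var}(u_{1j}(X))$, $j = 2, 3$, and $\text{Cov}(u_{12}(X), u_{13}(X))$. The following quantities are evaluated from the data given in Example 12.7.1:

$$\text{Var}(u_{12}(X)) = \text{Var}[(\mu^{(1)} - \mu^{(2)})'\Sigma^{-1}X - \tfrac{1}{2}(\mu^{(1)} - \mu^{(2)})'\Sigma^{-1}(\mu^{(1)} + \mu^{(2)})]$$
$$= \text{Var}[(\mu^{(1)} - \mu^{(2)})'\Sigma^{-1}X] = (\mu^{(1)} - \mu^{(2)})'\Sigma^{-1}(\mu^{(1)} - \mu^{(2)})$$
$$= [\tfrac{1}{3}, -1, -2]\begin{bmatrix} 1\\ 0\\ -1\end{bmatrix} = \frac{7}{3} \Rightarrow E[u_{12}(X)] = \frac{7}{6};$$

$$\text{Var}(u_{13}(X)) = (\mu^{(1)} - \mu^{(3)})'\Sigma^{-1}(\mu^{(1)} - \mu^{(3)}) = [0, -2, -4]\begin{bmatrix} 0\\ 0\\ -2\end{bmatrix} = 8 \Rightarrow E[u_{13}(X)] = 4;$$

$$\text{Cov}(u_{12}(X), u_{13}(X)) = (\mu^{(1)} - \mu^{(2)})'\Sigma^{-1}(\mu^{(1)} - \mu^{(3)}) = [\tfrac{1}{3}, -1, -2]\begin{bmatrix} 0\\ 0\\ -2\end{bmatrix} = 4.$$

Hence, the covariance matrix of $U_{11} = \begin{bmatrix} u_{12}(X)\\ u_{13}(X)\end{bmatrix}$, denoted by $\Sigma_{11}$, is the following:

$$\Sigma_{11} = \begin{bmatrix} \frac{7}{3} & 4\\ 4 & 8\end{bmatrix} \Rightarrow |\Sigma_{11}| = \frac{8}{3},\quad \Sigma_{11}^{-1} = \begin{bmatrix} 3 & -\frac{3}{2}\\ -\frac{3}{2} & \frac{7}{8}\end{bmatrix} = B'B$$

where

$$B = \frac{1}{2\sqrt{2}}\begin{bmatrix} 2\sqrt{6} & -\sqrt{6}\\ 0 & 1\end{bmatrix} \Rightarrow |B| = |B'| = |\Sigma_{11}|^{-\frac{1}{2}} = \frac{\sqrt{3}}{2\sqrt{2}}.$$

The bivariate normal density of $U_{11}$ is the following:

$$U_{11} = \begin{bmatrix} u_{12}(X)\\ u_{13}(X)\end{bmatrix} \sim N_2(\mu_{(11)}, \Sigma_{11}),\quad \mu_{(11)} = \begin{bmatrix} \frac{7}{6}\\ 4\end{bmatrix}, \tag{12.7.9}$$

with $\Sigma_{11}$ and $\Sigma_{11}^{-1} = B'B$ as previously specified. Letting $Y = B(U_{11} - E[U_{11}])$, $Y \sim N_2(O, I)$. Note that

$$Y = \begin{bmatrix} y_1\\ y_2\end{bmatrix} = B\begin{bmatrix} u_{12}(X) - E[u_{12}(X)]\\ u_{13}(X) - E[u_{13}(X)]\end{bmatrix} = \frac{1}{2\sqrt{2}}\begin{bmatrix} 2\sqrt{6} & -\sqrt{6}\\ 0 & 1\end{bmatrix}\begin{bmatrix} u_{12}(X) - \frac{7}{6}\\ u_{13}(X) - 4\end{bmatrix} \Rightarrow$$
$$y_1 = \sqrt{3}\,(u_{12}(X) - \tfrac{7}{6}) - \tfrac{\sqrt{3}}{2}(u_{13}(X) - 4),\quad y_2 = \tfrac{1}{2\sqrt{2}}(u_{13}(X) - 4).$$

Then,

$$\begin{bmatrix} u_{12}(X) - \frac{7}{6}\\ u_{13}(X) - 4\end{bmatrix} = B^{-1}Y = \begin{bmatrix} \frac{1}{\sqrt{3}} & \sqrt{2}\\ 0 & 2\sqrt{2}\end{bmatrix}\begin{bmatrix} y_1\\ y_2\end{bmatrix}$$

which yields $u_{12}(X) = \frac{7}{6} + \frac{1}{\sqrt{3}}y_1 + \sqrt{2}\,y_2$ and $u_{13}(X) = 4 + 2\sqrt{2}\,y_2$. The intersection of the two lines corresponding to $u_{12}(X) = 0$ and $u_{13}(X) = 0$ is the point $(y_1, y_2) = (\sqrt{3}(\frac{5}{6}), -\sqrt{2})$. Thus, $u_{12}(X) \ge 0$ and $u_{13}(X) \ge 0$ give $y_2 \ge -\frac{4}{2\sqrt{2}} = -\sqrt{2}$ and $\frac{7}{6} + \frac{1}{\sqrt{3}}y_1 + \sqrt{2}\,y_2 \ge 0$. We can express the resulting probability as $\rho_1 - \rho_2$ where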


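$\rho_1$ is the probability that $y_2 \ge -\sqrt{2}$ and $\rho_2$ is the probability of the portion of that region in which $u_{12}(X) < 0$, and we have

$$\rho_1 = \int_{y_2 = -\sqrt{2}}^{\infty}\int_{y_1 = -\infty}^{\infty} \frac{1}{2\pi}\, e^{-\frac{1}{2}(y_1^2 + y_2^2)}\, dy_1 \wedge dy_2 = 1 - \Phi(-\sqrt{2}), \tag{12.7.10}$$

which is explicitly available, where $\Phi(\cdot)$ denotes the distribution function of a standard normal variable, and

$$\rho_2 = \int_{y_1 = -\infty}^{\sqrt{3}(\frac{5}{6})}\int_{y_2 = -\sqrt{2}}^{-\frac{7}{6\sqrt{2}} - \frac{y_1}{\sqrt{6}}} \frac{1}{2\pi}\, e^{-\frac{1}{2}(y_1^2 + y_2^2)}\, dy_1 \wedge dy_2$$
$$= \int_{y_1 = -\infty}^{\sqrt{3}(\frac{5}{6})} \frac{1}{\sqrt{2\pi}}\, e^{-\frac{1}{2}y_1^2}\,\Big[\Phi\Big(-\frac{7}{6\sqrt{2}} - \frac{y_1}{\sqrt{6}}\Big) - \Phi(-\sqrt{2})\Big]\,dy_1$$
$$= \int_{y_1 = -\infty}^{\sqrt{3}(\frac{5}{6})} \frac{1}{\sqrt{2\pi}}\, e^{-\frac{1}{2}y_1^2}\,\Phi\Big(-\frac{7}{6\sqrt{2}} - \frac{y_1}{\sqrt{6}}\Big)\,dy_1 - \Phi(-\sqrt{2})\,\Phi(\sqrt{3}(\tfrac{5}{6})). \tag{12.7.11}$$

Therefore, the required probability is

$$\rho_1 - \rho_2 = 1 - \Phi(-\sqrt{2}) + \Phi(-\sqrt{2})\,\Phi(\sqrt{3}(\tfrac{5}{6})) - \int_{y_1 = -\infty}^{\sqrt{3}(\frac{5}{6})} \frac{1}{\sqrt{2\pi}}\, e^{-\frac{1}{2}y_1^2}\,\Phi\Big(-\frac{7}{6\sqrt{2}} - \frac{y_1}{\sqrt{6}}\Big)\,dy_1. \tag{12.7.12}$$

Note that all quantities, except the integral, are explicitly available from standard normal tables. The integral part can be read from a bivariate normal table. If a bivariate normal table is used, then one can approximate the required probability from (12.7.9). Alternatively, once evaluated numerically, the integral is found to be equal to 0.2182 which subtracted from 0.9941, yields a probability of 0.7759 for $P(1|1, A)$.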
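
The numerical evaluation mentioned above can be sketched as follows. This is only an illustration under our reading of (12.7.12), namely that the integrand is the standard normal density in $y_1$ multiplied by $\Phi(-\frac{7}{6\sqrt{2}} - \frac{y_1}{\sqrt{6}})$ with upper limit $\sqrt{3}(\frac{5}{6})$; the composite Simpson rule and the lower cut-off at $-10$ are our own choices and not the authors' method.

```python
import math


def phi(z: float) -> float:
    """Standard normal density."""
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)


def Phi(z: float) -> float:
    """Standard normal distribution function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def simpson(f, a: float, b: float, n: int = 4000) -> float:
    """Composite Simpson rule on [a, b] with an even number n of subintervals."""
    if n % 2:
        n += 1
    h = (b - a) / n
    total = f(a) + f(b)
    for m in range(1, n):
        total += (4 if m % 2 else 2) * f(a + m * h)
    return total * h / 3.0


sqrt2 = math.sqrt(2.0)
upper = 5.0 * math.sqrt(3.0) / 6.0          # y_1 coordinate of the intersection point


def integrand(y1: float) -> float:
    return phi(y1) * Phi(-7.0 / (6.0 * sqrt2) - y1 / math.sqrt(6.0))


# The density is negligible below -10, so the lower limit is truncated there.
integral = simpson(integrand, -10.0, upper)
explicit = 1.0 - Phi(-sqrt2) + Phi(-sqrt2) * Phi(upper)
prob = explicit - integral
print(round(explicit, 4), round(integral, 4), round(prob, 4))
```

The three printed numbers should land close to the values 0.9941, 0.2182 and 0.7759 quoted in the solution.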


12.7.3. Classification when the population parameters are unknown 


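When training samples are available from the populations $\pi_i$, $i = 1, \ldots, k$, we can estimate the parameters and proceed with the classification. Let $X_j^{(i)}$, $j = 1, \ldots, n_i$, be a simple random sample of size $n_i$ from the $i$-th population $\pi_i$. Then, the sample average is $\bar{X}^{(i)} = \frac{1}{n_i}\sum_{j=1}^{n_i} X_j^{(i)}$ and with our usual notations, the sample matrix, the matrix of sample means and sample sum of products matrix are the following:

$$\mathbf{X}^{(i)} = [X_1^{(i)}, \ldots, X_{n_i}^{(i)}],\quad \bar{\mathbf{X}}^{(i)} = [\bar{X}^{(i)}, \ldots, \bar{X}^{(i)}],$$
$$S_i = [\mathbf{X}^{(i)} - \bar{\mathbf{X}}^{(i)}][\mathbf{X}^{(i)} - \bar{\mathbf{X}}^{(i)}]',\quad i = 1, \ldots, k,$$
$$X_j^{(i)} = \begin{bmatrix} x_{1j}^{(i)}\\ \vdots\\ x_{pj}^{(i)}\end{bmatrix},\quad \bar{X}^{(i)} = \begin{bmatrix} \bar{x}_1^{(i)}\\ \vdots\\ \bar{x}_p^{(i)}\end{bmatrix},\quad S_i = (s_{rt}^{(i)}),\quad s_{rt}^{(i)} = \sum_{j=1}^{n_i}(x_{rj}^{(i)} - \bar{x}_r^{(i)})(x_{tj}^{(i)} - \bar{x}_t^{(i)}),\quad i = 1, \ldots, k.$$

Note that $\mathbf{X}^{(i)}$ and $\bar{\mathbf{X}}^{(i)}$ are $p \times n_i$ matrices and $X_j^{(i)}$ is a $p \times 1$ vector for each $j = 1, \ldots, n_i$, and $i = 1, \ldots, k$. Let the population mean value vectors and the common covariance matrix be $\mu^{(1)}, \ldots, \mu^{(k)}$, and $\Sigma > O$, respectively. Then, the unbiased estimators for these parameters are the following, identifying the estimators/estimates by a hat: $\hat{\mu}^{(i)} = \bar{X}^{(i)}$, $i = 1, \ldots, k$, and $\hat{\Sigma} = \frac{S}{n_1 + \cdots + n_k - k}$, $S = S_1 + \cdots + S_k$. On replacing the population parameters by their unbiased estimators, the classification criteria $u_{ij}(X)$, $j = 1, \ldots, k$, $j \ne i$, become the following: Classify an observation vector $X$ into $\pi_i$ if $\hat{u}_{ij}(X) \ge \ln k_{ij}$, $k_{ij} = \frac{q_j C(i|j)}{q_i C(j|i)}$, $j = 1, \ldots, k$, $j \ne i$, or $\hat{u}_{ij}(X) \ge 0$, $j = 1, \ldots, k$, $j \ne i$, if $q_1 = \cdots = q_k$ and the $C(i|j)$'s are equal, $j = 1, \ldots, k$, $j \ne i$, where

$$\hat{u}_{ij}(X) = (\bar{X}^{(i)} - \bar{X}^{(j)})'\hat{\Sigma}^{-1}X - \tfrac{1}{2}(\bar{X}^{(i)} - \bar{X}^{(j)})'\hat{\Sigma}^{-1}(\bar{X}^{(i)} + \bar{X}^{(j)}) \tag{12.7.13}$$

for $j = 1, \ldots, k$, $j \ne i$. Unfortunately, the exact distribution of $\hat{u}_{ij}(X)$ is difficult to obtain even when the populations $\pi_i$'s have $p$-variate normal distributions. However, when $n_j \to \infty$, $\bar{X}^{(j)} \to \mu^{(j)}$, $j = 1, \ldots, k$, and when $n_j \to \infty$, $j = 1, \ldots, k$, $\hat{\Sigma} \to \Sigma$. Then, asymptotically, that is, when $n_j \to \infty$, $j = 1, \ldots, k$, $\hat{u}_{ij}(X) \to u_{ij}(X)$, so that the theory discussed in the previous sections is applicable. As well, the classification probabilities can then be evaluated as illustrated in Example 12.7.2.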
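
The estimation step described in this subsection can be sketched in a few lines. The code below assumes training samples stored as lists of $p$-vectors and computes the sample averages and the pooled estimate $S/(n_1 + \cdots + n_k - k)$; the two tiny bivariate samples and all function names are made up for illustration.

```python
from typing import List, Tuple

Vector = List[float]
Matrix = List[List[float]]


def mean_vector(sample: List[Vector]) -> Vector:
    """Sample average of the observation vectors in one training sample."""
    n, p = len(sample), len(sample[0])
    return [sum(obs[r] for obs in sample) / n for r in range(p)]


def sum_of_products(sample: List[Vector], mean: Vector) -> Matrix:
    """Sample sum of products matrix S_i = sum_j (X_j - mean)(X_j - mean)'."""
    p = len(mean)
    return [[sum((obs[r] - mean[r]) * (obs[t] - mean[t]) for obs in sample)
             for t in range(p)] for r in range(p)]


def pooled_estimates(samples: List[List[Vector]]) -> Tuple[List[Vector], Matrix]:
    """Return the sample averages and S / (n_1 + ... + n_k - k)."""
    means = [mean_vector(s) for s in samples]
    p = len(means[0])
    dof = sum(len(s) for s in samples) - len(samples)
    pooled = [[0.0] * p for _ in range(p)]
    for s, m in zip(samples, means):
        s_i = sum_of_products(s, m)
        for r in range(p):
            for t in range(p):
                pooled[r][t] += s_i[r][t] / dof
    return means, pooled


# Hypothetical training samples from k = 2 bivariate populations.
sample_1 = [[1.0, 2.0], [2.0, 1.0], [3.0, 3.0]]
sample_2 = [[5.0, 5.0], [7.0, 6.0], [6.0, 7.0]]
means, sigma_hat = pooled_estimates([sample_1, sample_2])
print(means, sigma_hat)
```

These estimates would then be substituted for $\mu^{(i)}$ and $\Sigma$ in the criterion (12.7.13).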


12.8. The Maximum Likelihood Method when the Population Covariances Are 
Equal 


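Consider $k$ real normal populations $\pi_i : P_i(X) \equiv N_p(\mu^{(i)}, \Sigma)$, $\Sigma > O$, $i = 1, \ldots, k$, having the same covariance matrix but different mean value vectors $\mu^{(i)}$, $i = 1, \ldots, k$. A $p$-vector $X$ at hand is to be classified into one of these populations $\pi_i$, $i = 1, \ldots, k$. Consider a simple random sample $X_1^{(i)}, X_2^{(i)}, \ldots, X_{n_i}^{(i)}$ of size $n_i$ from $\pi_i$ for $i = 1, \ldots, k$. Employing our usual notations, the sample means, sample matrices, matrices of sample means and the sample sum of products matrices are as follows:

$$\bar{X}^{(i)} = \frac{1}{n_i}\sum_{j=1}^{n_i} X_j^{(i)},\quad X_j^{(i)} = \begin{bmatrix} x_{1j}^{(i)}\\ \vdots\\ x_{pj}^{(i)}\end{bmatrix},\quad \bar{X}^{(i)} = \begin{bmatrix} \bar{x}_1^{(i)}\\ \vdots\\ \bar{x}_p^{(i)}\end{bmatrix},$$
$$\mathbf{X}^{(i)} = [X_1^{(i)}, X_2^{(i)}, \ldots, X_{n_i}^{(i)}],\quad \bar{\mathbf{X}}^{(i)} = [\bar{X}^{(i)}, \ldots, \bar{X}^{(i)}],\quad S^{(i)} = [\mathbf{X}^{(i)} - \bar{\mathbf{X}}^{(i)}][\mathbf{X}^{(i)} - \bar{\mathbf{X}}^{(i)}]',$$
$$S^{(i)} = (s_{rt}^{(i)}),\quad s_{rt}^{(i)} = \sum_{j=1}^{n_i}(x_{rj}^{(i)} - \bar{x}_r^{(i)})(x_{tj}^{(i)} - \bar{x}_t^{(i)}),\quad S = S^{(1)} + S^{(2)} + \cdots + S^{(k)}. \tag{12.8.1}$$

Then, the unbiased estimators of the population parameters, denoted with a hat, are

$$\hat{\mu}^{(i)} = \bar{X}^{(i)},\ i = 1, \ldots, k, \quad\text{and}\quad \hat{\Sigma} = \frac{S}{n_1 + n_2 + \cdots + n_k - k}. \tag{12.8.2}$$

The null hypothesis can be taken as $X_1^{(i)}, \ldots, X_{n_i}^{(i)}$ and $X$ originating from $\pi_i$ and $X_1^{(j)}, \ldots, X_{n_j}^{(j)}$ coming from $\pi_j$, $j = 1, \ldots, k$, $j \ne i$, the alternative hypothesis being: $X$ and $X_1^{(j)}, \ldots, X_{n_j}^{(j)}$ coming from $\pi_j$ for $j = 1, \ldots, k$, $j \ne i$, and $X_1^{(i)}, \ldots, X_{n_i}^{(i)}$ originating from $\pi_i$. On proceeding as in Sect. 12.6, when the prior probabilities are equal and the cost functions are identical, the criterion for classification of the observed vector $X$ to $\pi_i$ for a specific $i$ is

$$A_i : \Big(\frac{n_j}{n_j + 1}\Big)(X - \bar{X}^{(j)})'\Big(\frac{S}{n_{(k)}}\Big)^{-1}(X - \bar{X}^{(j)}) - \Big(\frac{n_i}{n_i + 1}\Big)(X - \bar{X}^{(i)})'\Big(\frac{S}{n_{(k)}}\Big)^{-1}(X - \bar{X}^{(i)}) \ge 0 \tag{12.8.3}$$

for $j = 1, \ldots, k$, $j \ne i$, where the decision rule is $A = (A_1, \ldots, A_k)$, $S = S^{(1)} + \cdots + S^{(k)}$ and $n_{(k)} = n_1 + n_2 + \cdots + n_k - k$. Note that (12.8.3) holds for each $i$, $i = 1, \ldots, k$, and hence, $A_1, \ldots, A_k$ are available from (12.8.3). Thus, the vector $X$ at hand is classified into $A_i$, that is, assigned to the population $\pi_i$, if the inequalities in (12.8.3) are satisfied. This statement holds for each $i$, $i = 1, \ldots, k$. The exact distribution of the criterion in (12.8.3) is difficult to establish but the probabilities of classification can be computed from the asymptotic theory discussed in Sect. 12.7 by observing the following:

When $n_i \to \infty$, $\bar{X}^{(i)} \to \mu^{(i)}$, $i = 1, \ldots, k$, and when $n_1 \to \infty, \ldots, n_k \to \infty$, $\hat{\Sigma} \to \Sigma$. Thus, asymptotically, when $n_i \to \infty$, $i = 1, \ldots, k$, the criterion specified in (12.8.3) reduces to the criterion (12.7.3) of Sect. 12.7. Accordingly, when $n_i \to \infty$ or for very large $n_i$'s, $i = 1, \ldots, k$, one may utilize (12.7.3) for computing the probabilities of classification, which was illustrated in Examples 12.7.1 and 12.7.2.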
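
To illustrate how a criterion of the type (12.8.3) operates, the sketch below specializes it to the scalar case $p = 1$, an assumption made purely to keep the example short. It takes the criterion to compare the quantities $\frac{n_i}{n_i+1}(x - \bar{x}^{(i)})^2/\hat{\sigma}^2$ across populations, with $\hat{\sigma}^2$ the pooled estimate; the function name and the three training samples are hypothetical.

```python
from typing import List


def ml_classify(x: float, samples: List[List[float]]) -> int:
    """Assign x to the population (0-based index) whose weighted squared
    standardized distance n_i/(n_i + 1) * (x - mean_i)^2 / var_hat is smallest,
    which is the p = 1 reading of the inequalities defining the region A_i."""
    k = len(samples)
    sizes = [len(s) for s in samples]
    means = [sum(s) / len(s) for s in samples]
    # Pooled sum of squares divided by n_1 + ... + n_k - k.
    pooled_ss = sum(sum((v - m) ** 2 for v in s) for s, m in zip(samples, means))
    var_hat = pooled_ss / (sum(sizes) - k)
    scores = [n / (n + 1) * (x - m) ** 2 / var_hat for n, m in zip(sizes, means)]
    # x lies in A_i exactly when scores[j] - scores[i] >= 0 for all j != i.
    return min(range(k), key=scores.__getitem__)


# Hypothetical training samples from k = 3 univariate populations.
training = [[1.0, 2.0, 3.0], [6.0, 7.0, 8.0], [11.0, 12.0, 13.0]]
print(ml_classify(4.4, training), ml_classify(9.6, training))
```

For large sample sizes the factors $n_i/(n_i+1)$ approach one and the rule reduces to picking the nearest sample mean in the pooled metric.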


12.9. Maximum Likelihood Method and Unequal Covariance Matrices 


The likelihood procedure can also provide a classification rule when the normal population covariance matrices are different. For example, let $\pi_1 : P_1(X) \equiv N_p(\mu^{(1)}, \Sigma_1)$, $\Sigma_1 > O$, and $\pi_2 : P_2(X) \equiv N_p(\mu^{(2)}, \Sigma_2)$, $\Sigma_2 > O$, where $\mu^{(1)} \ne \mu^{(2)}$ and $\Sigma_1 \ne \Sigma_2$. Let a simple random sample $X_1^{(1)}, \ldots, X_{n_1}^{(1)}$ of size $n_1$ from $\pi_1$ and a simple random sample $X_1^{(2)}, \ldots, X_{n_2}^{(2)}$ of size $n_2$ from $\pi_2$ be available. Let $\bar{X}^{(1)}$ and $\bar{X}^{(2)}$ be the sample averages and $S_1$ and $S_2$ be the sample sum of products matrices, respectively. In classification problems, there is an additional vector $X$ which comes from $\pi_1$ under the null hypothesis and from $\pi_2$ under the alternative. Then, the maximum likelihood estimators, denoted by a hat, will be the following:

$$\hat{\mu}^{(1)} = \bar{X}^{(1)},\quad \hat{\mu}^{(2)} = \bar{X}^{(2)},\quad \hat{\Sigma}_1 = \frac{S_1}{n_1} \quad\text{and}\quad \hat{\Sigma}_2 = \frac{S_2}{n_2}, \tag{i}$$

respectively, when no additional vector is involved. However, these estimators will change in the presence of the additional vector $X$, where $X$ is the vector at hand to be assigned to $\pi_1$ or $\pi_2$. When $X$ originates from $\pi_1$ or $\pi_2$, $\mu^{(1)}$ and $\mu^{(2)}$ are respectively estimated as follows:


acy Xi X d „о) _ "2Xa- X i 
Ux RET C and р; EET. (ii) 
and when X comes from лі or лә, 2, and 2» are estimated by 
(1) (2) 
Xix = ———— and Xr — ——— iii 
1x ee 2x AE (iii) 
where 
1 " x nı 
s? = ac- AMA - AMY = (FLY (х- Яу -Ry 
2 ^ A А 
sP-a-agyx-agy-(—)e«-Xox-xy. _ 
referring to the derivations provided in Sect. 12.6 when discussing maximum likelihood procedures. Thus, the null hypothesis can be: $X$ and $X_1^{(1)}, \ldots, X_{n_1}^{(1)}$ are from $\pi_1$ and $X_1^{(2)}, \ldots, X_{n_2}^{(2)}$ are from $\pi_2$, versus the alternative: $X$ and $X_1^{(2)}, \ldots, X_{n_2}^{(2)}$ being from $\pi_2$ and $X_1^{(1)}, \ldots, X_{n_1}^{(1)}$ from $\pi_1$. Let $L_0$ and $L_1$ denote the likelihood functions under the null and alternative hypotheses, respectively. Observe that under the null hypothesis, $\Sigma_1$ is estimated by the corresponding estimator of (iii) and $\Sigma_2$ is estimated by $\hat{\Sigma}_2$ of (i), respectively, so that the likelihood ratio criterion $\lambda$ is given by


^ ny 


| maxLo | рЫ Ue "pale 


o mali |ў "|$? 


(12.9.1) 


The determinants in (12.9.1) сап be represented as follows, referring to the simplifications 
discussed in Sect. 12.6: 


n] 


al 


П 1) oF 
sa 


. (12.9.2) 


EUST TTE m xs — Ху] 15417 


ES 
The classification rule then consists of assigning the observed vector $X$ to $\pi_1$ if $\lambda \ge 1$ and, to $\pi_2$ if $\lambda < 1$. We could have expressed the criterion in terms of $\lambda_1 = \lambda^{\frac{2}{n}}$ if $n_1 = n_2 = n$, which would have simplified the expressions appearing in (12.9.2).
Chapter 13


Multivariate Analysis of Variation


13.1. Introduction 


We will employ the same notations as in the previous chapters. Lower-case letters 
x, y, ... will denote real scalar variables, whether mathematical or random. Capital letters
X,Y,... will be used to denote real matrix-variate mathematical or random variables, 
whether square or rectangular matrices are involved. A tilde will be placed on top of letters 
such as $\tilde{x}, \tilde{y}, \tilde{X}, \tilde{Y}$ to denote variables in the complex domain. Constant matrices will for
instance be denoted by A, B, C. A tilde will not be used on constant matrices unless the 
point is to be stressed that the matrix is in the complex domain. The determinant of a 
square matrix A will be denoted by |A| or det(A) and, in the complex case, the absolute 
value or modulus of the determinant of A will be denoted as |det(A)|. When matrices 
are square, their order will be taken as p x p, unless specified otherwise. When A is a 
full rank matrix in the complex domain, then AA* is Hermitian positive definite where 
an asterisk designates the complex conjugate transpose of a matrix. Additionally, dX will 
indicate the wedge product of all the distinct differentials of the elements of the matrix X. 
Thus, letting the $p \times q$ matrix $X = (x_{ij})$ where the $x_{ij}$'s are distinct real scalar variables, $dX = \wedge_{i=1}^{p}\wedge_{j=1}^{q}\, dx_{ij}$. For the complex matrix $\tilde{X} = X_1 + iX_2$, $i = \sqrt{(-1)}$, where $X_1$ and $X_2$ are real, $d\tilde{X} = dX_1 \wedge dX_2$.

In this chapter, we only consider analysis of variance (ANOVA) and multivariate anal- 
ysis of variance (MANOVA) problems involving real populations. Even though all the 
steps involved in the following discussion focusing on the real variable case can readily be 
extended to the complex domain, it does not appear that a parallel development of anal- 
ysis of variance methodologies in the complex domain has yet been considered. In order 
to elucidate the various steps in the procedures, we will first review the univariate case. 
For a detailed exposition of the analysis of variance technique in the scalar variable case, 
the reader may refer to Mathai and Haubold (2017). We will consider the cases of one-way classification or completely randomized design as well as two-way classification without and with interaction or randomized block design. With this groundwork in place, the derivations of the results in the multivariate setting ought to prove easier to follow.


In the early nineteenth century, Gauss and Laplace utilized methodologies that may be 
regarded as forerunners to ANOVA in their analyses of astronomical data. However, this 
technique came to full fruition in Ronald Fisher’s classic book titled “Statistical Meth- 
ods for Research Workers”, which was initially published in 1925. The principle behind 
ANOVA consists of partitioning the total variation present in the data into variations at- 
tributable to different sources. It is actually the total variation that is split rather than the 
total variance, the latter being a fraction of the former. Accordingly, the procedure could be 
more appropriately referred to as “analysis of variation”. As has already been mentioned, 
we will initially consider the one-way classification model, which will then be extended to 
the multivariate situation. 


Let us first focus on an experimental design called a completely randomized experiment. In this setting, the subject matter was originally developed for agricultural experiments, which influenced its terminology. For example, the basic experimental unit is referred to as a "plot", which is a piece of land in an agricultural context. When an experiment is performed on human beings, a plot translates into an individual. If the experiment is carried out on some machinery, then a machine corresponds to a plot. In a completely randomized experiment, a set of $n_1 + n_2 + \cdots + n_k$ plots, which are homogeneous with respect to all factors of variation, are selected. Then, $k$ treatments are applied at random to these plots, the first treatment to $n_1$ plots, the second treatment to $n_2$ plots, up to the $k$-th treatment being applied to $n_k$ plots. For instance, if the effects of $k$ different fertilizers on
the yield of a certain crop are to be studied, then the treatments consist of these k fertilizers, 
the first treatment meaning one of the fertilizers, the second treatment, another one and so 
on, with the k-th treatment corresponding to the last fertilizer. If the experiment involves 
studying the yield of corn among k different varieties of corn, then a treatment coincides 
with a particular variety. If an experiment consists of comparing k teaching methods, then 
a treatment refers to a method of teaching and a plot corresponds to a student. When an 
experiment compares the effect of k different medications in curing a certain ailment, then 
a treatment is a medication, and so on. If the treatments are denoted by $t_1, \ldots, t_k$, then treatment $t_j$ is applied at random to $n_j$ homogeneous plots or $n_j$ homogeneous plots are selected at random and treatment $t_j$ is applied to them, for $j = 1, \ldots, k$. Random assignment is done to avoid possible biases or the influence of confounding factors, if any. Then, observations measuring the effect of these treatments on the experimental units are made. For example, in the case of various methods of teaching, the observation $x_{ij}$ could be the final grade obtained by the $j$-th student who was subjected to the $i$-th teaching method. In the case of comparing $k$ different varieties of corn, the observation $x_{ij}$ could consist of the yield of corn observed at harvest time in the $j$-th plot which received the $i$-th variety of corn. Thus, in this instance, $i$ stands for the treatment number and $j$ represents the serial number of the plot receiving the $i$-th treatment, $x_{ij}$ being the final observation. Then, the corresponding linear additive fixed effect model is the following:

$$x_{ij} = \mu + \alpha_i + e_{ij},\quad j = 1, \ldots, n_i,\ i = 1, \ldots, k, \tag{13.1.1}$$


where $\mu$ is a general effect, $\alpha_i$ is the deviation from the general effect due to treatment $t_i$ and $e_{ij}$ is the random component, which includes the sum total of the contributions originating from unknown or uncontrolled factors. When the experiment is designed, the plots are selected so that they be homogeneous with respect to all possible factors of variation. The general effect $\mu$ can be interpreted as the grand average or the expected value of $x_{ij}$ when $\alpha_i$ is not present or treatment $t_i$ is not applied or has no effect. The simplest assumption that we will make is that $E[e_{ij}] = 0$ for all $i$ and $j$ and $\text{Var}(e_{ij}) = \sigma^2 > 0$ for all $i$ and $j$ and for some positive quantity $\sigma^2$, where $E[\cdot]$ denotes the expected value of $[\cdot]$. It is further assumed that $\mu, \alpha_1, \ldots, \alpha_k$ are all unknown constants. When $\alpha_1, \ldots, \alpha_k$ are assumed to be random variables, model (13.1.1) is referred to as a "random effect model". In the following discussion, we will solely consider fixed effect models. The first step consists of estimating the unknown quantities from the data. Since no distribution is assumed on the $e_{ij}$'s, and thereby on the $x_{ij}$'s, we will employ the method of least squares for estimating the parameters. In that case, one has to minimize the error sum of squares which is

$$\sum_{ij} e_{ij}^2 = \sum_{ij} [x_{ij} - \mu - \alpha_i]^2.$$


Applying calculus principles, we equate the partial derivative of $\sum_{ij} e_{ij}^2$ with respect to $\mu$
to zero, then equate the partial derivatives of $\sum_{ij} e_{ij}^2$ with respect to $\alpha_1, \ldots, \alpha_k$ to zero,
and solve these equations. A convenient notation in this area is to represent a summation
by a dot. As an example, if the subscript $j$ is summed up, it is replaced by a dot, so that
$\sum_j x_{ij} = x_{i.}$; similarly, $\sum_{ij} x_{ij} = x_{..}$. Thus,

$$\frac{\partial}{\partial \mu}\Big[\sum_{ij} e_{ij}^2\Big] = 0 \;\Rightarrow\; -2\sum_{ij}[x_{ij} - \mu - \alpha_i] = 0 \;\Rightarrow\; \sum_i\Big(\sum_j [x_{ij} - \mu - \alpha_i]\Big) = 0,$$

that is,

$$\sum_i [x_{i.} - n_i\mu - n_i\alpha_i] = 0 \;\Rightarrow\; x_{..} - n_.\mu - \sum_{i=1}^k n_i\alpha_i = 0,$$

and since we have taken $\alpha_i$ as a deviation from the general effect due to treatment $t_i$, we
can let $\sum_i n_i\alpha_i = 0$ without any loss of generality. Then, $x_{..}/n_.$ is an estimate of $\mu$, and
denoting estimates/estimators by a hat, we write $\hat{\mu} = x_{..}/n_.$. Now, note that, for example,
$\alpha_1$ appears in the terms $(x_{11} - \mu - \alpha_1)^2 + \cdots + (x_{1n_1} - \mu - \alpha_1)^2 = \sum_j (x_{1j} - \mu - \alpha_1)^2$ but
does not appear in the other terms in the error sum of squares. Accordingly, for a specific $i$,

$$\frac{\partial}{\partial \alpha_i}\Big[\sum_{ij} e_{ij}^2\Big] = 0 \;\Rightarrow\; \sum_j [x_{ij} - \hat{\mu} - \hat{\alpha}_i] = 0 \;\Rightarrow\; x_{i.} - n_i\hat{\mu} - n_i\hat{\alpha}_i = 0,$$

that is, $\hat{\alpha}_i = \frac{x_{i.}}{n_i} - \hat{\mu}$. Thus,

$$\hat{\mu} = \frac{1}{n_.}x_{..} \quad\text{and}\quad \hat{\alpha}_i = \frac{1}{n_i}x_{i.} - \hat{\mu}. \tag{13.1.2}$$


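The least squares minimum is obtained by substituting the least squares estimates of $\mu$ and
$\alpha_i$, $i = 1,\ldots,k$, in the error sum of squares. Denoting the least squares minimum by $s^2$,

$$\begin{aligned}
s^2 &= \sum_{ij}(x_{ij} - \hat{\mu} - \hat{\alpha}_i)^2 = \sum_{ij}\Big[x_{ij} - \frac{x_{..}}{n_.} - \Big(\frac{x_{i.}}{n_i} - \frac{x_{..}}{n_.}\Big)\Big]^2 \\
&= \sum_{ij}\Big[x_{ij} - \frac{x_{i.}}{n_i}\Big]^2 = \sum_{ij}\Big[x_{ij} - \frac{x_{..}}{n_.}\Big]^2 - \sum_{ij}\Big[\frac{x_{i.}}{n_i} - \frac{x_{..}}{n_.}\Big]^2.
\end{aligned} \tag{13.1.3}$$

When the square is expanded, the middle term will become $-2\sum_{ij}\big(\frac{x_{i.}}{n_i} - \frac{x_{..}}{n_.}\big)^2$, thus
yielding the expression given in (13.1.3). As well, we have the following identity:

$$\sum_{ij}\Big(\frac{x_{i.}}{n_i} - \frac{x_{..}}{n_.}\Big)^2 = \sum_{i=1}^k n_i\Big(\frac{x_{i.}}{n_i} - \frac{x_{..}}{n_.}\Big)^2 = \sum_{i=1}^k \frac{x_{i.}^2}{n_i} - \frac{x_{..}^2}{n_.}.$$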


Now, let us consider the hypothesis $H_o: \alpha_1 = \alpha_2 = \cdots = \alpha_k$, which is equivalent to
the hypothesis $\alpha_1 = \alpha_2 = \cdots = \alpha_k = 0$ since, by assumption, $\sum_i n_i\alpha_i = 0$. Proceeding
as before, the least squares minimum under the null hypothesis $H_o$, denoted by $s_0^2$, is the
following:

$$s_0^2 = \sum_{ij}\Big(x_{ij} - \frac{x_{..}}{n_.}\Big)^2,$$

and hence the sum of squares due to the hypothesis, or due to the presence of the $\alpha_i$'s, is
given by $s_0^2 - s^2 = \sum_{ij}\big(\frac{x_{i.}}{n_i} - \frac{x_{..}}{n_.}\big)^2$. Thus, the total variation is partitioned as follows:

$$\begin{aligned}
s_0^2 &= [s_0^2 - s^2] + [s^2] \\
\sum_{ij}\Big(x_{ij} - \frac{x_{..}}{n_.}\Big)^2 &= \Big[\sum_{ij}\Big(\frac{x_{i.}}{n_i} - \frac{x_{..}}{n_.}\Big)^2\Big] + \Big[\sum_{ij}\Big(x_{ij} - \frac{x_{i.}}{n_i}\Big)^2\Big]
\end{aligned}$$

Total variation ($s_0^2$) = variation due to the $\alpha_i$'s ($s_0^2 - s^2$) + the residual variation ($s^2$),

which is the analysis of variation principle. If $e_{ij} \overset{iid}{\sim} N_1(0, \sigma^2)$ for all $i$ and $j$ where
$\sigma^2 > 0$ is a constant, it follows from the chisquaredness and independence of quadratic
forms, as discussed in Chaps. 2 and 3, that $s_0^2/\sigma^2 \sim \chi^2_{n_.-1}$, a real chisquare variable having
$n_. - 1$ degrees of freedom, and $[s_0^2 - s^2]/\sigma^2 \sim \chi^2_{k-1}$ under the hypothesis $H_o$, and that
$s^2/\sigma^2 \sim \chi^2_{n_.-k}$, where the sum of squares due to the $\alpha_i$'s, namely $s_0^2 - s^2$, and the residual
sum of squares, namely $s^2$, are independently distributed under the hypothesis. Usually, these
findings are put into a tabular form known as the analysis of variation table or ANOVA table.
The usual format is as follows:


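ANOVA Table for the One-Way Classification

Variation due to    df           SS                                                            MS
(1)                 (2)          (3)                                                           (3)/(2)
treatments          $k-1$        $\sum_i n_i\big(\frac{x_{i.}}{n_i} - \frac{x_{..}}{n_.}\big)^2$    $(s_0^2 - s^2)/(k-1)$
residuals           $n_.-k$      $s^2$                                                         $s^2/(n_.-k)$
total               $n_.-1$      $\sum_{ij}\big(x_{ij} - \frac{x_{..}}{n_.}\big)^2$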


where df denotes the number of degrees of freedom, SS means sum of squares and MS
stands for mean squares or the average of the squares. There is usually a last column which
contains the F-ratio, that is, the ratio of the treatments MS to the residuals MS, and enables
one to determine whether to reject the null hypothesis, in which case the test statistic is
said to be "significant", or not to reject the null hypothesis, in which case the test statistic is
"not significant". Further details on the real scalar variable case are available from Mathai and
Haubold (2017).
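
The estimates in (13.1.2) and the entries of the ANOVA table can be sketched in a few lines of code. The following is only an illustrative sketch: the function name `one_way_anova` and the dictionary keys are hypothetical, and the data are the weight measurements (first components) quoted in Problem (1) of Example 13.3.1. Exact rational arithmetic is used so that the sums of squares are free of rounding error.

```python
from fractions import Fraction

def one_way_anova(groups):
    """Least squares estimates and ANOVA sums of squares for one-way data.

    groups[i][j] holds x_ij, the j-th observation under treatment i.
    (Function name and return keys are illustrative, not from the text.)
    """
    k = len(groups)
    n_i = [len(g) for g in groups]
    n_dot = sum(n_i)                                          # n.
    x_i_dot = [sum(Fraction(x) for x in g) for g in groups]   # x_i.
    x_dot_dot = sum(x_i_dot)                                  # x..
    mu_hat = x_dot_dot / n_dot                                # (13.1.2)
    alpha_hat = [t / n - mu_hat for t, n in zip(x_i_dot, n_i)]
    ss_total = sum((x - mu_hat) ** 2 for g in groups for x in g)      # s_0^2
    # identity: sum_i x_i.^2 / n_i - x..^2 / n.
    ss_treat = sum(t * t / n for t, n in zip(x_i_dot, n_i)) - x_dot_dot ** 2 / n_dot
    ss_resid = ss_total - ss_treat                                    # s^2
    ms_treat = ss_treat / (k - 1)
    ms_resid = ss_resid / (n_dot - k)
    return {
        "mu_hat": mu_hat, "alpha_hat": alpha_hat,
        "df": (k - 1, n_dot - k, n_dot - 1),
        "ss": (ss_treat, ss_resid, ss_total),
        "ms": (ms_treat, ms_resid),
        "f_ratio": ms_treat / ms_resid if ms_resid != 0 else None,
    }

# Weight measurements under the three diets (Example 13.3.1, Problem (1)).
weights = [[2, 4, -1, -1, 1], [1, 3, -1, 1], [2, 1, -1, 2, 2, 0]]
result = one_way_anova(weights)
```

For these data the treatment sum of squares is zero, so the whole variation is residual and the F-ratio vanishes.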


In light of this brief review of the scalar variable case of one-way classification data 
or univariate data secured from a completely randomized design, the concepts will now be 
extended to the multivariate setting. 


13.2. Multivariate Case of One-Way Classification Data Analysis 


Extension of the results to the multivariate case is parallel to the scalar variable case.
Consider a model of the type

$$X_{ij} = M + A_i + E_{ij},\quad j = 1,\ldots,n_i,\ i = 1,\ldots,k, \tag{13.2.1}$$

with $X_{ij}$, $M$, $A_i$ and $E_{ij}$ all being $p \times 1$ real vectors where $X_{ij}$ denotes the $j$-th observation
vector in the $i$-th group or the observed vectors obtained from the $n_i$ plots receiving
the $i$-th vector of treatments, $M$ is a general effect vector, $A_i$ is a vector of deviations from
$M$ due to the $i$-th treatment vector so that $\sum_i n_iA_i = O$ since we are taking deviations
from the general effect $M$, and $E_{ij}$ is a vector of random components assumed to be normally
distributed as follows: $E_{ij} \overset{iid}{\sim} N_p(O, \Sigma)$, $\Sigma > O$, for all $i$ and $j$ where $\Sigma$ is a
positive definite covariance matrix, that is,

$$\mathrm{Cov}(E_{ij}) = E[(E_{ij} - O)(E_{ij} - O)'] = E[E_{ij}E_{ij}'] = \Sigma > O \ \text{ for all } i \text{ and } j,$$


where $E[\cdot]$ denotes the expected value operator. This normality assumption will be needed
for testing hypotheses and developing certain distributional aspects. However, the multivariate
analysis of variation can be set up without having to resort to any distributional
assumption. In the real scalar variable case, we minimized the sum of the squares of the
errors since the variations only involved single scalar variables. In the vector case, if we
take the sum of squares of the elements in $E_{ij}$, that is, $E_{ij}'E_{ij}$ and its sum over all $i$ and $j$,
then we are only considering the variations in the individual elements of the $E_{ij}$'s; however,
in the vector case, there is joint variation among the elements of the vector and that is also
to be taken into account. Hence, we should be considering all squared terms and cross
product terms or the whole matrix of squared and cross product terms. This is given by
$E_{ij}E_{ij}'$ and so, we should consider this matrix and carry out some type of minimization.
Consider

$$\sum_{ij} E_{ij}E_{ij}' = \sum_{ij}[X_{ij} - M - A_i][X_{ij} - M - A_i]'. \tag{13.2.2}$$

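For obtaining estimates of $M$ and $A_i$, $i = 1,\ldots,k$, we will minimize the trace of
$\sum_{ij} E_{ij}E_{ij}'$ as a criterion. There are terms of the type $[X_{ij} - M - A_i]'[X_{ij} - M - A_i]$ in
this trace. Thus,

$$\begin{aligned}
\frac{\partial}{\partial M}\Big[\mathrm{tr}\Big(\sum_{ij} E_{ij}E_{ij}'\Big)\Big] &= \frac{\partial}{\partial M}\sum_{ij}[X_{ij} - M - A_i]'[X_{ij} - M - A_i] = O \\
&\Rightarrow \sum_{ij} X_{ij} - n_.M - \sum_i n_iA_i = O \;\Rightarrow\; \hat{M} = \frac{1}{n_.}X_{..},
\end{aligned}$$

noting that we assumed that $\sum_i n_iA_i = O$. Now, on differentiating the trace of $\sum_{ij} E_{ij}E_{ij}'$
with respect to $A_i$ for a specific $i$, we have

$$\begin{aligned}
\frac{\partial}{\partial A_i}\Big[\mathrm{tr}\Big(\sum_{ij} E_{ij}E_{ij}'\Big)\Big] &= \frac{\partial}{\partial A_i}\sum_{ij}[X_{ij} - M - A_i]'[X_{ij} - M - A_i] = O \\
&\Rightarrow \sum_j X_{ij} - n_i\hat{M} - n_i\hat{A}_i = O \;\Rightarrow\; \hat{A}_i = \frac{1}{n_i}X_{i.} - \hat{M}.
\end{aligned}$$

Observe that there is only one critical vector for $M$ and for $A_i$, $i = 1,\ldots,k$. Accordingly,
the critical point will either correspond to a minimum or a maximum of the trace. But
for arbitrary $M$ and $A_i$, the maximum occurs at plus infinity and hence, the critical point
$(\hat{M}, \hat{A}_i,\ i = 1,\ldots,k)$ corresponds to a minimum. Once evaluated at these estimates, the
sum of squares and cross products matrix, denoted by $S$, is the following:

$$\begin{aligned}
S &= \sum_{ij}\Big[X_{ij} - \frac{1}{n_i}X_{i.}\Big]\Big[X_{ij} - \frac{1}{n_i}X_{i.}\Big]' \\
&= \sum_{ij}\Big[X_{ij} - \frac{1}{n_.}X_{..}\Big]\Big[X_{ij} - \frac{1}{n_.}X_{..}\Big]' - \sum_i n_i\Big[\frac{1}{n_i}X_{i.} - \frac{1}{n_.}X_{..}\Big]\Big[\frac{1}{n_i}X_{i.} - \frac{1}{n_.}X_{..}\Big]'.
\end{aligned} \tag{13.2.3}$$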


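Note that as in the scalar case, the middle terms and the last term will combine into the
second term above. Now, let us impose the hypothesis $H_o: A_1 = A_2 = \cdots = A_k = O$.
Note that equality of the $A_i$'s will automatically imply that each one is null because
the weighted sum is null as per our initial assumption in the model (13.2.1). Under this
hypothesis, the model will be $X_{ij} = M + E_{ij}$, and then proceeding as in the univariate
case, we end up with the following sum of squares and cross products matrix, denoted by $S_0$:

$$S_0 = \sum_{ij}\Big[X_{ij} - \frac{1}{n_.}X_{..}\Big]\Big[X_{ij} - \frac{1}{n_.}X_{..}\Big]', \tag{13.2.4}$$

so that the sum of squares and cross products matrix due to the $A_i$'s is the difference

$$S_0 - S = \sum_{ij}\Big[\frac{1}{n_i}X_{i.} - \frac{1}{n_.}X_{..}\Big]\Big[\frac{1}{n_i}X_{i.} - \frac{1}{n_.}X_{..}\Big]'. \tag{13.2.5}$$

Thus, we have the following partitioning of the total variation in the multivariate data:

$$S_0 = [S_0 - S] + S$$

Total variation = [Variation due to the $A_i$'s] + [Residual variation]

$$\begin{aligned}
\sum_{ij}\Big[X_{ij} - \frac{1}{n_.}X_{..}\Big]\Big[X_{ij} - \frac{1}{n_.}X_{..}\Big]' &= \sum_{ij}\Big[\frac{1}{n_i}X_{i.} - \frac{1}{n_.}X_{..}\Big]\Big[\frac{1}{n_i}X_{i.} - \frac{1}{n_.}X_{..}\Big]' \\
&\quad + \sum_{ij}\Big[X_{ij} - \frac{1}{n_i}X_{i.}\Big]\Big[X_{ij} - \frac{1}{n_i}X_{i.}\Big]'.
\end{aligned}$$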

Under the normality assumption for the random component, $E_{ij} \overset{iid}{\sim} N_p(O, \Sigma)$, $\Sigma > O$,
we have the following properties, which follow from results derived in Chap. 5, the
notation $W_p(\nu, \Sigma)$ standing for a Wishart distribution having $\nu$ degrees of freedom and
parameter matrix $\Sigma$:

$$\text{Total variation} = S_0 = \sum_{ij}\Big[X_{ij} - \frac{1}{n_.}X_{..}\Big]\Big[X_{ij} - \frac{1}{n_.}X_{..}\Big]', \quad S_0 \sim W_p(n_. - 1, \Sigma);$$

$$\text{Variation due to the } A_i\text{'s} = S_0 - S = \sum_{ij}\Big[\frac{1}{n_i}X_{i.} - \frac{1}{n_.}X_{..}\Big]\Big[\frac{1}{n_i}X_{i.} - \frac{1}{n_.}X_{..}\Big]',$$

$$S_0 - S \sim W_p(k - 1, \Sigma) \ \text{ under the hypothesis } A_1 = A_2 = \cdots = A_k = O;$$

$$\text{Residual variation} = S = \sum_{ij}\Big[X_{ij} - \frac{1}{n_i}X_{i.}\Big]\Big[X_{ij} - \frac{1}{n_i}X_{i.}\Big]', \quad S \sim W_p(n_. - k, \Sigma).$$


We can summarize these findings in a tabular form known as the multivariate analysis of
variation table or MANOVA table, where df means degrees of freedom in the corresponding
Wishart distribution, and SSP represents the sum of squares and cross products matrix.


Multivariate Analysis of Variation (MANOVA) Table

Variation due to    df          SSP
treatments          $k-1$       $\sum_i n_i[\frac{1}{n_i}X_{i.} - \frac{1}{n_.}X_{..}][\frac{1}{n_i}X_{i.} - \frac{1}{n_.}X_{..}]'$
residuals           $n_.-k$     $\sum_{ij}[X_{ij} - \frac{1}{n_i}X_{i.}][X_{ij} - \frac{1}{n_i}X_{i.}]'$
total               $n_.-1$     $\sum_{ij}[X_{ij} - \frac{1}{n_.}X_{..}][X_{ij} - \frac{1}{n_.}X_{..}]'$


13.2.1. Some properties 


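The sample values from the $i$-th sample or the $i$-th group or the plots receiving the $i$-th
treatment are $X_{i1}, X_{i2}, \ldots, X_{in_i}$. In this case, the average is $\frac{1}{n_i}\sum_{j=1}^{n_i} X_{ij} = \frac{X_{i.}}{n_i} = \bar{X}_i$, and the
$i$-th sample sum of squares and products matrix is

$$S_i = \sum_{j=1}^{n_i}\Big(X_{ij} - \frac{X_{i.}}{n_i}\Big)\Big(X_{ij} - \frac{X_{i.}}{n_i}\Big)'.$$

As well, it follows from Chap. 5 that $S_i \sim W_p(n_i - 1, \Sigma)$ when $E_{ij} \overset{iid}{\sim} N_p(O, \Sigma)$, $\Sigma > O$.
Then, the residual sum of squares and products matrix can be written as follows, denoting
it by the matrix $V$:

$$V = \sum_{ij}\Big(X_{ij} - \frac{X_{i.}}{n_i}\Big)\Big(X_{ij} - \frac{X_{i.}}{n_i}\Big)' = \sum_{i=1}^k\sum_{j=1}^{n_i}\Big(X_{ij} - \frac{X_{i.}}{n_i}\Big)\Big(X_{ij} - \frac{X_{i.}}{n_i}\Big)' = S_1 + S_2 + \cdots + S_k, \tag{13.2.6}$$

where $S_i \sim W_p(n_i - 1, \Sigma)$, $i = 1,\ldots,k$, and the $S_i$'s are independently distributed
since the sample values from the $k$ groups are independently distributed among themselves
(within the group) and between groups. Hence, $V \sim W_p(\nu, \Sigma)$, $\nu = (n_1 - 1) + (n_2 - 1) +
\cdots + (n_k - 1) = n_. - k$. Note that $\bar{X}_i = \frac{X_{i.}}{n_i}$ has $\mathrm{Cov}(\bar{X}_i) = \frac{1}{n_i}\Sigma$, so that, under the null
hypothesis, the vectors $\sqrt{n_i}(\bar{X}_i - M)$, $i = 1,\ldots,k$, are iid $N_p(O, \Sigma)$, with $M$ being estimated
by $\bar{X} = X_{..}/n_.$. Then, the sum of squares and products matrix due
to the treatments or due to the $A_i$'s is the following, denoting it by $U$:

$$U = \sum_{ij}\Big(\frac{X_{i.}}{n_i} - \frac{X_{..}}{n_.}\Big)\Big(\frac{X_{i.}}{n_i} - \frac{X_{..}}{n_.}\Big)' = \sum_{i=1}^k n_i(\bar{X}_i - \bar{X})(\bar{X}_i - \bar{X})' \sim W_p(k - 1, \Sigma) \tag{13.2.7}$$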

under the null hypothesis; when the hypothesis is violated, it is a noncentral Wishart dis- 
tribution. Further, the sum of squares and products matrix due to the treatments and the 
residual sum of squares and products matrix are independently distributed. Thus, by com- 
paring U and V, we should be able to reach a decision regarding the hypothesis. One 
procedure that is followed is to take the determinants of U and V and compare them. 
This does not have much of a basis and determinants should not be called “generalized 
variance" as previously explained since the basic condition of a norm is violated by the 
determinant. The basis for comparing determinants will become clear from the point of 
view of testing hypotheses by applying the likelihood ratio criterion, which is discussed 
next. 


13.3. The Likelihood Ratio Criterion


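Let $E_{ij} \overset{iid}{\sim} N_p(O, \Sigma)$, $\Sigma > O$, and suppose that we have simple random samples
of sizes $n_1, \ldots, n_k$ from the $k$ groups relating to the $k$ treatments. Then, the likelihood
function, denoted by $L$, is the following:

$$L = \prod_{ij}\frac{e^{-\frac{1}{2}(X_{ij} - M - A_i)'\Sigma^{-1}(X_{ij} - M - A_i)}}{(2\pi)^{\frac{p}{2}}|\Sigma|^{\frac{1}{2}}} = \frac{e^{-\frac{1}{2}\sum_{ij}(X_{ij} - M - A_i)'\Sigma^{-1}(X_{ij} - M - A_i)}}{(2\pi)^{\frac{n_.p}{2}}|\Sigma|^{\frac{n_.}{2}}}. \tag{13.3.1}$$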
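The maximum likelihood estimator/estimate (MLE) of $M$ is $\hat{M} = \frac{1}{n_.}X_{..} = \bar{X}$ and that
of $A_i$ is $\hat{A}_i = \frac{1}{n_i}X_{i.} - \hat{M}$. With a view to obtaining the MLE of $\Sigma$, we first note that the
exponent is a real scalar quantity which is thus equal to its trace, so that we can express
the exponent as follows, after substituting the MLE's of $M$ and $A_i$:

$$\begin{aligned}
-\frac{1}{2}\sum_{ij}[X_{ij} - \hat{M} - \hat{A}_i]'\Sigma^{-1}[X_{ij} - \hat{M} - \hat{A}_i] &= -\frac{1}{2}\,\mathrm{tr}\Big(\sum_{ij}\Big[X_{ij} - \frac{X_{i.}}{n_i}\Big]'\Sigma^{-1}\Big[X_{ij} - \frac{X_{i.}}{n_i}\Big]\Big) \\
&= -\frac{1}{2}\,\mathrm{tr}\Big(\Sigma^{-1}\sum_{ij}\Big[X_{ij} - \frac{X_{i.}}{n_i}\Big]\Big[X_{ij} - \frac{X_{i.}}{n_i}\Big]'\Big).
\end{aligned}$$

Now, following through the estimation procedure of the MLE included in Chap. 3, we
obtain the MLE of $\Sigma$ as

$$\hat{\Sigma} = \frac{1}{n_.}\sum_{ij}\Big(X_{ij} - \frac{X_{i.}}{n_i}\Big)\Big(X_{ij} - \frac{X_{i.}}{n_i}\Big)'. \tag{13.3.2}$$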


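After substituting $\hat{M}$, $\hat{A}_i$ and $\hat{\Sigma}$, the exponent in the likelihood function becomes
$-\frac{1}{2}n_.\,\mathrm{tr}(I_p) = -\frac{1}{2}n_.p$. Hence, the maximum value of the likelihood function $L$ under the
general model becomes

$$\max L = \frac{e^{-\frac{n_.p}{2}}\,n_.^{\frac{n_.p}{2}}}{(2\pi)^{\frac{n_.p}{2}}\big|\sum_{ij}(X_{ij} - \frac{1}{n_i}X_{i.})(X_{ij} - \frac{1}{n_i}X_{i.})'\big|^{\frac{n_.}{2}}}. \tag{13.3.3}$$

Under the hypothesis $H_o: A_1 = A_2 = \cdots = A_k = O$, the model is $X_{ij} = M + E_{ij}$ and
the MLE of $M$ under $H_o$ is still $\frac{1}{n_.}X_{..}$ and $\hat{\Sigma}$ under $H_o$ is $\frac{1}{n_.}\sum_{ij}(X_{ij} - \frac{1}{n_.}X_{..})(X_{ij} - \frac{1}{n_.}X_{..})'$,
so that $\max L$ under $H_o$, denoted by $\max L_o$, is

$$\max L_o = \frac{e^{-\frac{n_.p}{2}}\,n_.^{\frac{n_.p}{2}}}{(2\pi)^{\frac{n_.p}{2}}\big|\sum_{ij}(X_{ij} - \frac{1}{n_.}X_{..})(X_{ij} - \frac{1}{n_.}X_{..})'\big|^{\frac{n_.}{2}}}. \tag{13.3.4}$$

Therefore, the $\lambda$-criterion is the following:

$$\lambda = \frac{\max L_o}{\max L} = \frac{\big|\sum_{ij}(X_{ij} - \frac{1}{n_i}X_{i.})(X_{ij} - \frac{1}{n_i}X_{i.})'\big|^{\frac{n_.}{2}}}{\big|\sum_{ij}(X_{ij} - \frac{1}{n_.}X_{..})(X_{ij} - \frac{1}{n_.}X_{..})'\big|^{\frac{n_.}{2}}} = \frac{|V|^{\frac{n_.}{2}}}{|U + V|^{\frac{n_.}{2}}} \tag{13.3.5}$$

where

$$U = \sum_{ij}\Big(\frac{1}{n_i}X_{i.} - \frac{1}{n_.}X_{..}\Big)\Big(\frac{1}{n_i}X_{i.} - \frac{1}{n_.}X_{..}\Big)', \quad V = \sum_{ij}\Big(X_{ij} - \frac{1}{n_i}X_{i.}\Big)\Big(X_{ij} - \frac{1}{n_i}X_{i.}\Big)',$$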

and $U \sim W_p(k - 1, \Sigma)$ under $H_o$ is the sum of squares and cross products matrix due
to the $A_i$'s and $V \sim W_p(n_. - k, \Sigma)$ is the residual sum of squares and cross products
matrix. It has already been shown that $U$ and $V$ are independently distributed. Then
$W_1 = (U + V)^{-\frac{1}{2}}V(U + V)^{-\frac{1}{2}}$, with the determinant $|W_1| = \frac{|V|}{|U + V|}$, is a real matrix-variate type-1
beta with parameters $(\frac{n_.-k}{2}, \frac{k-1}{2})$, as defined in Chap. 5, and $W_2 = V^{-\frac{1}{2}}UV^{-\frac{1}{2}}$ is a real
matrix-variate type-2 beta with parameters $(\frac{k-1}{2}, \frac{n_.-k}{2})$. Moreover, $Y_1 = I - W_1 =
(U + V)^{-\frac{1}{2}}U(U + V)^{-\frac{1}{2}}$ is a real matrix-variate type-1 beta random variable with parameters
$(\frac{k-1}{2}, \frac{n_.-k}{2})$. Since $U$ and $V$ are independently distributed real matrix-variate
gamma random variables, we have seen in Chap. 5 that $W_1$ and $Y_2 = U + V$ are independently
distributed. Similarly, $Y_1 = I - W_1$ and $Y_2$ are independently distributed. Further,
$W_2^{-1} = V^{\frac{1}{2}}U^{-1}V^{\frac{1}{2}}$ is a real matrix-variate type-2 beta random variable with the parameters
$(\frac{n_.-k}{2}, \frac{k-1}{2})$. Observe that

$$|W_1| = \frac{|V|}{|U + V|} = \frac{1}{|V^{-\frac{1}{2}}UV^{-\frac{1}{2}} + I|} = \frac{1}{|W_2 + I|}.$$

A one-to-one function of $\lambda$ is

$$w = \lambda^{\frac{2}{n_.}} = \frac{|V|}{|U + V|} = |W_1|. \tag{13.3.6}$$


13.3.1. Arbitrary moments of the likelihood ratio criterion 


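For an arbitrary $h$, the $h$-th moment of $w$ as well as that of $\lambda$ can be obtained from
the normalizing constant of a real matrix-variate type-1 beta density with the parameters
$(\frac{n_.-k}{2}, \frac{k-1}{2})$. That is,

$$\begin{aligned}
E[w^h] &= \frac{\Gamma_p(\frac{n_.-k}{2} + h)}{\Gamma_p(\frac{n_.-k}{2})}\,\frac{\Gamma_p(\frac{n_.-1}{2})}{\Gamma_p(\frac{n_.-1}{2} + h)}, \quad (n_. - k) + (k - 1) = n_. - 1, \\
&= \Big\{\prod_{j=1}^p \frac{\Gamma(\frac{n_.-1}{2} - \frac{j-1}{2})}{\Gamma(\frac{n_.-k}{2} - \frac{j-1}{2})}\Big\}\Big\{\prod_{j=1}^p \frac{\Gamma(\frac{n_.-k}{2} - \frac{j-1}{2} + h)}{\Gamma(\frac{n_.-1}{2} - \frac{j-1}{2} + h)}\Big\}.
\end{aligned} \tag{13.3.7}$$

As $E[\lambda^h] = E[(w^{\frac{n_.}{2}})^h] = E[w^{(\frac{n_.}{2})h}]$, the $h$-th moment of $\lambda$ is obtained by replacing $h$ by
$(\frac{n_.}{2})h$ in (13.3.7). That is,

$$E[\lambda^h] = C_{p,k}\prod_{j=1}^p \frac{\Gamma(\frac{n_.-k}{2} - \frac{j-1}{2} + (\frac{n_.}{2})h)}{\Gamma(\frac{n_.-1}{2} - \frac{j-1}{2} + (\frac{n_.}{2})h)} \tag{13.3.8}$$

where

$$C_{p,k} = \prod_{j=1}^p \frac{\Gamma(\frac{n_.-1}{2} - \frac{j-1}{2})}{\Gamma(\frac{n_.-k}{2} - \frac{j-1}{2})}.$$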

13.3.2. Structural representation of the likelihood ratio criterion 


It can readily be seen from (13.3.7) that the $h$-th moment of $w$ is of the form of the $h$-th
moment of a product of independently distributed real scalar type-1 beta random variables.
That is,

$$E[w^h] = E[w_1w_2\cdots w_p]^h, \quad w = w_1w_2\cdots w_p, \tag{13.3.9}$$

where $w_1, \ldots, w_p$ are independently distributed and $w_j$ is a real scalar type-1 beta random
variable with the parameters $(\frac{n_.-k}{2} - \frac{j-1}{2}, \frac{k-1}{2})$, $j = 1,\ldots,p$, for $n_. - k > p - 1$, that is,
$n_. > k + p - 1$. Hence the exact density of $w$ is available by constructing the density
of a product of independently distributed real scalar type-1 beta random variables. For
special values of $p$ and $k$, one can obtain the exact densities in the forms of elementary
functions. However, for the general case, the exact density corresponding to $E[w^h]$ as
specified in (13.3.7) can be expressed in terms of a G-function and, in the case of $E[\lambda^h]$
as given in (13.3.8), the exact density can be represented in terms of an H-function. These
representations are as follows, denoting the densities of $w$ and $\lambda$ as $f_w(w)$ and $f_\lambda(\lambda)$,
respectively:

$$f_w(w) = C_{p,k}\,G^{p,0}_{p,p}\Big[w\,\Big|\,{}^{\frac{n_.-1}{2} - \frac{j-1}{2} - 1,\ j=1,\ldots,p}_{\frac{n_.-k}{2} - \frac{j-1}{2} - 1,\ j=1,\ldots,p}\Big], \quad 0 < w < 1, \tag{13.3.10}$$

$$f_\lambda(\lambda) = C_{p,k}\,H^{p,0}_{p,p}\Big[\lambda\,\Big|\,{}^{(\frac{n_.-1}{2} - \frac{j-1}{2} - \frac{n_.}{2},\ \frac{n_.}{2}),\ j=1,\ldots,p}_{(\frac{n_.-k}{2} - \frac{j-1}{2} - \frac{n_.}{2},\ \frac{n_.}{2}),\ j=1,\ldots,p}\Big], \quad 0 < \lambda < 1, \tag{13.3.11}$$

for $n_. > p + k - 1$, $p \ge 1$, and $f_w(w) = 0$, $f_\lambda(\lambda) = 0$ elsewhere. The evaluation of
G and H-functions can be carried out with the help of symbolic computing packages such
as Mathematica and MAPLE. Theoretical considerations, applications and several special
cases of the G and H-functions are, for instance, available from Mathai (1993) and Mathai,
Saxena and Haubold (2010). The special cases listed therein can also be utilized to work
out the densities for particular cases of (13.3.10) and (13.3.11). Explicit structures of the
densities for certain special cases are listed in the next section.
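
The structural representation (13.3.9) lends itself to a simple simulation check. The sketch below is only illustrative and rests on assumptions not made in the text: $\Sigma = I$, the specific values $p = 2$, $k = 3$, $n_. = 15$, a fixed random seed, and Wishart matrices generated as $Z'Z$ from standard normal rows. It compares the simulated mean of $|V|/|U+V|$ with that of a product of independent type-1 beta variables and with the exact first moment from (13.3.7).

```python
import numpy as np

rng = np.random.default_rng(12345)
p, k, n_dot, reps = 2, 3, 15, 20000

# U ~ W_p(k-1, I) and V ~ W_p(n.-k, I), simulated as Z'Z with standard normal rows.
Y = rng.standard_normal((reps, k - 1, p))
Z = rng.standard_normal((reps, n_dot - k, p))
U = np.swapaxes(Y, 1, 2) @ Y
V = np.swapaxes(Z, 1, 2) @ Z
w_det = np.linalg.det(V) / np.linalg.det(U + V)     # w = |V| / |U + V|

# Product of independent real scalar type-1 beta variables, as in (13.3.9).
w_prod = np.ones(reps)
for j in range(1, p + 1):
    a_j = (n_dot - k) / 2 - (j - 1) / 2
    w_prod *= rng.beta(a_j, (k - 1) / 2, size=reps)

# Exact first moment: product of the beta means a_j / (a_j + (k-1)/2).
exact_mean = np.prod([
    ((n_dot - k) / 2 - (j - 1) / 2) / ((n_dot - 1) / 2 - (j - 1) / 2)
    for j in range(1, p + 1)
])
```

Agreement of the two simulated means with `exact_mean` up to Monte Carlo error is consistent with, but of course does not prove, the distributional identity.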


13.3.3. Some special cases 


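Several particular cases can be worked out by examining the moment expressions
in (13.3.7) and (13.3.8). The $h$-th moment of $w = \lambda^{\frac{2}{n_.}}$, where $\lambda$ is the likelihood
ratio criterion, is available from (13.3.7) as

$$E[w^h] = C_{p,k}\,\frac{\Gamma(\frac{n_.-k}{2} + h)\,\Gamma(\frac{n_.-k}{2} - \frac{1}{2} + h)\cdots\Gamma(\frac{n_.-k}{2} - \frac{p-1}{2} + h)}{\Gamma(\frac{n_.-1}{2} + h)\,\Gamma(\frac{n_.-1}{2} - \frac{1}{2} + h)\cdots\Gamma(\frac{n_.-1}{2} - \frac{p-1}{2} + h)}. \tag{i}$$

Case (1): $p = 1$

In this case, from (i),

$$E[w^h] = C_{1,k}\,\frac{\Gamma(\frac{n_.-k}{2} + h)}{\Gamma(\frac{n_.-1}{2} + h)},$$

which is the $h$-th moment of a real scalar type-1 beta random variable with the parameters
$(\frac{n_.-k}{2}, \frac{k-1}{2})$ and, in this case, $w$ is simply a real scalar type-1 beta random variable with
the parameters $(\frac{n_.-k}{2}, \frac{k-1}{2})$. We reject the null hypothesis $H_o: A_1 = A_2 = \cdots = A_k = O$
for small values of the $\lambda$-criterion and, accordingly, we reject $H_o$ for small values of $w$,
that is, the hypothesis is rejected when the observed value of $w \le w_\alpha$ where $w_\alpha$ is such that
$\int_0^{w_\alpha} f_w(w)\,\mathrm{d}w = \alpha$ for the preassigned size $\alpha$ of the critical region, $f_w(w)$ denoting the
density of $w$ for $p = 1$, $n_. > k$.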


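Case (2): $p = 2$

From (i), we have

$$E[w^h] = C_{2,k}\,\frac{\Gamma(\frac{n_.-k}{2} + h)\,\Gamma(\frac{n_.-k}{2} - \frac{1}{2} + h)}{\Gamma(\frac{n_.-1}{2} + h)\,\Gamma(\frac{n_.-1}{2} - \frac{1}{2} + h)}$$

and therefore

$$E[(w^{\frac{1}{2}})^h] = C_{2,k}\,\frac{\Gamma(\frac{n_.-k}{2} + \frac{h}{2})\,\Gamma(\frac{n_.-k}{2} - \frac{1}{2} + \frac{h}{2})}{\Gamma(\frac{n_.-1}{2} + \frac{h}{2})\,\Gamma(\frac{n_.-1}{2} - \frac{1}{2} + \frac{h}{2})}. \tag{ii}$$

The gamma functions in (ii) can be combined by making use of a duplication formula for
gamma functions, namely,

$$\Gamma(z)\,\Gamma(z + 1/2) = \pi^{\frac{1}{2}}\,2^{1-2z}\,\Gamma(2z). \tag{13.3.12}$$

Take $z = \frac{n_.-k}{2} - \frac{1}{2} + \frac{h}{2}$ and $z = \frac{n_.-1}{2} - \frac{1}{2} + \frac{h}{2}$ in the part containing $h$ and in the constant
part wherein $h = 0$, and then apply formula (13.3.12) to obtain

$$E[(w^{\frac{1}{2}})^h] = \frac{\Gamma(n_. - 2)}{\Gamma(n_. - k - 1)}\,\frac{\Gamma(n_. - k - 1 + h)}{\Gamma(n_. - 2 + h)},$$

which is, for an arbitrary $h$, the $h$-th moment of a real scalar type-1 beta random variable
with parameters $(n_. - k - 1, k - 1)$ for $n_. - k - 1 > 0$, $k > 1$. Thus, $y = w^{\frac{1}{2}}$ is a real
scalar type-1 beta random variable with the parameters $(n_. - k - 1, k - 1)$. We would then
reject $H_o$ for small values of $w$, that is, for small values of $y$ or when the observed value
of $y \le y_\alpha$ with $y_\alpha$ such that $\int_0^{y_\alpha} f_y(y)\,\mathrm{d}y = \alpha$ for a preassigned probability $\alpha$ of type-I error,
which is the error of rejecting $H_o$ when $H_o$ is true, where $f_y(y)$ is the density of $y$ for
$p = 2$ whenever $n_. > k + 1$.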
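
The reduction in Case (2) can be verified numerically. The sketch below is illustrative only; the function names are hypothetical and the test values ($z = 2.3$, $h = 1.7$, $k = 3$, $n_. = 15$) are arbitrary choices. It checks the duplication formula (13.3.12) and confirms that, for $p = 2$, the gamma product evaluated at $h/2$ coincides with the $h$-th moment of a type-1 beta variable with parameters $(n_. - k - 1, k - 1)$.

```python
from math import gamma, pi, sqrt, isclose

def duplication_lhs(z):
    """Left side of (13.3.12)."""
    return gamma(z) * gamma(z + 0.5)

def duplication_rhs(z):
    """Right side of (13.3.12)."""
    return sqrt(pi) * 2 ** (1 - 2 * z) * gamma(2 * z)

def moment_sqrt_w_p2(h, k, n_dot):
    """E[(w^{1/2})^h] for p = 2: the general gamma product evaluated at h/2."""
    a = (n_dot - k) / 2
    b = (n_dot - 1) / 2
    num = gamma(a + h / 2) * gamma(a - 0.5 + h / 2) * gamma(b) * gamma(b - 0.5)
    den = gamma(b + h / 2) * gamma(b - 0.5 + h / 2) * gamma(a) * gamma(a - 0.5)
    return num / den

def beta1_moment(h, alpha, beta):
    """h-th moment of a real scalar type-1 beta(alpha, beta) variable."""
    return gamma(alpha + h) * gamma(alpha + beta) / (gamma(alpha) * gamma(alpha + beta + h))

dup_ok = isclose(duplication_lhs(2.3), duplication_rhs(2.3), rel_tol=1e-12)
case2_ok = isclose(moment_sqrt_w_p2(1.7, 3, 15), beta1_moment(1.7, 15 - 3 - 1, 3 - 1),
                   rel_tol=1e-10)
```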


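Case (3): $k = 2$, $p \ge 1$, $n_. > p + 1$

In this case, the $h$-th moment of $w$ as specified in (13.3.7) is the following:

$$\begin{aligned}
E[w^h] &= C_{p,2}\,\frac{\Gamma(\frac{n_.-2}{2} + h)\,\Gamma(\frac{n_.-2}{2} - \frac{1}{2} + h)\cdots\Gamma(\frac{n_.-2}{2} - \frac{p-1}{2} + h)}{\Gamma(\frac{n_.-1}{2} + h)\,\Gamma(\frac{n_.-1}{2} - \frac{1}{2} + h)\cdots\Gamma(\frac{n_.-1}{2} - \frac{p-1}{2} + h)} \\
&= C_{p,2}\,\frac{\Gamma(\frac{n_.-1}{2} - \frac{p}{2} + h)}{\Gamma(\frac{n_.-1}{2} + h)}
\end{aligned}$$

since the numerator gamma functions, except the last one, cancel with the denominator
gamma functions except the first one. This expression happens to be the $h$-th moment of
a real scalar type-1 beta random variable with the parameters $(\frac{n_.-1-p}{2}, \frac{p}{2})$ and hence, for
$k = 2$, $n_. - 1 - p > 0$ and $p \ge 1$, $w$ is a real scalar type-1 beta random variable. Then, we
reject the null hypothesis $H_o$ for small values of $w$ or when the observed value of $w \le w_\alpha$,
with $w_\alpha$ such that $\int_0^{w_\alpha} f_w(w)\,\mathrm{d}w = \alpha$ for a preassigned significance level $\alpha$, $f_w(w)$ being
the density of $w$ for this case. We will use the same notation $f_w(w)$ for the density of $w$ in
all the special cases.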


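Case (4): $k = 3$, $p \ge 1$

Proceeding as in Case (3), we see that all the gammas in the $h$-th moment of $w$ cancel
out except the last two in the numerator and the first two in the denominator. Thus,

$$\begin{aligned}
E[w^h] &= C_{p,3}\,\frac{\Gamma(\frac{n_.-3}{2} - \frac{p-2}{2} + h)\,\Gamma(\frac{n_.-3}{2} - \frac{p-1}{2} + h)}{\Gamma(\frac{n_.-1}{2} + h)\,\Gamma(\frac{n_.-1}{2} - \frac{1}{2} + h)} \\
&= C_{p,3}\,\frac{\Gamma(\frac{n_.-1}{2} - \frac{p}{2} + h)\,\Gamma(\frac{n_.-1}{2} - \frac{p}{2} - \frac{1}{2} + h)}{\Gamma(\frac{n_.-1}{2} + h)\,\Gamma(\frac{n_.-1}{2} - \frac{1}{2} + h)}.
\end{aligned}$$

After combining the gammas in $y = w^{\frac{1}{2}}$ with the help of the duplication formula (13.3.12),
we have the following:

$$E[y^h] = \frac{\Gamma(n_. - 2)}{\Gamma(n_. - 2 - p)}\,\frac{\Gamma(n_. - p - 2 + h)}{\Gamma(n_. - 2 + h)}.$$

Therefore, $y = w^{\frac{1}{2}}$ is a real scalar type-1 beta random variable with the parameters
$(n_. - p - 2, p)$. We reject the null hypothesis for small values of $y$ or when the observed value of
$y \le y_\alpha$, with $y_\alpha$ such that $\int_0^{y_\alpha} f_y(y)\,\mathrm{d}y = \alpha$ for a preassigned significance level $\alpha$. We
will use the same notation $f_y(y)$ for the density of $y$ in all special cases.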


We can also obtain some special cases for $t_1 = \frac{1-w}{w}$ and $t_2 = \frac{1-y}{y}$ with $y = \sqrt{w}$.
With this transformation, $t_1$ and $t_2$ will be available in terms of type-2 beta variables in
the real scalar case, which conveniently enables us to relate this distribution to real scalar
F random variables so that an F table can be used for testing the null hypothesis and
reaching a decision. We have noted that

$$w = \frac{|V|}{|U + V|} = \frac{1}{|V^{-\frac{1}{2}}UV^{-\frac{1}{2}} + I|} = \frac{1}{|W_2 + I|} = |W_1|$$

where $W_1$ is a real matrix-variate type-1 beta random variable with the parameters
$(\frac{n_.-k}{2}, \frac{k-1}{2})$ and $W_2$ is a real matrix-variate type-2 beta random variable with the parameters
$(\frac{k-1}{2}, \frac{n_.-k}{2})$. Then, when $p = 1$, $W_1$ and $W_2$ are real scalar variables, denoted by $w_1$
and $w_2$, respectively. Then for $p = 1$, we have one gamma ratio with $h$ in the general $h$-th
moment (13.3.7) and then,

$$t_1 = \frac{1-w}{w} = \frac{1}{w} - 1 = (w_2 + 1) - 1 = w_2$$

where $w_2$ is a real scalar type-2 beta random variable with the parameters $(\frac{k-1}{2}, \frac{n_.-k}{2})$.
As well, in general, for a real matrix-variate type-2 beta matrix $W_2$ with the parameters
$(\frac{\nu_1}{2}, \frac{\nu_2}{2})$, we have $\frac{\nu_2}{\nu_1}W_2 = F_{\nu_1,\nu_2}$ where $F_{\nu_1,\nu_2}$ is a real matrix-variate F matrix random
variable with degrees of freedom $\nu_1$ and $\nu_2$. When $p = 1$ or in the real scalar case, $\frac{\nu_2}{\nu_1}w_2 =
F_{\nu_1,\nu_2}$ where, in this case, $F$ is a real scalar F random variable with $\nu_1$ and $\nu_2$ degrees of
freedom. We have used $F$ for the scalar and matrix-variate case in order to avoid too many
symbols. For $p = 2$, we combine the gamma functions in the numerator and denominator
by applying the duplication formula for gamma functions (13.3.12); then, for $t_2 = \frac{1-y}{y}$, the
situation turns out to be the same as in the case of $t_1$, the only difference being that in the
real scalar type-2 beta $w_2$, the parameters are $(k - 1, n_. - k - 1)$. Note that the original
$\frac{k-1}{2}$ has become $k - 1$ and the original $\frac{n_.-k}{2}$ has become $n_. - k - 1$. Thus, we can state the
following two special cases.


Case (5): $p = 1$, $t_1 = \frac{1-w}{w}$

As was explained, $t_1$ is a real scalar type-2 beta random variable with the parameters
$(\frac{k-1}{2}, \frac{n_.-k}{2})$, so that

$$\frac{n_. - k}{k - 1}\,t_1 \sim F_{k-1,\,n_.-k},$$

which is a real scalar F random variable with $k - 1$ and $n_. - k$ degrees of freedom.
Accordingly, we reject $H_o$ for small values of $w$, which corresponds to large values
of $F$. Thus, we reject the null hypothesis $H_o$ whenever the observed value of $F_{k-1,n_.-k} \ge
F_{k-1,n_.-k,\alpha}$ where $F_{k-1,n_.-k,\alpha}$ is the upper $100\,\alpha\%$ percentage point of the F distribution,
that is, $\int_a^\infty g(F)\,\mathrm{d}F = \alpha$ where $a = F_{k-1,n_.-k,\alpha}$ and $g(F)$ is the density of $F$ in this case.


Case (6): $p = 2$, $t_2 = \frac{1-y}{y}$, $y = \sqrt{w}$

As previously explained, $t_2$ is a real scalar type-2 beta random variable with the
parameters $(k - 1, n_. - k - 1)$ or

$$\frac{n_. - k - 1}{k - 1}\,t_2 \sim F_{2(k-1),\,2(n_.-k-1)},$$

which is a real scalar F random variable having $2(k - 1)$ and $2(n_. - k - 1)$ degrees of
freedom. We reject the null hypothesis for large values of $t_2$ or when the observed value of
$\frac{n_.-k-1}{k-1}\,t_2 \ge b$ with $b$ such that $\int_b^\infty g(F)\,\mathrm{d}F = \alpha$, $g(F)$ denoting in this case the density
of a real scalar random variable $F$ with degrees of freedom $2(k - 1)$ and $2(n_. - k - 1)$,
and $b = F_{2(k-1),2(n_.-k-1),\alpha}$.


Case (7): $k = 2$, $p \ge 1$, $t_1 = \frac{1-w}{w}$

For the case $k = 2$, we have already seen that the gamma functions with $h$ in their
arguments cancel out, leaving only one gamma in the numerator and one gamma in the
denominator, so that $w$ is distributed as a real scalar type-1 beta random variable with the
parameters $(\frac{n_.-1-p}{2}, \frac{p}{2})$. Thus, $t_1 = \frac{1-w}{w}$ is a real scalar type-2 beta with the parameters
$(\frac{p}{2}, \frac{n_.-1-p}{2})$, and

$$\frac{n_. - 1 - p}{p}\,t_1 \sim F_{p,\,n_.-1-p},$$

which is a real scalar F random variable having $p$ and $n_. - 1 - p$ degrees of freedom.
We reject $H_o$ for large values of $t_1$ or when the observed value of $\frac{n_.-1-p}{p}\,t_1 \ge b$ where $b$
is such that $\int_b^\infty g(F)\,\mathrm{d}F = \alpha$ with $g(F)$ being the density of an F random variable with
degrees of freedom $p$ and $n_. - 1 - p$ in this special case.


Case (8): $k = 3$, $p \ge 1$, $t_2 = \frac{1-y}{y}$, $y = \sqrt{w}$

On combining Cases (4) and (6), it is seen that $t_2$ is a real scalar type-2 beta random
variable with the parameters $(p, n_. - p - 2)$, since $y$ is a real scalar type-1 beta random
variable with the parameters $(n_. - p - 2, p)$ as obtained in Case (4), so that

$$\frac{n_. - p - 2}{p}\,t_2 \sim F_{2p,\,2(n_.-p-2)},$$

which is a real scalar F random variable with the degrees of freedom $(2p, 2(n_. - p - 2))$.
For $p = 2$, this agrees with Case (6) taken at $k = 3$.
Thus, we reject the hypothesis for large values of this F random variable. For a test at
significance level $\alpha$ or with $\alpha$ as the size of its critical region, the hypothesis $H_o: A_1 =
A_2 = A_3 = O$ is rejected when the observed value of this $F \ge F_{2p,2(n_.-p-2),\alpha}$
where $F_{2p,2(n_.-p-2),\alpha}$ is the upper $100\,\alpha\%$ percentage point of the F distribution.
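
The four exact F conversions can be gathered into one small helper. The sketch below is illustrative: the function name `exact_f_statistic` is hypothetical, and for $k = 3$ it uses the type-1 beta parameters $(n_. - p - 2, p)$ obtained in Case (4), which agree with Case (6) when $p = 2$. It relies on the fact that, for a real scalar type-2 beta variable $t$ with parameters $(\frac{\nu_1}{2}, \frac{\nu_2}{2})$, $\frac{\nu_2}{\nu_1}t \sim F_{\nu_1,\nu_2}$.

```python
from math import sqrt

def exact_f_statistic(w, p, k, n_dot):
    """Map w = |V| / |U + V| to (F, df1, df2) in the exactly solvable cases.

    Covers p = 1, p = 2, k = 2 and k = 3; raises ValueError otherwise.
    """
    if not 0 < w <= 1:
        raise ValueError("w must lie in (0, 1]")
    if p == 1:                                   # Case (5)
        t, df1, df2 = (1 - w) / w, k - 1, n_dot - k
    elif p == 2:                                 # Case (6)
        y = sqrt(w)
        t, df1, df2 = (1 - y) / y, 2 * (k - 1), 2 * (n_dot - k - 1)
    elif k == 2:                                 # Case (7)
        t, df1, df2 = (1 - w) / w, p, n_dot - 1 - p
    elif k == 3:                                 # Case (8), parameters from Case (4)
        y = sqrt(w)
        t, df1, df2 = (1 - y) / y, 2 * p, 2 * (n_dot - p - 2)
    else:
        raise ValueError("no elementary F representation for this (p, k)")
    return (df2 / df1) * t, df1, df2

# Illustrative calls with arbitrary w values.
f_p1 = exact_f_statistic(0.5, 1, 3, 15)     # t1 = 1, F = 12/2
f_p2 = exact_f_statistic(0.25, 2, 3, 15)    # y = 0.5, t2 = 1, F = 22/4
```

The observed F value would then be compared with the upper percentage point of the F distribution with `df1` and `df2` degrees of freedom.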


Example 13.3.1. In a dieting experiment, three different diets $D_1$, $D_2$ and $D_3$ are tried
for a period of one month. The variables monitored are weight in kilograms (kg), waist
circumference in centimeters (cm) and right mid-thigh circumference in centimeters. The
measurements are $x_1$ = final weight minus initial weight, $x_2$ = final waist circumference
minus initial waist reading and $x_3$ = final minus initial thigh circumference. Diet $D_1$ is
administered to a group of 5 randomly selected individuals ($n_1 = 5$), $D_2$, to 4 randomly
selected persons ($n_2 = 4$), and 6 randomly selected individuals ($n_3 = 6$) are subjected
to $D_3$. Since three variables are monitored, $p = 3$. As well, there are three treatments or
three diets, so that $k = 3$. In our notation,

$$X = \begin{bmatrix} x_1 \\ x_2 \\ x_3 \end{bmatrix},\quad X_{ij} = \begin{bmatrix} x_{1ij} \\ x_{2ij} \\ x_{3ij} \end{bmatrix},\quad X_{1j} = \begin{bmatrix} x_{11j} \\ x_{21j} \\ x_{31j} \end{bmatrix},\quad X_{2j} = \begin{bmatrix} x_{12j} \\ x_{22j} \\ x_{32j} \end{bmatrix},\quad X_{3j} = \begin{bmatrix} x_{13j} \\ x_{23j} \\ x_{33j} \end{bmatrix},$$

where $i$ corresponds to the diet number and $j$ stands for the sample serial number. For
example, the observation vector on individual #3 within the group subjected to diet $D_2$ is
denoted by $X_{23}$. The following are the data on $x_1, x_2, x_3$:


Diet $D_1$: $X_{1j}$, $j = 1, 2, 3, 4, 5$:
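Diet $D_2$: $X_{2j}$, $j = 1, 2, 3, 4$:

$$X_{21} = \begin{bmatrix} 1 \\ 2 \\ 2 \end{bmatrix},\quad X_{22} = \begin{bmatrix} 3 \\ -1 \\ -2 \end{bmatrix},\quad X_{23} = \begin{bmatrix} -1 \\ 2 \\ 1 \end{bmatrix},\quad X_{24} = \begin{bmatrix} 1 \\ 1 \\ -1 \end{bmatrix}$$

Diet $D_3$: $X_{3j}$, $j = 1, 2, 3, 4, 5, 6$:

$$X_{31} = \begin{bmatrix} 2 \\ 2 \\ -1 \end{bmatrix},\ X_{32} = \begin{bmatrix} 1 \\ 3 \\ 1 \end{bmatrix},\ X_{33} = \begin{bmatrix} -1 \\ 2 \\ 2 \end{bmatrix},\ X_{34} = \begin{bmatrix} 2 \\ 4 \\ 2 \end{bmatrix},\ X_{35} = \begin{bmatrix} 2 \\ 0 \\ 0 \end{bmatrix},\ X_{36} = \begin{bmatrix} 0 \\ 1 \\ 2 \end{bmatrix}$$


(1): Perform an ANOVA test on the first component consisting of weight measurements;
(2): Carry out a MANOVA test on the first two components, weight and waist measurements;
(3): Do a MANOVA test on all the three variables, weight, waist and thigh measurements.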


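Solution 13.3.1. We first compute the vectors $X_{1.}$, $\bar{X}_1$, $X_{2.}$, $\bar{X}_2$, $X_{3.}$, $\bar{X}_3$, $X_{..}$ and $\bar{X}$:

$$X_{1.} = \begin{bmatrix} 5 \\ 0 \\ 0 \end{bmatrix},\ \bar{X}_1 = \frac{X_{1.}}{n_1} = \frac{1}{5}\begin{bmatrix} 5 \\ 0 \\ 0 \end{bmatrix} = \begin{bmatrix} 1 \\ 0 \\ 0 \end{bmatrix},\quad X_{2.} = \begin{bmatrix} 4 \\ 4 \\ 0 \end{bmatrix},\ \bar{X}_2 = \frac{X_{2.}}{n_2} = \frac{1}{4}\begin{bmatrix} 4 \\ 4 \\ 0 \end{bmatrix} = \begin{bmatrix} 1 \\ 1 \\ 0 \end{bmatrix},$$

$$X_{3.} = \begin{bmatrix} 6 \\ 12 \\ 6 \end{bmatrix},\ \bar{X}_3 = \frac{X_{3.}}{n_3} = \frac{1}{6}\begin{bmatrix} 6 \\ 12 \\ 6 \end{bmatrix} = \begin{bmatrix} 1 \\ 2 \\ 1 \end{bmatrix},\quad X_{..} = \begin{bmatrix} 15 \\ 16 \\ 6 \end{bmatrix},\ \bar{X} = \frac{X_{..}}{n_.} = \frac{1}{15}\begin{bmatrix} 15 \\ 16 \\ 6 \end{bmatrix} = \begin{bmatrix} 1 \\ 16/15 \\ 6/15 \end{bmatrix}.$$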

Problem (1): ANOVA on the first component $x_1$. The first components of the observations
are $x_{1ij}$. The first components under diet $D_1$ are

$$[x_{111}, x_{112}, x_{113}, x_{114}, x_{115}] = [2, 4, -1, -1, 1] \ \text{ with } x_{11.} = 5;$$

the first components of observations under diet $D_2$ are

$$[x_{121}, x_{122}, x_{123}, x_{124}] = [1, 3, -1, 1] \ \text{ with } x_{12.} = 4;$$

and the first components under diet $D_3$ are

$$[x_{131}, x_{132}, x_{133}, x_{134}, x_{135}, x_{136}] = [2, 1, -1, 2, 2, 0] \ \text{ with } x_{13.} = 6.$$

Hence, the total on the first component is $x_{1..} = 15$, and $\bar{x}_1 = \frac{x_{1..}}{n_.} = \frac{15}{15} = 1$. The first
component model is the following:

$$x_{1ij} = \mu_1 + \alpha_{1i} + e_{1ij},\quad j = 1,\ldots,n_i,\ i = 1,\ldots,k.$$

Note again that estimators and estimates will be denoted by a hat. As previously mentioned,
the same symbols will be used for the variables and the observations on those
variables in order to avoid using too many symbols; however, the notations will be clear
from the context. If the discussion pertains to distributions, then variables are involved,
and if we are referring to numbers, then we are dealing with observations.

By (13.1.2), the least squares estimates are $\hat{\mu}_1 = \frac{x_{1..}}{n_.} = \frac{15}{15} = 1$, $\hat{\alpha}_{11} = \frac{x_{11.}}{n_1} - \hat{\mu}_1 = \frac{5}{5} - 1 = 0$,
$\hat{\alpha}_{12} = \frac{x_{12.}}{n_2} - \hat{\mu}_1 = \frac{4}{4} - 1 = 0$ and $\hat{\alpha}_{13} = \frac{x_{13.}}{n_3} - \hat{\mu}_1 = \frac{6}{6} - 1 = 0$. The first component
hypothesis is $\alpha_{11} = \alpha_{12} = \alpha_{13} = 0$. The total sum of squares is

$$\begin{aligned}
\sum_{ij}(x_{1ij} - \bar{x}_1)^2 &= (2-1)^2 + (4-1)^2 + (-1-1)^2 + (-1-1)^2 + (1-1)^2 \\
&\quad + (1-1)^2 + (3-1)^2 + (-1-1)^2 + (1-1)^2 \\
&\quad + (2-1)^2 + (1-1)^2 + (-1-1)^2 + (2-1)^2 + (2-1)^2 + (0-1)^2 \\
&= 34.
\end{aligned}$$

The sum of squares due to the $\alpha_i$'s is available from

$$\sum_i n_i\Big(\frac{x_{1i.}}{n_i} - \frac{x_{1..}}{n_.}\Big)^2 = \sum_i \frac{x_{1i.}^2}{n_i} - \frac{x_{1..}^2}{n_.} = \Big(\frac{5^2}{5} + \frac{4^2}{4} + \frac{6^2}{6}\Big) - \frac{15^2}{15} = 15 - 15 = 0.$$

Hence the following table:

ANOVA Table

Variation due to    df    SS    MS    F-ratio
diets                2     0     0    0
residuals           12    34
total               14    34

Since the sum of squares due to the $\alpha_i$'s is null, the hypothesis is not rejected at any level.
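Problem (2): MANOVA on the first two components. We are still using the notation
$X_{ij}$ for the two and three-component cases since our general notation does not depend
on the number of components in the vector concerned; as well, we can make use of the
computations pertaining to the first component in Problem (1). The relevant quantities
computed from the data on the first two components are the following:

$$\text{Diet } D_1:\ \begin{bmatrix} 2 & 4 & -1 & -1 & 1 \\ 3 & -2 & -2 & 1 & 0 \end{bmatrix} \Rightarrow X_{1.} = \begin{bmatrix} 5 \\ 0 \end{bmatrix},\ \bar{X}_1 = \frac{X_{1.}}{n_1} = \begin{bmatrix} 1 \\ 0 \end{bmatrix};$$

$$\text{Diet } D_2:\ \begin{bmatrix} 1 & 3 & -1 & 1 \\ 2 & -1 & 2 & 1 \end{bmatrix} \Rightarrow X_{2.} = \begin{bmatrix} 4 \\ 4 \end{bmatrix},\ \bar{X}_2 = \frac{X_{2.}}{n_2} = \begin{bmatrix} 1 \\ 1 \end{bmatrix};$$

$$\text{Diet } D_3:\ \begin{bmatrix} 2 & 1 & -1 & 2 & 2 & 0 \\ 2 & 3 & 2 & 4 & 0 & 1 \end{bmatrix} \Rightarrow X_{3.} = \begin{bmatrix} 6 \\ 12 \end{bmatrix},\ \bar{X}_3 = \frac{X_{3.}}{n_3} = \begin{bmatrix} 1 \\ 2 \end{bmatrix}.$$

In this case, the grand total, denoted by $X_{..}$, and the grand average, denoted by $\bar{X}$, are the
following:

$$X_{..} = \begin{bmatrix} 15 \\ 16 \end{bmatrix},\quad \bar{X} = \frac{1}{15}\begin{bmatrix} 15 \\ 16 \end{bmatrix} = \begin{bmatrix} 1 \\ 16/15 \end{bmatrix}.$$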

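Note that the total sum of squares and cross products matrix can be written as follows:

$$\sum_{ij}(X_{ij} - \bar{X})(X_{ij} - \bar{X})' = \sum_{j=1}^5 (X_{1j} - \bar{X})(X_{1j} - \bar{X})' + \sum_{j=1}^4 (X_{2j} - \bar{X})(X_{2j} - \bar{X})' + \sum_{j=1}^6 (X_{3j} - \bar{X})(X_{3j} - \bar{X})'.$$

Then,

$$\begin{aligned}
\sum_{j=1}^5 (X_{1j} - \bar{X})(X_{1j} - \bar{X})' &= \begin{bmatrix} 1 & 29/15 \\ 29/15 & 29^2/15^2 \end{bmatrix} + \begin{bmatrix} 9 & -138/15 \\ -138/15 & 46^2/15^2 \end{bmatrix} + \begin{bmatrix} 4 & 92/15 \\ 92/15 & 46^2/15^2 \end{bmatrix} \\
&\quad + \begin{bmatrix} 4 & 2/15 \\ 2/15 & 1/15^2 \end{bmatrix} + \begin{bmatrix} 0 & 0 \\ 0 & 16^2/15^2 \end{bmatrix} = \begin{bmatrix} 18 & -1 \\ -1 & 5330/15^2 \end{bmatrix},
\end{aligned}$$

$$\begin{aligned}
\sum_{j=1}^4 (X_{2j} - \bar{X})(X_{2j} - \bar{X})' &= \begin{bmatrix} 0 & 0 \\ 0 & 14^2/15^2 \end{bmatrix} + \begin{bmatrix} 4 & -62/15 \\ -62/15 & 31^2/15^2 \end{bmatrix} + \begin{bmatrix} 4 & -28/15 \\ -28/15 & 14^2/15^2 \end{bmatrix} + \begin{bmatrix} 0 & 0 \\ 0 & 1/15^2 \end{bmatrix} \\
&= \begin{bmatrix} 8 & -6 \\ -6 & 1354/15^2 \end{bmatrix},
\end{aligned}$$

$$\begin{aligned}
\sum_{j=1}^6 (X_{3j} - \bar{X})(X_{3j} - \bar{X})' &= \begin{bmatrix} 1 & 14/15 \\ 14/15 & 14^2/15^2 \end{bmatrix} + \begin{bmatrix} 0 & 0 \\ 0 & 29^2/15^2 \end{bmatrix} + \begin{bmatrix} 4 & -28/15 \\ -28/15 & 14^2/15^2 \end{bmatrix} \\
&\quad + \begin{bmatrix} 1 & 44/15 \\ 44/15 & 44^2/15^2 \end{bmatrix} + \begin{bmatrix} 1 & -16/15 \\ -16/15 & 16^2/15^2 \end{bmatrix} + \begin{bmatrix} 1 & 1/15 \\ 1/15 & 1/15^2 \end{bmatrix} = \begin{bmatrix} 8 & 1 \\ 1 & 3426/15^2 \end{bmatrix},
\end{aligned}$$

so that

$$\sum_{ij}(X_{ij} - \bar{X})(X_{ij} - \bar{X})' = \begin{bmatrix} 18 & -1 \\ -1 & 5330/15^2 \end{bmatrix} + \begin{bmatrix} 8 & -6 \\ -6 & 1354/15^2 \end{bmatrix} + \begin{bmatrix} 8 & 1 \\ 1 & 3426/15^2 \end{bmatrix} = \begin{bmatrix} 34 & -6 \\ -6 & 10110/15^2 \end{bmatrix} = \begin{bmatrix} 34 & -6 \\ -6 & 674/15 \end{bmatrix}.$$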

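Now, consider the residual sum of squares and cross products matrix:

$$\sum_{ij}(X_{ij} - \bar{X}_i)(X_{ij} - \bar{X}_i)' = \sum_{j=1}^5 (X_{1j} - \bar{X}_1)(X_{1j} - \bar{X}_1)' + \sum_{j=1}^4 (X_{2j} - \bar{X}_2)(X_{2j} - \bar{X}_2)' + \sum_{j=1}^6 (X_{3j} - \bar{X}_3)(X_{3j} - \bar{X}_3)'.$$

That is,

$$\sum_{j=1}^5 (X_{1j} - \bar{X}_1)(X_{1j} - \bar{X}_1)' = \begin{bmatrix} 1 & 3 \\ 3 & 9 \end{bmatrix} + \begin{bmatrix} 9 & -6 \\ -6 & 4 \end{bmatrix} + \begin{bmatrix} 4 & 4 \\ 4 & 4 \end{bmatrix} + \begin{bmatrix} 4 & -2 \\ -2 & 1 \end{bmatrix} + \begin{bmatrix} 0 & 0 \\ 0 & 0 \end{bmatrix} = \begin{bmatrix} 18 & -1 \\ -1 & 18 \end{bmatrix},$$

$$\sum_{j=1}^4 (X_{2j} - \bar{X}_2)(X_{2j} - \bar{X}_2)' = \begin{bmatrix} 0 & 0 \\ 0 & 1 \end{bmatrix} + \begin{bmatrix} 4 & -4 \\ -4 & 4 \end{bmatrix} + \begin{bmatrix} 4 & -2 \\ -2 & 1 \end{bmatrix} + \begin{bmatrix} 0 & 0 \\ 0 & 0 \end{bmatrix} = \begin{bmatrix} 8 & -6 \\ -6 & 6 \end{bmatrix},$$

$$\begin{aligned}
\sum_{j=1}^6 (X_{3j} - \bar{X}_3)(X_{3j} - \bar{X}_3)' &= \begin{bmatrix} 1 & 0 \\ 0 & 0 \end{bmatrix} + \begin{bmatrix} 0 & 0 \\ 0 & 1 \end{bmatrix} + \begin{bmatrix} 4 & 0 \\ 0 & 0 \end{bmatrix} + \begin{bmatrix} 1 & 2 \\ 2 & 4 \end{bmatrix} + \begin{bmatrix} 1 & -2 \\ -2 & 4 \end{bmatrix} + \begin{bmatrix} 1 & 1 \\ 1 & 1 \end{bmatrix} \\
&= \begin{bmatrix} 8 & 1 \\ 1 & 10 \end{bmatrix}.
\end{aligned}$$

Hence,

$$\sum_{ij}(X_{ij} - \bar{X}_i)(X_{ij} - \bar{X}_i)' = \begin{bmatrix} 18 & -1 \\ -1 & 18 \end{bmatrix} + \begin{bmatrix} 8 & -6 \\ -6 & 6 \end{bmatrix} + \begin{bmatrix} 8 & 1 \\ 1 & 10 \end{bmatrix} = \begin{bmatrix} 34 & -6 \\ -6 & 34 \end{bmatrix} \ \text{ and } \ \begin{vmatrix} 34 & -6 \\ -6 & 34 \end{vmatrix} = 1120,$$

whereas the determinant of the total sum of squares and cross products matrix is
$34(674/15) - 36 = 1491.73$. Therefore, the observed $w$ is given by

$$w = \frac{1120}{1491.73} = 0.7508, \quad \sqrt{w} = 0.8665.$$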


This is the case $p = 2$, $k = 3$, which is covered by our special Cases (6) and (8). Then,
the observed value of

$$t_2 = \frac{1 - \sqrt{w}}{\sqrt{w}} = \frac{0.1335}{0.8665} = 0.1540,$$

and, with the type-2 beta parameters $(k - 1, n_. - k - 1) = (2, 11)$ of Case (6),

$$\frac{n_. - k - 1}{k - 1}\,t_2 = \frac{15 - 3 - 1}{2}\,(0.1540) = 0.847.$$

Our F-statistic is $F_{2(k-1),2(n_.-k-1)} = F_{4,22}$. Let us test the hypothesis $A_1 = A_2 = A_3 = O$
at the 5% significance level or $\alpha = 0.05$. Since the observed value $0.847 < 2.82 =
F_{4,22,0.05}$, which is available from F-tables, we do not reject the hypothesis.
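
The key quantities of Problem (2) can be recomputed from the summary statistics quoted in the solution. The sketch below is illustrative: the variable names are hypothetical, and only the group sizes, the group averages of the first two components and the residual matrix stated in the text are used as inputs.

```python
import numpy as np

n_i = np.array([5, 4, 6])                                # n_1, n_2, n_3
means = np.array([[1.0, 0.0], [1.0, 1.0], [1.0, 2.0]])   # rows: group averages (first two components)
V = np.array([[34.0, -6.0], [-6.0, 34.0]])               # residual SSP matrix from the solution

grand_mean = (n_i[:, None] * means).sum(axis=0) / n_i.sum()   # (1, 16/15)
dev = means - grand_mean
# U = sum_i n_i (mean_i - grand_mean)(mean_i - grand_mean)'
U = (n_i[:, None, None] * dev[:, :, None] * dev[:, None, :]).sum(axis=0)

w = np.linalg.det(V) / np.linalg.det(U + V)   # |V| / |U + V|
y = np.sqrt(w)
t2 = (1 - y) / y
```

The only nonzero element of `U` is the (2,2) entry $164/15$, and `w` and `y` reproduce the values 0.7508 and 0.8665 obtained above.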


Verification of the calculations 


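Denoting the total sum of squares and cross products matrix by $S_t$, the residual sum of
squares and cross products matrix by $S_r$ and the sum of squares and cross products matrix
due to the hypothesis or due to the effects $A_i$'s by $S_h$, we should have $S_t=S_r+S_h$ where

$$S_t=\begin{bmatrix}34&-6\\ -6&674/15\end{bmatrix}\ \text{ and }\ S_r=\begin{bmatrix}34&-6\\ -6&34\end{bmatrix}$$

as previously determined. Let us compute

$$S_h=\sum_{i=1}^{3}n_i\Big(\frac{X_{i.}}{n_i}-\frac{X_{..}}{n_{.}}\Big)\Big(\frac{X_{i.}}{n_i}-\frac{X_{..}}{n_{.}}\Big)'.$$

For the first two components, we already have the following:

$$\frac{X_{1.}}{n_1}-\frac{X_{..}}{n_{.}}=\begin{bmatrix}1\\ 2\end{bmatrix}-\begin{bmatrix}1\\ 46/15\end{bmatrix}=\begin{bmatrix}0\\ -16/15\end{bmatrix},\quad
\frac{X_{2.}}{n_2}-\frac{X_{..}}{n_{.}}=\begin{bmatrix}1\\ 3\end{bmatrix}-\begin{bmatrix}1\\ 46/15\end{bmatrix}=\begin{bmatrix}0\\ -1/15\end{bmatrix},\quad
\frac{X_{3.}}{n_3}-\frac{X_{..}}{n_{.}}=\begin{bmatrix}1\\ 4\end{bmatrix}-\begin{bmatrix}1\\ 46/15\end{bmatrix}=\begin{bmatrix}0\\ 14/15\end{bmatrix}.$$

Hence,

$$S_h=5\begin{bmatrix}0&0\\ 0&16^2/15^2\end{bmatrix}+4\begin{bmatrix}0&0\\ 0&1/15^2\end{bmatrix}+6\begin{bmatrix}0&0\\ 0&14^2/15^2\end{bmatrix}=\begin{bmatrix}0&0\\ 0&164/15\end{bmatrix}.$$

As $34+\frac{164}{15}=\frac{674}{15}$, $S_t=S_r+S_h$, that is,

$$\begin{bmatrix}34&-6\\ -6&674/15\end{bmatrix}=\begin{bmatrix}34&-6\\ -6&34\end{bmatrix}+\begin{bmatrix}0&0\\ 0&164/15\end{bmatrix}.$$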
Thus, the result is verified. 
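
The same verification can be carried out in exact rational arithmetic. The sketch below is illustrative only; the helpers `outer`, `add` and `scale` are hypothetical names, and the three deviation vectors are the differences between each sample mean and the grand mean in the first two components, namely (0, -16/15)', (0, -1/15)' and (0, 14/15)' for sample sizes 5, 4 and 6.

```python
from fractions import Fraction as Fr

def outer(v):
    """Outer product v v' of a vector given as a tuple or list."""
    return [[a * b for b in v] for a in v]

def add(A, B):
    """Entrywise sum of two matrices of the same size."""
    return [[a + b for a, b in zip(ra, rb)] for ra, rb in zip(A, B)]

def scale(c, A):
    """Scalar multiple c * A."""
    return [[c * a for a in row] for row in A]

sizes = [5, 4, 6]
mean_dev = [(Fr(0), Fr(-16, 15)), (Fr(0), Fr(-1, 15)), (Fr(0), Fr(14, 15))]

# S_h = sum_i n_i (mean_i - grand mean)(mean_i - grand mean)'
S_h = [[Fr(0), Fr(0)], [Fr(0), Fr(0)]]
for n_i, d in zip(sizes, mean_dev):
    S_h = add(S_h, scale(n_i, outer(d)))

S_r = [[Fr(34), Fr(-6)], [Fr(-6), Fr(34)]]
S_t = add(S_r, S_h)
print(S_h, S_t)
```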


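Problem (3): Data on all the three variables. In this case, we have p = 3, k = 3. We
will first use $X_{1.}$, $X_{1.}/n_1$, $X_{2.}$, $X_{2.}/n_2$, $X_{3.}$, $X_{3.}/n_3$, $X_{..}$ and $X_{..}/n_{.}$, which have already been evaluated, to
compute the residual sum of squares and cross products matrix. Since all the matrices are
symmetric, for convenience, we will only display the diagonal elements and those above
the diagonal. As in the case of two components, we compute the following, making use of
the calculations already done for the 2-component case (the notations remaining the same
since our general notation does not involve p):

$$\sum_{j=1}^{5}\Big(X_{1j}-\frac{X_{1.}}{n_1}\Big)\Big(X_{1j}-\frac{X_{1.}}{n_1}\Big)'=\begin{bmatrix}1&3&1\\ &9&3\\ &&1\end{bmatrix}+\begin{bmatrix}9&-6&-3\\ &4&2\\ &&1\end{bmatrix}+\begin{bmatrix}4&4&-2\\ &4&-2\\ &&1\end{bmatrix}+\begin{bmatrix}4&-2&2\\ &1&-1\\ &&1\end{bmatrix}+\begin{bmatrix}0&0&0\\ &0&0\\ &&0\end{bmatrix}=\begin{bmatrix}18&-1&-2\\ &18&2\\ &&4\end{bmatrix},$$

$$\sum_{j=1}^{4}\Big(X_{2j}-\frac{X_{2.}}{n_2}\Big)\Big(X_{2j}-\frac{X_{2.}}{n_2}\Big)'=\begin{bmatrix}0&0&0\\ &1&2\\ &&4\end{bmatrix}+\begin{bmatrix}4&-4&-4\\ &4&4\\ &&4\end{bmatrix}+\begin{bmatrix}4&-2&-2\\ &1&1\\ &&1\end{bmatrix}+\begin{bmatrix}0&0&0\\ &0&0\\ &&1\end{bmatrix}=\begin{bmatrix}8&-6&-6\\ &6&7\\ &&10\end{bmatrix},$$

$$\begin{aligned}\sum_{j=1}^{6}\Big(X_{3j}-\frac{X_{3.}}{n_3}\Big)\Big(X_{3j}-\frac{X_{3.}}{n_3}\Big)'&=\begin{bmatrix}1&0&-2\\ &0&0\\ &&4\end{bmatrix}+\begin{bmatrix}0&0&0\\ &1&0\\ &&0\end{bmatrix}+\begin{bmatrix}4&0&-2\\ &0&0\\ &&1\end{bmatrix}+\begin{bmatrix}1&2&1\\ &4&2\\ &&1\end{bmatrix}\\ &\quad+\begin{bmatrix}1&-2&-1\\ &4&2\\ &&1\end{bmatrix}+\begin{bmatrix}1&1&-1\\ &1&-1\\ &&1\end{bmatrix}=\begin{bmatrix}8&1&-5\\ &10&3\\ &&8\end{bmatrix}.\end{aligned}$$

Then,

$$\sum_{ij}\Big(X_{ij}-\frac{X_{i.}}{n_i}\Big)\Big(X_{ij}-\frac{X_{i.}}{n_i}\Big)'=\begin{bmatrix}18&-1&-2\\ &18&2\\ &&4\end{bmatrix}+\begin{bmatrix}8&-6&-6\\ &6&7\\ &&10\end{bmatrix}+\begin{bmatrix}8&1&-5\\ &10&3\\ &&8\end{bmatrix}=\begin{bmatrix}34&-6&-13\\ &34&12\\ &&22\end{bmatrix}$$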
whose determinant is equal to 
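$$34\begin{vmatrix}34&12\\ 12&22\end{vmatrix}+6\begin{vmatrix}-6&12\\ -13&22\end{vmatrix}-13\begin{vmatrix}-6&34\\ -13&12\end{vmatrix}=15870.$$

The total sum of squares and cross products matrix is the following:

$$\sum_{ij}\Big(X_{ij}-\frac{X_{..}}{n_{.}}\Big)\Big(X_{ij}-\frac{X_{..}}{n_{.}}\Big)'=\sum_{j=1}^{5}\Big(X_{1j}-\frac{X_{..}}{n_{.}}\Big)\Big(X_{1j}-\frac{X_{..}}{n_{.}}\Big)'+\sum_{j=1}^{4}\Big(X_{2j}-\frac{X_{..}}{n_{.}}\Big)\Big(X_{2j}-\frac{X_{..}}{n_{.}}\Big)'+\sum_{j=1}^{6}\Big(X_{3j}-\frac{X_{..}}{n_{.}}\Big)\Big(X_{3j}-\frac{X_{..}}{n_{.}}\Big)',$$

with

$$\begin{aligned}\sum_{j=1}^{5}\Big(X_{1j}-\frac{X_{..}}{n_{.}}\Big)\Big(X_{1j}-\frac{X_{..}}{n_{.}}\Big)'&=\begin{bmatrix}1&29/15&9/15\\ &29^2/15^2&(29\times 9)/15^2\\ &&9^2/15^2\end{bmatrix}+\begin{bmatrix}9&-(3\times 46)/15&-(3\times 21)/15\\ &46^2/15^2&(46\times 21)/15^2\\ &&21^2/15^2\end{bmatrix}\\ &\quad+\begin{bmatrix}4&92/15&-18/15\\ &46^2/15^2&-(46\times 9)/15^2\\ &&9^2/15^2\end{bmatrix}+\begin{bmatrix}4&2/15&42/15\\ &1/15^2&21/15^2\\ &&21^2/15^2\end{bmatrix}+\begin{bmatrix}0&0&0\\ &16^2/15^2&96/15^2\\ &&36/15^2\end{bmatrix}\\ &=\begin{bmatrix}18&-1&-2\\ &5330/15^2&930/15^2\\ &&1080/15^2\end{bmatrix},\end{aligned}$$

$$\begin{aligned}\sum_{j=1}^{4}\Big(X_{2j}-\frac{X_{..}}{n_{.}}\Big)\Big(X_{2j}-\frac{X_{..}}{n_{.}}\Big)'&=\begin{bmatrix}0&0&0\\ &14^2/15^2&(14\times 24)/15^2\\ &&24^2/15^2\end{bmatrix}+\begin{bmatrix}4&-62/15&-72/15\\ &31^2/15^2&(31\times 36)/15^2\\ &&36^2/15^2\end{bmatrix}\\ &\quad+\begin{bmatrix}4&-28/15&-18/15\\ &14^2/15^2&(14\times 9)/15^2\\ &&9^2/15^2\end{bmatrix}+\begin{bmatrix}0&0&0\\ &1/15^2&21/15^2\\ &&21^2/15^2\end{bmatrix}=\begin{bmatrix}8&-6&-6\\ &1354/15^2&1599/15^2\\ &&2394/15^2\end{bmatrix},\end{aligned}$$

$$\begin{aligned}\sum_{j=1}^{6}\Big(X_{3j}-\frac{X_{..}}{n_{.}}\Big)\Big(X_{3j}-\frac{X_{..}}{n_{.}}\Big)'&=\begin{bmatrix}1&14/15&-21/15\\ &14^2/15^2&-(14\times 21)/15^2\\ &&21^2/15^2\end{bmatrix}+\begin{bmatrix}0&0&0\\ &29^2/15^2&(29\times 9)/15^2\\ &&9^2/15^2\end{bmatrix}+\begin{bmatrix}4&-28/15&-48/15\\ &14^2/15^2&(14\times 24)/15^2\\ &&24^2/15^2\end{bmatrix}\\ &\quad+\begin{bmatrix}1&44/15&24/15\\ &44^2/15^2&(44\times 24)/15^2\\ &&24^2/15^2\end{bmatrix}+\begin{bmatrix}1&-16/15&-6/15\\ &16^2/15^2&96/15^2\\ &&6^2/15^2\end{bmatrix}+\begin{bmatrix}1&1/15&-24/15\\ &1/15^2&-24/15^2\\ &&24^2/15^2\end{bmatrix}\\ &=\begin{bmatrix}8&1&-5\\ &3426/15^2&1431/15^2\\ &&2286/15^2\end{bmatrix}.\end{aligned}$$

Hence the total sum of squares and cross products matrix is

$$\sum_{ij}\Big(X_{ij}-\frac{X_{..}}{n_{.}}\Big)\Big(X_{ij}-\frac{X_{..}}{n_{.}}\Big)'=\begin{bmatrix}18&-1&-2\\ &5330/15^2&930/15^2\\ &&1080/15^2\end{bmatrix}+\begin{bmatrix}8&-6&-6\\ &1354/15^2&1599/15^2\\ &&2394/15^2\end{bmatrix}+\begin{bmatrix}8&1&-5\\ &3426/15^2&1431/15^2\\ &&2286/15^2\end{bmatrix}=\begin{bmatrix}34&-6&-13\\ -6&674/15&264/15\\ -13&264/15&384/15\end{bmatrix}$$

and its determinant is

$$34\begin{vmatrix}674/15&264/15\\ 264/15&384/15\end{vmatrix}+6\begin{vmatrix}-6&264/15\\ -13&384/15\end{vmatrix}-13\begin{vmatrix}-6&674/15\\ -13&264/15\end{vmatrix}=\frac{342126}{15}.$$

Then, the observed value of

$$w=\frac{15870\times 15}{342126}=0.6958,\qquad \sqrt{w}=0.8341.$$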


Since p = 3 and k = 3, an exact distribution is available from our special Case (8) for
$t_1=\frac{1-\sqrt{w}}{\sqrt{w}}$, whose observed value is 0.1989. Then,

$$\frac{n_{.}-1-p}{p}\,t_1=\frac{11}{3}\,t_1\sim F_{2p,\,2(n_{.}-1-p)}=F_{6,22}.$$

The critical value obtained from an F-table at the 5% significance level is $F_{6,22,0.05}\approx 2.55$.
Since the observed value of $F_{6,22}$ is $\frac{11}{3}(0.1989)=0.7293<2.55$, the hypothesis $A_1=A_2=A_3=O$
is not rejected. It can also be verified that $S_t=S_r+S_h$.
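
The two 3 x 3 determinants and the resulting statistic can be rechecked numerically. The following Python sketch is an illustration under the assumption that the residual and total matrices are exactly those displayed above; the helper name `det3` is hypothetical.

```python
from fractions import Fraction as Fr
from math import sqrt

def det3(m):
    """Cofactor expansion of a 3 x 3 determinant along the first row."""
    (a, b, c), (d, e, f), (g, h, i) = m
    return a * (e * i - f * h) - b * (d * i - f * g) + c * (d * h - e * g)

# Residual and total sum of squares and cross products matrices (p = 3).
S_res = [[34, -6, -13], [-6, 34, 12], [-13, 12, 22]]
S_tot = [[Fr(34), Fr(-6), Fr(-13)],
         [Fr(-6), Fr(674, 15), Fr(264, 15)],
         [Fr(-13), Fr(264, 15), Fr(384, 15)]]

det_res = det3(S_res)
det_tot = det3(S_tot)
w = det_res / det_tot                      # exact value of |S_res| / |S_tot|

root = sqrt(float(w))
f_obs = (15 - 1 - 3) / 3 * (1 - root) / root   # Case (8) statistic with n. = 15, p = 3
print(det_res, det_tot, float(w), f_obs)
```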


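13.3.4. Asymptotic distribution of the λ-criterion


We can obtain an asymptotic real chisquare distribution for $n_{.}\to\infty$. To this end,
consider the general h-th moment of λ or $E[\lambda^h]$ from (13.3.8), that is,

$$E[\lambda^h]=c_{p,k}\prod_{j=1}^{p}\frac{\Gamma\big(\frac{n_{.}-k}{2}-\frac{j-1}{2}+\frac{n_{.}}{2}h\big)}{\Gamma\big(\frac{n_{.}-1}{2}-\frac{j-1}{2}+\frac{n_{.}}{2}h\big)}
=c_{p,k}\prod_{j=1}^{p}\frac{\Gamma\big(\frac{n_{.}}{2}(1+h)-\frac{k}{2}-\frac{j-1}{2}\big)}{\Gamma\big(\frac{n_{.}}{2}(1+h)-\frac{1}{2}-\frac{j-1}{2}\big)},$$

where $c_{p,k}$ is the normalizing constant, which is free of h and such that $E[\lambda^h]=1$ when h = 0.
Let us expand all the gamma functions in $E[\lambda^h]$ by using the first term in the asymptotic
expansion of a gamma function or by making use of Stirling's approximation formula,
namely,

$$\Gamma(z+\delta)\approx\sqrt{2\pi}\,z^{z+\delta-\frac{1}{2}}e^{-z} \tag{13.3.13}$$

for $|z|\to\infty$ when δ is a bounded quantity. Taking $\frac{n_{.}}{2}\to\infty$ in the constant part and
$\frac{n_{.}}{2}(1+h)\to\infty$ in the part containing h, we have

$$\frac{\Gamma\big(\frac{n_{.}}{2}(1+h)-\frac{k}{2}-\frac{j-1}{2}\big)}{\Gamma\big(\frac{n_{.}}{2}(1+h)-\frac{1}{2}-\frac{j-1}{2}\big)}
\approx\frac{\sqrt{2\pi}\,\big[\frac{n_{.}}{2}(1+h)\big]^{\frac{n_{.}}{2}(1+h)-\frac{1}{2}-\frac{k}{2}-\frac{j-1}{2}}e^{-\frac{n_{.}}{2}(1+h)}}{\sqrt{2\pi}\,\big[\frac{n_{.}}{2}(1+h)\big]^{\frac{n_{.}}{2}(1+h)-\frac{1}{2}-\frac{1}{2}-\frac{j-1}{2}}e^{-\frac{n_{.}}{2}(1+h)}}
=\Big(\frac{n_{.}}{2}\Big)^{-\frac{k-1}{2}}(1+h)^{-\frac{k-1}{2}}.$$

The factor $\big(\frac{n_{.}}{2}\big)^{-\frac{k-1}{2}}$ is canceled from the expression coming from the constant part. Then,
taking the product over j = 1, ..., p, we have

$$E[\lambda^h]\to(1+h)^{-\frac{p(k-1)}{2}}\ \Rightarrow\ E\big[e^{h(-2\ln\lambda)}\big]=E[\lambda^{-2h}]\to(1-2h)^{-\frac{p(k-1)}{2}}\ \text{ for }1-2h>0,$$

which is the moment generating function (mgf) of a real scalar chisquare with p(k − 1)
degrees of freedom. Hence, we have the following result:

Theorem 13.3.1. Letting λ be the likelihood ratio criterion for testing the hypothesis
$H_o: A_1=A_2=\cdots=A_k=O$, the asymptotic distribution of $-2\ln\lambda$ is a real chisquare
random variable having p(k − 1) degrees of freedom as $n_{.}\to\infty$, that is,

$$-2\ln\lambda\to\chi^2_{p(k-1)}\ \text{ as }n_{.}\to\infty. \tag{13.3.14}$$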


Observe that we only require the sum of the sample sizes $n_1+\cdots+n_k=n_{.}$ to go
to infinity, and not that the individual $n_j$'s be large. This chisquare approximation can
be utilized for testing the hypothesis for large values of $n_{.}$, and we then reject $H_o$ for
small values of λ, which means for large values of $-2\ln\lambda$ or large values of $\chi^2_{p(k-1)}$, that
is, when the observed $-2\ln\lambda\ge\chi^2_{p(k-1),\alpha}$ where $\chi^2_{p(k-1),\alpha}$ denotes the upper 100α%
percentage point of the chisquare distribution.
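
As an illustration of how this approximation would be applied, the sketch below evaluates $-2\ln\lambda$ and an approximate upper tail probability for the three-variable data of Example 13.3.1. It assumes the relation $w=\lambda^{2/n_{.}}$, so that $-2\ln\lambda=-n_{.}\ln w$, and it uses the closed-form chisquare tail probability that is available when the number of degrees of freedom is even. The function names are hypothetical, and since $n_{.}=15$ is hardly large, the output only illustrates the mechanics.

```python
from math import exp, log, factorial

def neg2_log_lambda(w, n_total):
    """-2 ln(lambda) under the assumption lambda = w ** (n_total / 2)."""
    return -n_total * log(w)

def chi2_upper_tail_even(x, df):
    """P(chisquare_df > x) in closed form; this formula holds only for even df."""
    if df % 2 != 0:
        raise ValueError("the closed form used here requires an even df")
    half = x / 2.0
    return exp(-half) * sum(half ** i / factorial(i) for i in range(df // 2))

p, k, n_total = 3, 3, 15
w = 15870 * 15 / 342126            # observed w for the three-variable case
stat = neg2_log_lambda(w, n_total)
df = p * (k - 1)                   # degrees of freedom p(k - 1)
p_value = chi2_upper_tail_even(stat, df)
print(stat, df, p_value)
```

A large tail probability, as obtained here, is in line with the exact test, which did not reject the hypothesis.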


13.3.5. MANOVA and testing the equality of population mean values 


In a one-way classification model, we have the following for the p-variate case:

$$X_{ij}=M+A_i+E_{ij}\ \text{ or }\ X_{ij}=M_i+E_{ij},\ \text{ with }M_i=M+A_i, \tag{13.3.15}$$

for $j=1,\ldots,n_i$, $i=1,\ldots,k$. When the error vector is assumed to have a null expected
value, that is, $E[E_{ij}]=O$ for all i and j, we have $E[X_{ij}]=M_i$ for all i and j. Thus, this
assumption, in conjunction with the hypothesis $A_1=A_2=\cdots=A_k=O$, implies that
$M_1=M_2=\cdots=M_k$, that is, the hypothesis of equality of the population mean value
vectors; in other words, the test is equivalent to testing the equality of population mean value vectors
in k independent populations with common covariance matrix $\Sigma>O$. We have already
tackled this problem in Chap. 6 under both assumptions that Σ is known and unknown,
when the populations are Gaussian, that is, $X_{ij}\sim N_p(M_i,\Sigma)$, $\Sigma>O$. Thus, the hypothesis
made in a one-way classification MANOVA setting and the hypothesis of testing the
equality of mean value vectors in MANOVA are one and the same. In the scalar case too,
the ANOVA in a one-way classification data coincides with testing the equality of population
mean values in k independent univariate populations. In the ANOVA case, we are
comparing the sum of squares attributable to the hypothesis to the residual sum of squares.
If the hypothesis really holds true, then the sum of squares due to the hypothesis or to the
$\alpha_j$'s (deviations from the general effect due to the j-th treatment) is expected to be small and hence
for large values of the sum of squares due to the presence of the $\alpha_j$'s, as compared to the
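residual sum of squares, we reject the hypothesis. In MANOVA, we are comparing two
sums of squares and cross products matrices, namely,

$$U=\sum_{i=1}^{k}n_i\Big(\frac{X_{i.}}{n_i}-\frac{X_{..}}{n_{.}}\Big)\Big(\frac{X_{i.}}{n_i}-\frac{X_{..}}{n_{.}}\Big)'\ \text{ and }\ V=\sum_{i=1}^{k}\sum_{j=1}^{n_i}\Big(X_{ij}-\frac{X_{i.}}{n_i}\Big)\Big(X_{ij}-\frac{X_{i.}}{n_i}\Big)'.$$

We have the following distributional properties:

$$\begin{aligned}
T_1&=(U+V)^{-\frac{1}{2}}U(U+V)^{-\frac{1}{2}}\sim\text{real matrix-variate type-1 beta with parameters }\Big(\frac{k-1}{2},\frac{n_{.}-k}{2}\Big);\\
T_2&=(U+V)^{-\frac{1}{2}}V(U+V)^{-\frac{1}{2}}\sim\text{real matrix-variate type-1 beta with parameters }\Big(\frac{n_{.}-k}{2},\frac{k-1}{2}\Big);\\
T_3&=V^{-\frac{1}{2}}UV^{-\frac{1}{2}}\sim\text{real matrix-variate type-2 beta with parameters }\Big(\frac{k-1}{2},\frac{n_{.}-k}{2}\Big);\\
T_4&=U^{-\frac{1}{2}}VU^{-\frac{1}{2}}\sim\text{real matrix-variate type-2 beta with parameters }\Big(\frac{n_{.}-k}{2},\frac{k-1}{2}\Big).
\end{aligned} \tag{13.3.16}$$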

The likelihood ratio criterion is

$$\lambda=\frac{|V|^{\frac{n_{.}}{2}}}{|U+V|^{\frac{n_{.}}{2}}}=\frac{1}{|I+T_3|^{\frac{n_{.}}{2}}}=\frac{1}{\big[\prod_{j=1}^{p}(1+\eta_j)\big]^{\frac{n_{.}}{2}}} \tag{13.3.17}$$

where the $\eta_j$'s are the eigenvalues of $T_3$. We reject $H_o$ for small values of λ which means
for large values of $\prod_{j=1}^{p}[1+\eta_j]$. The basic objective in MANOVA consists of comparing
U and V, the matrices due to the presence of treatment effects and due to the residuals,
respectively. We can carry out this comparison by using the type-1 beta matrices $T_1$ and
$T_2$ or the type-2 beta matrices $T_3$ and $T_4$ or by making use of the eigenvalues of these
matrices. In the type-1 beta case, the eigenvalues will be between 0 and 1, whereas in
the type-2 beta case, the eigenvalues will be real positive or simply positive. We may
also note that the eigenvalues of $T_1$ and its nonsymmetric forms $U(U+V)^{-1}$ or
$(U+V)^{-1}U$ are identical. Similarly, the eigenvalues of the symmetric form $T_2$ and $V(U+V)^{-1}$
or $(U+V)^{-1}V$ are one and the same. As well, the eigenvalues of the symmetric form
$T_3$ and the nonsymmetric forms $UV^{-1}$ or $V^{-1}U$ are the same. Again, the eigenvalues
of the symmetric form $T_4$ and its nonsymmetric forms $U^{-1}V$ or $VU^{-1}$ are the same.
Several researchers have constructed tests based on the matrices $T_1$, $T_2$, $T_3$, $T_4$ or their
nonsymmetric forms or their eigenvalues. Some of the well-known test statistics are the
following:


Lawley-Hotelling trace = $\mathrm{tr}(T_3)$
Roy's largest root = the largest eigenvalue of $T_3$
Pillai's trace = $\mathrm{tr}(T_1)$
Wilks' lambda = $|T_2|$ = the likelihood ratio statistic.

For example, when the hypothesis is true, we expect the eigenvalues of $T_3$ to be small and
hence we may reject the hypothesis when its smallest eigenvalue is large or the trace of
$T_3$ is large. If we are using $T_4$, then when the hypothesis is true, we expect $T_4$ to be large
in the sense that the eigenvalues will be large, and therefore we may reject the hypothesis
for small values of its largest eigenvalue or its trace. If we are utilizing $T_1$, we are actually
comparing the contribution attributable to the treatments to the total variation. We expect
this to be small under the hypothesis and hence, we may reject the hypothesis for large
values of its smallest eigenvalue or its trace. If we are using $T_2$, we are comparing the
residual part to the total variation. If the hypothesis is true, then we can expect a substantial
contribution from the residual part so that we may reject the hypothesis for small values
of the largest eigenvalue or the trace in this case. These are the main ideas in connection
with constructing statistics for testing the hypothesis on the basis of the eigenvalues of the
matrices $T_1$, $T_2$, $T_3$ and $T_4$.
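
The four statistics can be evaluated without forming the symmetric square roots, since the nonsymmetric forms share the same eigenvalues, traces and determinants. The Python sketch below illustrates this for p = 2, taking U and V to be the hypothesis and residual matrices found for the first two components in Example 13.3.1; all helper names are hypothetical, and the closed-form eigenvalue formula used for Roy's largest root applies to 2 x 2 matrices only.

```python
from fractions import Fraction as Fr
from math import sqrt

def det2(m):
    return m[0][0] * m[1][1] - m[0][1] * m[1][0]

def inv2(m):
    """Inverse of a nonsingular 2 x 2 matrix."""
    d = det2(m)
    return [[m[1][1] / d, -m[0][1] / d], [-m[1][0] / d, m[0][0] / d]]

def mul2(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(2)) for j in range(2)] for i in range(2)]

def add2(A, B):
    return [[A[i][j] + B[i][j] for j in range(2)] for i in range(2)]

def trace2(m):
    return m[0][0] + m[1][1]

U = [[Fr(0), Fr(0)], [Fr(0), Fr(164, 15)]]     # due to the hypothesis
V = [[Fr(34), Fr(-6)], [Fr(-6), Fr(34)]]       # residual

UV_inv = mul2(U, inv2(V))                      # same eigenvalues as T3
U_tot_inv = mul2(U, inv2(add2(U, V)))          # same eigenvalues as T1

lawley_hotelling = trace2(UV_inv)
pillai = trace2(U_tot_inv)
wilks = det2(V) / det2(add2(U, V))             # |T2|

# Largest eigenvalue of a 2 x 2 matrix from its trace and determinant.
tr, dt = float(trace2(UV_inv)), float(det2(UV_inv))
roy = (tr + sqrt(max(tr * tr - 4 * dt, 0.0))) / 2
print(float(lawley_hotelling), roy, float(pillai), float(wilks))
```

In this particular example U has rank one, so Roy's largest root coincides with the Lawley-Hotelling trace and Wilks' lambda equals one minus Pillai's trace.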


13.3.6. When H, is rejected 


When $H_o: A_1=\cdots=A_k=O$ is rejected, it is plausible that some of the differences
may be non-null, that is, $A_i-A_j\ne O$ for some i and j, $i\ne j$. We may then test individual
hypotheses of the type $H_o: A_i=A_j$ for $i\ne j$. There are k(k − 1)/2 such differences.
This type of test is equivalent to testing the equality of the mean value vectors in two
independent p-variate Gaussian populations with the same covariance matrix $\Sigma>O$.
This has already been discussed in Chap. 6 for the cases Σ known and Σ unknown. In this
instance, we can use the special Case (7) where, for k = 2, the statistic $t_1$ is real scalar
type-2 beta distributed with the parameters $\big(\frac{p}{2},\frac{n_{.}-1-p}{2}\big)$, so that

$$\frac{n_{.}-1-p}{p}\,t_1\sim F_{p,\,n_{.}-1-p} \tag{13.3.18}$$

where $n_{.}=n_i+n_j$ for some specific i and j. We can make use of (13.3.18) for testing
individual hypotheses. By utilizing Special Case (8) for k = 3, we can also test a hypothesis
of the type $A_i=A_j=A_m$ for different i, j, m. Instead of comparing the results of
all the k(k − 1)/2 individual hypotheses, we may examine the estimates of $A_i$, namely,
$\hat{A}_i=\frac{X_{i.}}{n_i}-\frac{X_{..}}{n_{.}}$, $i=1,\ldots,k$. Consider the norms $\|\hat{A}_i-\hat{A}_j\|$, $i\ne j$ (the Euclidean
norm may be taken for convenience). Start with the individual test corresponding to the
maximum value of these norms. If this hypothesis is not rejected, it is likely that the hypotheses on all
other differences will not be rejected either. If it is rejected, we then take the next largest
difference and continue testing.

Note 13.3.1. Usually, before initiating a MANOVA, the assumption that the covariance
matrices associated with the k populations or treatments are equal is tested. It may happen
that the error variable $E_{1j}$, $j=1,\ldots,n_1$, may have the common covariance matrix $\Sigma_1$,
$E_{2j}$, $j=1,\ldots,n_2$, may have the common covariance matrix $\Sigma_2$, and so on, where not
all the $\Sigma_j$'s are equal. In this instance, we may first test the hypothesis $H_o:\Sigma_1=\Sigma_2=\cdots=\Sigma_k$.
This test is already described in Chap. 6. If this hypothesis is not rejected,
we may carry out the MANOVA analysis of the data. If this hypothesis is rejected, then
some of the $\Sigma_j$'s may not be equal. In this case, we test individual hypotheses of the type
$\Sigma_i=\Sigma_j$ for some specific i and j, $i\ne j$. Include all treatments for which the individual
hypotheses are not rejected by the tests and exclude the data on the treatments whose $\Sigma_j$'s
may be different, but distinct from those already selected. Continue with the MANOVA
analysis of the data on the treatments which are retained, that is, those for which the $\Sigma_j$'s
are equal in the sense that the corresponding tests of equality of covariance matrices did
not reject the hypotheses.


Example 13.3.2. For the sake of illustration, test the hypothesis $H_o: A_1=A_2$ with the
data provided in Example 13.3.1.


Solution 13.3.2. We can utilize some of the computations done in the solution to Example 13.3.1.
Here, $n_1=5$, $n_2=4$ and $n_{.}=n_1+n_2=9$. We disregard the third sample.
The residual sum of squares and cross products matrix in the present case is available from
the Solution 13.3.1 by omitting the matrix corresponding to the third sample. Then,

$$\sum_{i=1}^{2}\sum_{j=1}^{n_i}\Big(X_{ij}-\frac{X_{i.}}{n_i}\Big)\Big(X_{ij}-\frac{X_{i.}}{n_i}\Big)'=\begin{bmatrix}18&-1&-2\\ &18&2\\ &&4\end{bmatrix}+\begin{bmatrix}8&-6&-6\\ &6&7\\ &&10\end{bmatrix}=\begin{bmatrix}26&-7&-8\\ -7&24&9\\ -8&9&14\end{bmatrix}$$
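whose determinant is

$$26\begin{vmatrix}24&9\\ 9&14\end{vmatrix}+7\begin{vmatrix}-7&9\\ -8&14\end{vmatrix}-8\begin{vmatrix}-7&24\\ -8&9\end{vmatrix}=5416.$$

Let us compute $\sum_{i=1}^{2}\sum_{j=1}^{n_i}\big(X_{ij}-\frac{X_{..}}{n_{.}}\big)\big(X_{ij}-\frac{X_{..}}{n_{.}}\big)'$, where $\frac{X_{..}}{n_{.}}$ now denotes the mean of the nine
observations in the first two samples. The two sample means agree in their first and third components, so that
only the second component of each deviation changes when the combined mean replaces the individual sample means:

$$\begin{aligned}\sum_{j=1}^{5}\Big(X_{1j}-\frac{X_{..}}{n_{.}}\Big)\Big(X_{1j}-\frac{X_{..}}{n_{.}}\Big)'&=\begin{bmatrix}1&23/9&1\\ &23^2/9^2&23/9\\ &&1\end{bmatrix}+\begin{bmatrix}9&-66/9&-3\\ &22^2/9^2&22/9\\ &&1\end{bmatrix}+\begin{bmatrix}4&44/9&-2\\ &22^2/9^2&-22/9\\ &&1\end{bmatrix}\\ &\quad+\begin{bmatrix}4&-10/9&2\\ &5^2/9^2&-5/9\\ &&1\end{bmatrix}+\begin{bmatrix}0&0&0\\ &4^2/9^2&0\\ &&0\end{bmatrix}=\begin{bmatrix}18&-1&-2\\ &1538/9^2&2\\ &&4\end{bmatrix};\end{aligned}$$

$$\begin{aligned}\sum_{j=1}^{4}\Big(X_{2j}-\frac{X_{..}}{n_{.}}\Big)\Big(X_{2j}-\frac{X_{..}}{n_{.}}\Big)'&=\begin{bmatrix}0&0&0\\ &14^2/9^2&28/9\\ &&4\end{bmatrix}+\begin{bmatrix}4&-26/9&-4\\ &13^2/9^2&26/9\\ &&4\end{bmatrix}+\begin{bmatrix}4&-28/9&-2\\ &14^2/9^2&14/9\\ &&1\end{bmatrix}\\ &\quad+\begin{bmatrix}0&0&0\\ &5^2/9^2&-5/9\\ &&1\end{bmatrix}=\begin{bmatrix}8&-6&-6\\ &586/9^2&7\\ &&10\end{bmatrix}.\end{aligned}$$

Hence the sum

$$\sum_{i=1}^{2}\sum_{j=1}^{n_i}\Big(X_{ij}-\frac{X_{..}}{n_{.}}\Big)\Big(X_{ij}-\frac{X_{..}}{n_{.}}\Big)'=\begin{bmatrix}18&-1&-2\\ &1538/9^2&2\\ &&4\end{bmatrix}+\begin{bmatrix}8&-6&-6\\ &586/9^2&7\\ &&10\end{bmatrix}=\begin{bmatrix}26&-7&-8\\ -7&236/9&9\\ -8&9&14\end{bmatrix}=U+V$$

whose determinant is

$$26\begin{vmatrix}236/9&9\\ 9&14\end{vmatrix}+7\begin{vmatrix}-7&9\\ -8&14\end{vmatrix}-8\begin{vmatrix}-7&236/9\\ -8&9\end{vmatrix}=6082.67.$$

So, the observed values are as follows:

$$w=\frac{|V|}{|U+V|}=\frac{5416}{6082.67}=0.8904,\qquad t_1=\frac{1-w}{w}=\frac{0.1096}{0.8904}=0.1231,\qquad \frac{n_{.}-1-p}{p}\,t_1=\frac{5}{3}\,(0.1231)=0.2052,$$

and $F_{p,\,n_{.}-1-p}=F_{3,5}$. Let us test the hypothesis at the 5% significance level. The critical
value obtained from F-tables is $F_{3,5,0.05}=5.41$. Since the observed value of F is
0.2052 < 5.41, the hypothesis is not rejected. We expected this result because the hypothesis
$A_1=A_2=A_3$ was not rejected. This example was mainly presented to illustrate the
steps.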


13.4. MANOVA for Two-Way Classification Data 


As was done previously for the one-way classification, we will revisit the real scalar
variable case first. Thus, we consider the case of two sets of treatments, instead of the single
set analyzed in Sect. 13.3. In an agricultural experiment, suppose that we are considering
r fertilizers as the first set of treatments, say $F_1,\ldots,F_r$, along with a set of s different
varieties of corn, $V_1,\ldots,V_s$, as the second set of treatments. A randomized block experiment
belongs to this category. In this case, r blocks of land, which are homogeneous with
respect to all factors that may affect the yield of corn, such as precipitation, fertility of
the soil, exposure to sunlight, drainage, and so on, are selected. Fertilizers $F_1,\ldots,F_r$ are
applied to these r blocks at random, the first block receiving any one of $F_1,\ldots,F_r$ and
so on. Each block is divided into s equivalent plots, all the plots being of the same size,
shape, and so on. Then, the s varieties of corn are applied to each block at random, with
one variety to each plot. Such an experiment is called a randomized block experiment.
This experiment is then replicated t times. This replication is done so that possible interaction
between fertilizers and varieties of corn could be tested. If the randomized block
experiment is carried out only once, no interaction can be tested from such data because
each plot will have only one observation. Interaction between the i-th fertilizer and j-th
variety is a joint effect for the $(F_i,V_j)$ combination, that is, the effect of $F_i$ on the yield
varies with the variety of corn. For instance, an interaction will be present if the effect of
$F_1$ is different when combined with $V_1$ or $V_2$. In other words, there are individual effects
and joint effects, a joint effect being referred to as an interaction between the two sets of
treatments. As an example, consider one set of treatments consisting of r different methods
of teaching and a second set of treatments that could be s levels of previous exposure
of the students to the subject matter.

13.4.1. The model in a two-way classification


The additive, fixed effect, two-way classification or two-way layout model with interaction
is the following:

$$x_{ijk}=\mu+\alpha_i+\beta_j+\gamma_{ij}+e_{ijk},\quad i=1,\ldots,r,\ j=1,\ldots,s,\ k=1,\ldots,t, \tag{13.4.1}$$

where μ is a general effect, $\alpha_i$ is the deviation from the general effect due to the i-th treatment
of the first set, $\beta_j$ is the deviation from the general effect due to the j-th treatment
of the second set, and $\gamma_{ij}$ is the effect due to the interaction term or the joint effect of the first and
second sets of treatments. In a randomized block experiment, the treatments belonging to
the first set are called “blocks” or “rows” and the treatments belonging to the second set are
called “treatments” or “columns”; thus, the two sets correspond to rows, say $R_1,\ldots,R_r$,
and columns, say $C_1,\ldots,C_s$. Then, $\gamma_{ij}$ is the deviation from the general effect due to the
combination $(R_i,C_j)$. The random component $e_{ijk}$ is the sum total of the contributions coming
from all unknown factors and $x_{ijk}$ is the observation resulting from the effect of the combination
of treatments $(R_i,C_j)$ at the k-th replication or k-th identical repetition of the
experiment. In an agricultural setting, the observation may be the yield of corn whereas,
in a teaching experiment, the observation may be the grade obtained by the “(i, j, k)”-th
student. In a fixed effect model, all parameters $\mu,\alpha_1,\ldots,\alpha_r,\beta_1,\ldots,\beta_s$ are assumed to
be unknown constants. In a random effect model, $\alpha_1,\ldots,\alpha_r$ or $\beta_1,\ldots,\beta_s$ or both sets are
assumed to be random variables. We assume that $E[e_{ijk}]=0$ and $\mathrm{Var}(e_{ijk})=\sigma^2>0$
for all i, j, k, where E(·) denotes the expected value of (·). In the present discussion, we
will only consider the fixed effect model. Under this model, the data are called two-way
classification data or two-way layout data because they can be classified according to the
two sets of treatments, “rows” and “columns”. Since we are not making any assumption
about the distribution of $e_{ijk}$, and thereby that of $x_{ijk}$, we will apply the method of least
squares to estimate the parameters.


13.4.2. Estimation of parameters in a two-way classification 


The error sum of squares is

$$\sum_{ijk}e_{ijk}^2=\sum_{ijk}(x_{ijk}-\mu-\alpha_i-\beta_j-\gamma_{ij})^2.$$

Our first objective consists of isolating the sum of squares due to interaction and testing the
hypothesis of no interaction, that is, $H_o:\gamma_{ij}=0$ for all i and j. If $\gamma_{ij}\ne 0$, part of the
effect of the i-th row $R_i$ is mixed up with the interaction and similarly, part of the effect
of the j-th column, $C_j$, is intermingled with $\gamma_{ij}$, so that no hypothesis can be tested on
the $\alpha_i$'s and $\beta_j$'s unless $\gamma_{ij}$ is zero or negligibly small or the hypothesis $\gamma_{ij}=0$ is not
rejected. As well, on noting that in $[\mu+\alpha_i+\beta_j+\gamma_{ij}]$, the subscripts appear none
at a time, one at a time and both at a time, we may write $\mu+\alpha_i+\beta_j+\gamma_{ij}=m_{ij}$. Thus,

$$\sum_{ijk}e_{ijk}^2=\sum_{ijk}(x_{ijk}-m_{ij})^2\ \Rightarrow\ \frac{\partial}{\partial m_{ij}}\Big[\sum_{ijk}e_{ijk}^2\Big]=0\ \Rightarrow\ \sum_{k}(x_{ijk}-m_{ij})=0\ \Rightarrow\ x_{ij.}-t\,m_{ij}=0\ \text{ or }\ \hat{m}_{ij}=\frac{x_{ij.}}{t}.$$

We employ the standard notation in this area, namely that a summation over a subscript is
denoted by a dot. Then, the least squares minimum under the general model or the residual
sum of squares, denoted by $s^2$, is given by

$$s^2=\sum_{ijk}\Big(x_{ijk}-\frac{x_{ij.}}{t}\Big)^2. \tag{13.4.2}$$

Now, consider the hypothesis $H_o:\gamma_{ij}=0$ for all i and j. Under this $H_o$, the model
becomes

$$x_{ijk}=\mu+\alpha_i+\beta_j+e_{ijk}\ \text{ or }\ \sum_{ijk}e_{ijk}^2=\sum_{ijk}(x_{ijk}-\mu-\alpha_i-\beta_j)^2.$$
ijk ijk 


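We differentiate this partially with respect to μ and $\alpha_i$ for a specific i, and to $\beta_j$ for a
specific j, and then equate the results to zero and solve to obtain estimates for μ, $\alpha_i$ and
$\beta_j$. Since we have taken $\alpha_i$, $\beta_j$ and $\gamma_{ij}$ as deviations from the general effect μ, we may let
$\alpha_{.}=\alpha_1+\cdots+\alpha_r=0$, $\beta_{.}=\beta_1+\cdots+\beta_s=0$ and $\gamma_{i.}=0$ for each i and $\gamma_{.j}=0$ for
each j, without any loss of generality. Then,

$$\frac{\partial}{\partial\mu}\Big[\sum_{ijk}e_{ijk}^2\Big]=0\ \Rightarrow\ \sum_{ijk}x_{ijk}-rst\,\mu-st\,\alpha_{.}-rt\,\beta_{.}=0\ \Rightarrow\ \hat{\mu}=\frac{x_{...}}{rst},$$

$$\frac{\partial}{\partial\alpha_i}\Big[\sum_{ijk}e_{ijk}^2\Big]=0\ \Rightarrow\ \sum_{jk}[x_{ijk}-\mu-\alpha_i-\beta_j]=0\ \Rightarrow\ x_{i..}-st\,\mu-st\,\alpha_i-t\,\beta_{.}=0\ \Rightarrow\ \hat{\alpha}_i=\frac{x_{i..}}{st}-\hat{\mu},$$

$$\frac{\partial}{\partial\beta_j}\Big[\sum_{ijk}e_{ijk}^2\Big]=0\ \Rightarrow\ \sum_{ik}[x_{ijk}-\mu-\alpha_i-\beta_j]=0\ \Rightarrow\ x_{.j.}-rt\,\mu-t\,\alpha_{.}-rt\,\beta_j=0\ \Rightarrow\ \hat{\beta}_j=\frac{x_{.j.}}{rt}-\hat{\mu}.$$

Hence, the least squares minimum under the hypothesis $H_o$, denoted by $s_0^2$, is

$$\begin{aligned}s_0^2&=\sum_{ijk}\Big[x_{ijk}-\frac{x_{...}}{rst}-\Big(\frac{x_{i..}}{st}-\frac{x_{...}}{rst}\Big)-\Big(\frac{x_{.j.}}{rt}-\frac{x_{...}}{rst}\Big)\Big]^2\\ &=\sum_{ijk}\Big(x_{ijk}-\frac{x_{...}}{rst}\Big)^2-st\sum_{i}\Big(\frac{x_{i..}}{st}-\frac{x_{...}}{rst}\Big)^2-rt\sum_{j}\Big(\frac{x_{.j.}}{rt}-\frac{x_{...}}{rst}\Big)^2,\end{aligned}$$

the simplifications resulting from properties of summations with respect to subscripts.
Thus, the sum of squares due to the hypothesis $H_o:\gamma_{ij}=0$ for all i and j or the interaction
sum of squares, denoted by $s_\gamma^2$, is the following:

$$s_\gamma^2=s_0^2-s^2=\sum_{ijk}\Big(x_{ijk}-\frac{x_{...}}{rst}\Big)^2-st\sum_{i}\Big(\frac{x_{i..}}{st}-\frac{x_{...}}{rst}\Big)^2-rt\sum_{j}\Big(\frac{x_{.j.}}{rt}-\frac{x_{...}}{rst}\Big)^2-\sum_{ijk}\Big(x_{ijk}-\frac{x_{ij.}}{t}\Big)^2$$

and since

$$\sum_{ijk}\Big(x_{ijk}-\frac{x_{...}}{rst}\Big)^2=\sum_{ijk}\Big(x_{ijk}-\frac{x_{ij.}}{t}\Big)^2+t\sum_{ij}\Big(\frac{x_{ij.}}{t}-\frac{x_{...}}{rst}\Big)^2,$$

the sum of squares due to the hypothesis or attributable to the $\gamma_{ij}$'s, that is, due to interaction
is

$$s_\gamma^2=t\sum_{ij}\Big(\frac{x_{ij.}}{t}-\frac{x_{...}}{rst}\Big)^2-st\sum_{i}\Big(\frac{x_{i..}}{st}-\frac{x_{...}}{rst}\Big)^2-rt\sum_{j}\Big(\frac{x_{.j.}}{rt}-\frac{x_{...}}{rst}\Big)^2. \tag{13.4.3}$$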


If the hypothesis $\gamma_{ij}=0$ is not rejected, the effects of the $\gamma_{ij}$'s are deemed insignificant
and then, setting the hypothesis $\gamma_{ij}=0$, $\alpha_i=0$, $i=1,\ldots,r$, we obtain the sum of
squares due to the $\alpha_i$'s or sum of squares due to the rows, denoted by $s_r^2$, as

$$s_r^2=\sum_{ijk}\Big(\frac{x_{i..}}{st}-\frac{x_{...}}{rst}\Big)^2=st\sum_{i=1}^{r}\Big(\frac{x_{i..}}{st}-\frac{x_{...}}{rst}\Big)^2. \tag{13.4.4}$$

Similarly, the sum of squares attributable to the $\beta_j$'s or due to the columns, denoted by $s_c^2$,
is

$$s_c^2=\sum_{ijk}\Big(\frac{x_{.j.}}{rt}-\frac{x_{...}}{rst}\Big)^2=rt\sum_{j=1}^{s}\Big(\frac{x_{.j.}}{rt}-\frac{x_{...}}{rst}\Big)^2. \tag{13.4.5}$$

Observe that the sum of squares due to rows plus the sum of squares due to columns,
once added to the interaction sum of squares, is the subtotal sum of squares, given by
$t\sum_{ij}\big(\frac{x_{ij.}}{t}-\frac{x_{...}}{rst}\big)^2$; in other words, this subtotal sum of squares is partitioned into the sum of
squares due to the rows, due to the columns and due to interaction. This is equivalent
to an ANOVA on the subtotals $\sum_k x_{ijk}$ or an ANOVA on a two-way classification with
a single observation per cell. As has been pointed out, in that case, we cannot test for
interaction, and moreover, this subtotal sum of squares plus the residual sum of squares is
the grand total sum of squares. If we assume a normal distribution for the error terms, that
is, $e_{ijk}\overset{iid}{\sim}N_1(0,\sigma^2)$, $\sigma^2>0$, for all i, j, k, then under the hypothesis $H_o:\gamma_{ij}=0$, it can
be shown that

$$\frac{s_\gamma^2}{\sigma^2}\sim\chi^2_{\nu},\quad \nu=(rs-1)-(r-1)-(s-1)=(r-1)(s-1), \tag{13.4.6}$$

and the residual variation $s^2$ has the following distribution whether $H_o$ holds or not:

$$\frac{s^2}{\sigma^2}\sim\chi^2_{\nu_1},\quad \nu_1=rst-1-(rs-1)=rs(t-1), \tag{13.4.7}$$

where $s_\gamma^2$ and $s^2$ are independently distributed. Then, under the hypothesis $\gamma_{ij}=0$ for all
i and j or when this hypothesis is not rejected, it can be established that

$$\frac{s_r^2}{\sigma^2}\sim\chi^2_{r-1},\qquad \frac{s_c^2}{\sigma^2}\sim\chi^2_{s-1}, \tag{13.4.8}$$

and $s_r^2$ and $s^2$ as well as $s_c^2$ and $s^2$ are independently distributed whenever $H_o:\gamma_{ij}=0$ is
not rejected. Hence, under the hypothesis,

$$\frac{s_\gamma^2/[(r-1)(s-1)]}{s^2/[rs(t-1)]}\sim F_{\nu,\nu_1},\quad \nu=(r-1)(s-1),\ \nu_1=rs(t-1). \tag{13.4.9}$$

The total sum of squares is $\sum_{ijk}\big(x_{ijk}-\frac{x_{...}}{rst}\big)^2$. Thus, the first decomposition and the first
part of ANOVA in this two-way classification scheme is the following:

Total variation = Variation due to the subtotals + Residual variation,

the second stage being

Variation due to the subtotals = Variation due to the rows
+ Variation due to the columns + Variation due to interaction,


and the resulting ANOVA table is the following:

ANOVA Table for the Two-Way Classification

Variation due to | df (1) | SS (2) | MS (3)=(2)/(1)
rows | $r-1$ | $st\sum_{i=1}^{r}\big(\frac{x_{i..}}{st}-\frac{x_{...}}{rst}\big)^2$ | $s_r^2/(r-1)=D_1$
columns | $s-1$ | $rt\sum_{j=1}^{s}\big(\frac{x_{.j.}}{rt}-\frac{x_{...}}{rst}\big)^2$ | $s_c^2/(s-1)=D_2$
interaction | $(r-1)(s-1)$ | $s_\gamma^2$ | $s_\gamma^2/[(r-1)(s-1)]=D_3$
subtotal | $rs-1$ | $t\sum_{ij}\big(\frac{x_{ij.}}{t}-\frac{x_{...}}{rst}\big)^2$ |
residuals | $rs(t-1)$ | $s^2$ | $s^2/[rs(t-1)]=D$
total | $rst-1$ | $\sum_{ijk}\big(x_{ijk}-\frac{x_{...}}{rst}\big)^2$ |

where df designates the number of degrees of freedom, SS means sum of squares and
MS stands for mean squares; the expression for the residual sum of squares is given
in (13.4.2), that for the interaction in (13.4.3), that for the rows in (13.4.4) and that for the
columns in (13.4.5). Note that we test the hypotheses on the $\alpha_i$'s and $\beta_j$'s, or
row effects and column effects, only if the hypothesis $\gamma_{ij}=0$ is not rejected; otherwise
there is no point in testing hypotheses on the $\alpha_i$'s and $\beta_j$'s because they are confounded
with the $\gamma_{ij}$'s.
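
The decomposition summarized in this table can be sketched in a few lines of Python. The function `two_way_anova` and the small 2 x 2 x 2 data set below are hypothetical and serve only to illustrate the formulas (13.4.2) to (13.4.5); integer data are assumed so that exact fractions can be used.

```python
from fractions import Fraction as Fr

def two_way_anova(x):
    """Sums of squares and df for x[i][j][k]: r rows, s columns, t replicates per cell."""
    r, s, t = len(x), len(x[0]), len(x[0][0])
    cells = [(i, j) for i in range(r) for j in range(s)]
    grand = Fr(sum(sum(x[i][j]) for i, j in cells), r * s * t)
    cell = {(i, j): Fr(sum(x[i][j]), t) for i, j in cells}
    row = [Fr(sum(sum(x[i][j]) for j in range(s)), s * t) for i in range(r)]
    col = [Fr(sum(sum(x[i][j]) for i in range(r)), r * t) for j in range(s)]
    ss = {
        "rows": s * t * sum((m - grand) ** 2 for m in row),
        "columns": r * t * sum((m - grand) ** 2 for m in col),
        "subtotal": t * sum((cell[c] - grand) ** 2 for c in cells),
        "residual": sum((v - cell[c]) ** 2 for c in cells for v in x[c[0]][c[1]]),
        "total": sum((v - grand) ** 2 for c in cells for v in x[c[0]][c[1]]),
    }
    # Interaction is what remains of the subtotal after removing rows and columns.
    ss["interaction"] = ss["subtotal"] - ss["rows"] - ss["columns"]
    df = {"rows": r - 1, "columns": s - 1, "interaction": (r - 1) * (s - 1),
          "subtotal": r * s - 1, "residual": r * s * (t - 1), "total": r * s * t - 1}
    return ss, df

# Hypothetical data: r = 2 rows, s = 2 columns, t = 2 replicates.
data = [[[1, 3], [2, 4]], [[5, 7], [6, 10]]]
ss, df = two_way_anova(data)
f_interaction = (ss["interaction"] / df["interaction"]) / (ss["residual"] / df["residual"])
print(ss, df, f_interaction)
```

The interaction ratio is computed first because, as noted above, the row and column effects are tested only when the hypothesis of no interaction is not rejected.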


13.5. Multivariate Extension of the Two-Way Layout 


Instead of a single real scalar variable being studied, we consider a p × 1 vector of
real scalar variables. The multivariate two-way classification fixed effect model is the
following:

$$X_{ijk}=M+A_i+B_j+\Gamma_{ij}+E_{ijk}, \tag{13.5.1}$$

for $i=1,\ldots,r$, $j=1,\ldots,s$, $k=1,\ldots,t$, where $M$, $A_i$, $B_j$, $\Gamma_{ij}$ and $E_{ijk}$ are all p × 1
vectors. In this case, M is a general effect, $A_i$ is the deviation from the general effect due
to the i-th row, $B_j$ is the deviation from the general effect due to the j-th column, $\Gamma_{ij}$ is
the deviation from the general effect due to interaction between the rows and the columns
and $E_{ijk}$ is the vector of the random or error component. For convenience, the two sets
of treatments are referred to as rows and columns, the first set as rows and the second, as
columns. In a two-way layout, two sets of treatments are tested. As in the scalar case of
Sect. 13.4, we can assume, without any loss of generality, that $\sum_i A_i=A_1+\cdots+A_r=A_{.}=O$,
$B_{.}=O$, $\sum_i\Gamma_{ij}=\Gamma_{.j}=O$ and $\sum_j\Gamma_{ij}=\Gamma_{i.}=O$. The
procedures are parallel to those developed in Sect. 13.4 for the real scalar variable case.
Instead of sums of squares, we now have sums of squares and cross products matrices. As
before, we may write $M_{ij}=M+A_i+B_j+\Gamma_{ij}$. Then, the trace of the sum of squares and
cross products error matrix $\sum_{ijk}E_{ijk}E_{ijk}'$ is minimized. Using the vector derivative operator,
we have

$$\frac{\partial}{\partial M_{ij}}\,\mathrm{tr}\Big[\sum_{ijk}E_{ijk}E_{ijk}'\Big]=O\ \Rightarrow\ \sum_{k}(X_{ijk}-M_{ij})=O\ \Rightarrow\ \hat{M}_{ij}=\frac{1}{t}X_{ij.},$$

so that the residual sum of squares and cross products matrix, denoted by $S_{res}$, is

$$S_{res}=\sum_{ijk}\Big(X_{ijk}-\frac{X_{ij.}}{t}\Big)\Big(X_{ijk}-\frac{X_{ij.}}{t}\Big)'. \tag{13.5.2}$$

t 


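All other derivations are analogous to those provided in the real scalar case. The sum of
squares and cross products matrix due to interaction, denoted by $S_{int}$, is the following:

$$\begin{aligned}S_{int}&=t\sum_{ij}\Big(\frac{X_{ij.}}{t}-\frac{X_{...}}{rst}\Big)\Big(\frac{X_{ij.}}{t}-\frac{X_{...}}{rst}\Big)'-st\sum_{i}\Big(\frac{X_{i..}}{st}-\frac{X_{...}}{rst}\Big)\Big(\frac{X_{i..}}{st}-\frac{X_{...}}{rst}\Big)'\\ &\quad-rt\sum_{j}\Big(\frac{X_{.j.}}{rt}-\frac{X_{...}}{rst}\Big)\Big(\frac{X_{.j.}}{rt}-\frac{X_{...}}{rst}\Big)'.\end{aligned} \tag{13.5.3}$$

The sum of squares and cross products matrices due to the rows and columns are
respectively given by

$$S_{row}=st\sum_{i=1}^{r}\Big(\frac{X_{i..}}{st}-\frac{X_{...}}{rst}\Big)\Big(\frac{X_{i..}}{st}-\frac{X_{...}}{rst}\Big)', \tag{13.5.4}$$

$$S_{col}=rt\sum_{j=1}^{s}\Big(\frac{X_{.j.}}{rt}-\frac{X_{...}}{rst}\Big)\Big(\frac{X_{.j.}}{rt}-\frac{X_{...}}{rst}\Big)'. \tag{13.5.5}$$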


The sum of squares and cross products matrix for the subtotal is denoted by $S_{sub}=S_{row}+S_{col}+S_{int}$.
The total sum of squares and cross products matrix, denoted by $S_{tot}$, is the
following:

$$S_{tot}=\sum_{ijk}\Big(X_{ijk}-\frac{X_{...}}{rst}\Big)\Big(X_{ijk}-\frac{X_{...}}{rst}\Big)'. \tag{13.5.6}$$

We may now construct the MANOVA table. The following abbreviations are used: df
stands for degrees of freedom of the corresponding Wishart matrix, SSP means the sum
of squares and cross products matrix, MS stands for mean squares and is equal to SSP/df,
and $S_{row}$, $S_{col}$, $S_{res}$ and $S_{tot}$ are respectively specified in (13.5.4), (13.5.5), (13.5.2)
and (13.5.6).


MANOVA Table for a Two-Way Layout

Variation due to | df (1) | SSP (2) | MS (3)=(2)/(1)
rows | $r-1$ | $S_{row}$ | $S_{row}/(r-1)$
columns | $s-1$ | $S_{col}$ | $S_{col}/(s-1)$
interaction | $(r-1)(s-1)$ | $S_{int}$ | $S_{int}/[(r-1)(s-1)]$
subtotal | $rs-1$ | $S_{sub}$ |
residuals | $rs(t-1)$ | $S_{res}$ | $S_{res}/[rs(t-1)]$
total | $rst-1$ | $S_{tot}$ |

13.5.1. Likelihood ratio test for multivariate two-way layout 


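Under the assumption that the error or random components $E_{ijk}\overset{iid}{\sim}N_p(O,\Sigma)$, $\Sigma>O$,
for all i, j and k, the exponential part of the multivariate normal density excluding $-\frac{1}{2}$
is obtained as follows:

$$\begin{aligned}E_{ijk}'\Sigma^{-1}E_{ijk}&=(X_{ijk}-M-A_i-B_j-\Gamma_{ij})'\,\Sigma^{-1}(X_{ijk}-M-A_i-B_j-\Gamma_{ij})\\ &=\mathrm{tr}\big[\Sigma^{-1}(X_{ijk}-M-A_i-B_j-\Gamma_{ij})(X_{ijk}-M-A_i-B_j-\Gamma_{ij})'\big]\ \Rightarrow\\ \sum_{ijk}E_{ijk}'\Sigma^{-1}E_{ijk}&=\mathrm{tr}\Big[\Sigma^{-1}\sum_{ijk}(X_{ijk}-M-A_i-B_j-\Gamma_{ij})(X_{ijk}-M-A_i-B_j-\Gamma_{ij})'\Big].\end{aligned}$$

Thus, the joint density of all the $X_{ijk}$'s, denoted by L, is

$$L=\frac{1}{(2\pi)^{\frac{prst}{2}}|\Sigma|^{\frac{rst}{2}}}\;e^{-\frac{1}{2}\mathrm{tr}\big[\Sigma^{-1}\sum_{ijk}(X_{ijk}-M-A_i-B_j-\Gamma_{ij})(X_{ijk}-M-A_i-B_j-\Gamma_{ij})'\big]}.$$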

The maximum likelihood estimates of $M$, $A_i$, $B_j$ and $\Gamma_{ij}$ are the same as the least squares
estimates and hence, the maximum likelihood estimator (MLE) of Σ is proportional to the least squares
minimum, which is the residual sum of squares and cross products matrix S or $S_{res}$ (in the
present notation), where

$$S_{res}=\sum_{ijk}(X_{ijk}-\hat{M}-\hat{A}_i-\hat{B}_j-\hat{\Gamma}_{ij})(X_{ijk}-\hat{M}-\hat{A}_i-\hat{B}_j-\hat{\Gamma}_{ij})'=\sum_{ijk}\Big(X_{ijk}-\frac{X_{ij.}}{t}\Big)\Big(X_{ijk}-\frac{X_{ij.}}{t}\Big)'. \tag{13.5.7}$$

This is the sample sum of squares and cross products matrix under the general model
and its determinant raised to the power of $\frac{rst}{2}$ is the quantity appearing in the numerator
of the likelihood ratio criterion λ. Consider the hypothesis $H_o:\Gamma_{ij}=O$ for all i and j.
Then, under this hypothesis, the estimator of Σ is proportional to $S_0$, where

$$\begin{aligned}S_0=\sum_{ijk}&\Big[X_{ijk}-\frac{X_{...}}{rst}-\Big(\frac{X_{i..}}{st}-\frac{X_{...}}{rst}\Big)-\Big(\frac{X_{.j.}}{rt}-\frac{X_{...}}{rst}\Big)\Big]\\ &\times\Big[X_{ijk}-\frac{X_{...}}{rst}-\Big(\frac{X_{i..}}{st}-\frac{X_{...}}{rst}\Big)-\Big(\frac{X_{.j.}}{rt}-\frac{X_{...}}{rst}\Big)\Big]'\end{aligned}$$

and $|S_0|^{\frac{rst}{2}}$ is the quantity appearing in the denominator of λ. However, $S_0-S_{res}=S_{int}$
is the sum of squares and cross products matrix due to the interaction terms $\Gamma_{ij}$'s or to the
hypothesis, so that $S_0=S_{res}+S_{int}$. Therefore, λ is given by

$$\lambda=\frac{|S_{res}|^{\frac{rst}{2}}}{|S_{res}+S_{int}|^{\frac{rst}{2}}}. \tag{13.5.8}$$

Letting $w=\lambda^{\frac{2}{rst}}$,

$$w=\frac{|S_{res}|}{|S_{res}+S_{int}|}. \tag{13.5.9}$$
[Sres + Sint | 


It follows from results derived in Chap. 5 that $S_{res}\sim W_p(rs(t-1),\Sigma)$ and $S_{int}\sim W_p((r-1)(s-1),\Sigma)$
under the hypothesis, and that $S_{res}$ and $S_{int}$ are independently distributed;
hence, under $H_o$,

$$W=(S_{res}+S_{int})^{-\frac{1}{2}}S_{res}(S_{res}+S_{int})^{-\frac{1}{2}}\sim\text{real p-variate type-1 beta random variable}$$

with the parameters $\big(\frac{rs(t-1)}{2},\frac{(r-1)(s-1)}{2}\big)$. As well,

$$W_1=S_{res}^{-\frac{1}{2}}S_{int}S_{res}^{-\frac{1}{2}}\sim\text{real p-variate type-2 beta random variable}$$

with the parameters $\big(\frac{(r-1)(s-1)}{2},\frac{rs(t-1)}{2}\big)$. Under $H_o$, the h-th arbitrary moments of w and
λ, which are readily obtained from those of a real matrix-variate type-1 beta variable, are

$$E[w^h]=\Big\{\prod_{j=1}^{p}\frac{\Gamma\big(\nu_1+\nu_2-\frac{j-1}{2}\big)}{\Gamma\big(\nu_1-\frac{j-1}{2}\big)}\Big\}\Big\{\prod_{j=1}^{p}\frac{\Gamma\big(\nu_1+h-\frac{j-1}{2}\big)}{\Gamma\big(\nu_1+\nu_2+h-\frac{j-1}{2}\big)}\Big\}, \tag{13.5.10}$$

$$E[\lambda^h]=\Big\{\prod_{j=1}^{p}\frac{\Gamma\big(\nu_1+\nu_2-\frac{j-1}{2}\big)}{\Gamma\big(\nu_1-\frac{j-1}{2}\big)}\Big\}\Big\{\prod_{j=1}^{p}\frac{\Gamma\big(\nu_1+\frac{rst}{2}h-\frac{j-1}{2}\big)}{\Gamma\big(\nu_1+\nu_2+\frac{rst}{2}h-\frac{j-1}{2}\big)}\Big\}, \tag{13.5.11}$$

where $\nu_1=\frac{rs(t-1)}{2}$ and $\nu_2=\frac{(r-1)(s-1)}{2}$. Note that we reject the null hypothesis $H_o:\Gamma_{ij}=O$,
$i=1,\ldots,r$, $j=1,\ldots,s$, for small values of w and λ. As explained in Sect. 13.3,
the exact general density of w in (13.5.10) can be expressed in terms of a G-function and
the exact general density of λ in (13.5.11) can be written in terms of an H-function. For the
theory and applications of the G-function and the H-function, the reader may respectively
refer to Mathai (1993) and Mathai et al. (2010).


13.5.2. Asymptotic distribution of $\lambda$ in the MANOVA two-way layout


Consider the arbitrary $h$-th moment specified in (13.5.11). On expanding all the gamma functions for large values of $rst$ in the constant part and for large values of $rst(1 + h)$ in the functional part by applying Stirling's formula or using the first term in the asymptotic expansion of a gamma function referring to (13.3.13), it can be verified that the $h$-th moment of $\lambda$ behaves asymptotically as follows:

$$E[\lambda^h] \to (1 + h)^{-\frac{p(r-1)(s-1)}{2}} \ \Rightarrow\ -2\ln\lambda \to \chi^2_{p(r-1)(s-1)} \ \text{ as } rst \to \infty. \tag{13.5.12}$$
Thus, for large values of $rst$, one can utilize this real scalar chisquare approximation for testing the hypothesis $H_o: \Gamma_{ij} = O$ for all $i$ and $j$. We can work out a large number of exact distributions of $w$ of (13.5.10) for special values of $r, s, t, p$. Observe that

$$E[w^h] = C\prod_{j=1}^{p}\frac{\Gamma\big(\frac{rs(t-1)}{2} - \frac{j-1}{2} + h\big)}{\Gamma\big(\frac{rs(t-1)}{2} + \frac{(r-1)(s-1)}{2} - \frac{j-1}{2} + h\big)} \tag{13.5.13}$$

where $C$ is the normalizing constant such that when $h = 0$, $E[w^h] = 1$. Thus, when $\frac{(r-1)(s-1)}{2}$ is a positive integer, that is, when $r$ or $s$ is odd, the gamma functions cancel out, leaving a number of factors in the denominator which can be written as a sum by applying the partial fractions technique. For small values of $p$, the exact density will then be expressible as a sum involving only a few terms. For larger values of $p$, there will be repeated factors in the denominator, which complicates matters.
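The convergence stated in (13.5.12) can be checked numerically. The sketch below is only an illustration, not part of the original derivation: it evaluates the exact moment (13.5.11) on the log scale with the standard library function `math.lgamma` and compares it with the limiting value $(1+h)^{-p(r-1)(s-1)/2}$. The function names `lambda_moment` and `chisquare_limit`, and the choice $p = 3$, $r = 2$, $s = 3$, $h = 1$, are assumptions made for the illustration.

```python
from math import lgamma, exp


def lambda_moment(h, p, r, s, t):
    """Exact h-th moment of lambda as in (13.5.11), computed via log-gamma."""
    nu1 = r * s * (t - 1) / 2.0      # nu_1 = rs(t-1)/2
    nu2 = (r - 1) * (s - 1) / 2.0    # nu_2 = (r-1)(s-1)/2
    n_half = r * s * t / 2.0         # lambda = w ** (rst/2)
    log_m = 0.0
    for j in range(1, p + 1):
        c = (j - 1) / 2.0
        # constant part of the j-th factor
        log_m += lgamma(nu1 + nu2 - c) - lgamma(nu1 - c)
        # functional part of the j-th factor
        log_m += lgamma(nu1 + n_half * h - c) - lgamma(nu1 + nu2 + n_half * h - c)
    return exp(log_m)


def chisquare_limit(h, p, r, s):
    """Limiting value (1 + h) ** (-p (r-1)(s-1) / 2) appearing in (13.5.12)."""
    return (1.0 + h) ** (-p * (r - 1) * (s - 1) / 2.0)


if __name__ == "__main__":
    p, r, s, h = 3, 2, 3, 1.0
    for t in (4, 40, 400, 4000):
        print(t, lambda_moment(h, p, r, s, t), chisquare_limit(h, p, r, s))
```

As $t$ increases with $p$, $r$, $s$ held fixed, the first printed column should approach the second, which is one way of seeing why the chisquare approximation is only trustworthy for large $rst$.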
13.5.3. Exact densities of $w$ in some special cases


We will consider several special cases of the $h$-th moment of $w$ as given in (13.5.13).

Case (1): $p = 1$. In this case, the $h$-th moment becomes

$$E[w^h] = C_1\frac{\Gamma\big(\frac{rs(t-1)}{2} + h\big)}{\Gamma\big(\frac{rs(t-1)}{2} + \frac{(r-1)(s-1)}{2} + h\big)}$$

where $C_1$ is the associated normalizing constant. This is the $h$-th moment of a real scalar type-1 beta random variable with the parameters $\big(\frac{rs(t-1)}{2}, \frac{(r-1)(s-1)}{2}\big)$. Hence $y = \frac{1-w}{w}$ is a real scalar type-2 beta with parameters $\big(\frac{(r-1)(s-1)}{2}, \frac{rs(t-1)}{2}\big)$, and

$$\frac{rs(t-1)}{(r-1)(s-1)}\, y \sim F_{(r-1)(s-1),\, rs(t-1)}.$$

Accordingly, the test can be carried out by using this F-statistic. One would reject the null hypothesis $H_o: \Gamma_{ij} = O$ if the observed $F \ge F_{(r-1)(s-1),\, rs(t-1),\, \alpha}$ where $F_{(r-1)(s-1),\, rs(t-1),\, \alpha}$ is the upper $100\,\alpha\%$ percentile of this F-density. For example, for $r = 2$, $s = 3$, $t = 3$ and $\alpha = 0.05$, we have $F_{2,12,0.05} = 3.89$ from F-tables (2 numerator and 12 denominator degrees of freedom) so that $H_o$ would be rejected if the observed value of $F_{2,12} \ge 3.89$ at the specified significance level.


Case (2): $p = 2$. In this case, we have a ratio of products of two gamma functions whose arguments differ by $\frac{1}{2}$. Combining the gamma functions in the numerator and in the denominator by using the duplication formula and proceeding as in Sect. 13.3 for the one-way layout, the statistic is $t_1 = \frac{1 - \sqrt{w}}{\sqrt{w}}$, and we have

$$\frac{rs(t-1) - 1}{(r-1)(s-1)}\, t_1 \sim F_{2(r-1)(s-1),\, 2(rs(t-1)-1)}$$

so that the decision can be made as in Case (1).


Case (3): $(r-1)(s-1) = 1 \Rightarrow r = 2, s = 2$. In this case, all the gamma functions in (13.5.13) cancel out except the last one in the numerator and the first one in the denominator. This gamma ratio is that of a real scalar type-1 beta random variable with the parameters $\big(\frac{rs(t-1)}{2} - \frac{p-1}{2}, \frac{p}{2}\big)$, and hence $y = \frac{1-w}{w}$ is a real scalar type-2 beta so that

$$\frac{rs(t-1) + 1 - p}{p}\, y \sim F_{p,\, rs(t-1)+1-p},$$

and a decision can be made by making use of this F distribution as in Case (1).
Case (4): $(r-1)(s-1) = 2$. In this case,

$$E[w^h] = C_4\prod_{j=1}^{p}\frac{1}{\frac{rs(t-1)}{2} - \frac{j-1}{2} + h}$$

with the corresponding normalizing constant $C_4$. This product of $p$ factors can be expressed as a sum by using partial fractions. That is,

$$E[w^h] = C_4\sum_{j=0}^{p-1}\frac{b_j}{a - \frac{j}{2} + h}, \tag{i}$$

$$b_j = \lim_{a+h\to\frac{j}{2}}\Big[(a+h)\big(a+h-\tfrac{1}{2}\big)\cdots\big(a+h-\tfrac{j-1}{2}\big)\big(a+h-\tfrac{j+1}{2}\big)\cdots\big(a+h-\tfrac{p-1}{2}\big)\Big]^{-1},\quad a = \frac{rs(t-1)}{2}. \tag{ii}$$

Thus, the density of $w$, denoted by $f_w(w)$, which is available from (i) and (ii), is the following:

$$f_w(w) = C_4\sum_{j=0}^{p-1} b_j\, w^{a - \frac{j}{2} - 1},\quad 0 \le w \le 1,$$

and zero elsewhere. Some additional special cases could be worked out but the expressions would become complicated. For large values of $rst$, one can apply the asymptotic chisquare result given in (13.5.12) for testing the hypothesis $H_o: \Gamma_{ij} = O$.
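The partial fraction step of Case (4) can be verified with exact rational arithmetic. The following is a minimal sketch under stated assumptions: the helper names are hypothetical, the coefficient $b_j$ is computed as the product of $1/((j-k)/2)$ over $k \ne j$ (which is what the limit in (ii) evaluates to), and the values $p = 3$, $a = 9$ are used purely as an illustration.

```python
from fractions import Fraction


def partial_fraction_coeffs(p):
    """b_j = prod over k != j of 1 / ((j - k)/2), for j = 0, ..., p-1."""
    coeffs = []
    for j in range(p):
        b = Fraction(1)
        for k in range(p):
            if k != j:
                b /= Fraction(j - k, 2)
        coeffs.append(b)
    return coeffs


def product_form(z, p):
    """prod_{j=0}^{p-1} 1 / (z - j/2), where z stands for a + h."""
    out = Fraction(1)
    for j in range(p):
        out /= (z - Fraction(j, 2))
    return out


def sum_form(z, p):
    """sum_{j=0}^{p-1} b_j / (z - j/2): the partial fraction expansion."""
    return sum(b / (z - Fraction(j, 2))
               for j, b in enumerate(partial_fraction_coeffs(p)))


def normalizing_constant(a, p):
    """Constant making E[w^0] = 1, namely prod_{j=0}^{p-1} (a - j/2)."""
    c = Fraction(1)
    for j in range(p):
        c *= (a - Fraction(j, 2))
    return c


def density(w, a, p):
    """f_w(w) = C * sum_j b_j * w ** (a - j/2 - 1) for 0 <= w <= 1."""
    c = float(normalizing_constant(a, p))
    return c * sum(float(b) * w ** (float(a) - j / 2 - 1)
                   for j, b in enumerate(partial_fraction_coeffs(p)))


if __name__ == "__main__":
    print(partial_fraction_coeffs(3))
    print(sum_form(Fraction(19, 2), 3) == product_form(Fraction(19, 2), 3))
    print(density(0.9, Fraction(9), 3))
```

Because each term $b_j/(a - j/2 + h)$ is the $h$-th moment of $b_j\,w^{a-j/2-1}$ on $[0,1]$, agreement of the sum and product forms is exactly what justifies reading the density off term by term.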


Example 13.5.1. An experiment is conducted among heart patients to stabilize their sys- 
tolic pressure, diastolic pressure and heart rate or pulse around the standard numbers which 
are 120, 80 and 60, respectively. A random sample of 24 patients who may be considered 
homogeneous with respect to all factors of variation, such as age, weight group, race, gen- 
der, dietary habits, and so on, are selected. These 24 individuals are randomly divided into 
two groups of equal size. One group of 12 subjects are given the medication combination 
Med-1 and the other 12 are administered the medication combination Med-2. Then, the 
Med-1 group is randomly divided into three subgroups of 4 subjects. These subgroups are 
assigned exercise routines Ex-1, Ex-2, Ex-3. Similarly, the Med-2 group is also divided at 
random into 3 subgroups of 4 individuals who are respectively subjected to exercise routines Ex-1, Ex-2, Ex-3. After one week, the following observations are made: $x_1$ = current reading on systolic pressure minus 120, $x_2$ = current reading on diastolic pressure minus 80, $x_3$ = current reading on heart rate minus 60. The structure of the two-way data layout is as follows:

          Ex-1                    Ex-2                    Ex-3
Med-1     four 3 x 1 vectors      four 3 x 1 vectors      four 3 x 1 vectors
Med-2     four 3 x 1 vectors      four 3 x 1 vectors      four 3 x 1 vectors


Let $X_{ijk}$ be the $k$-th vector in the $i$-th row ($i$-th medication) and $j$-th column ($j$-th exercise routine). For convenience, the data are presented in matrix form:


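$A_{11} = [X_{111}, X_{112}, X_{113}, X_{114}]$, $A_{12} = [X_{121}, X_{122}, X_{123}, X_{124}]$,
$A_{13} = [X_{131}, X_{132}, X_{133}, X_{134}]$, $A_{21} = [X_{211}, X_{212}, X_{213}, X_{214}]$,
$A_{22} = [X_{221}, X_{222}, X_{223}, X_{224}]$, $A_{23} = [X_{231}, X_{232}, X_{233}, X_{234}]$;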


=2 32 УУ 1 4 —1 4 4 -3 3 4 

Ag(- wl cel Aye =2 =2 =3 3, Aye 2Ж=—3 2 3, 
2 —1 —1 0 3 —2: seb) Jim pd. 
20-2 0 3 —1 —1 3 —2 A. 0—1 

An= 14 1 2, Ayvy= 4 4 00, Аз= 1 4 0 3. 
2s pep s2 0 1-14 —2 0-2 0 


(1) Perform a two-way ANOVA on the first component, namely, $x_1$, the current reading on systolic pressure minus 120; (2) Carry out a MANOVA on the full data.


Solution 13.5.1. We need the following quantities: 


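$$X_{11.} = \begin{bmatrix}8\\0\\0\end{bmatrix},\ X_{12.} = \begin{bmatrix}8\\-4\\0\end{bmatrix},\ X_{13.} = \begin{bmatrix}8\\4\\0\end{bmatrix} \Rightarrow X_{1..} = \begin{bmatrix}24\\0\\0\end{bmatrix},$$

$$X_{21.} = \begin{bmatrix}0\\8\\0\end{bmatrix},\ X_{22.} = \begin{bmatrix}4\\8\\4\end{bmatrix},\ X_{23.} = \begin{bmatrix}-4\\8\\-4\end{bmatrix} \Rightarrow X_{2..} = \begin{bmatrix}0\\24\\0\end{bmatrix},$$

$$X_{.1.} = \begin{bmatrix}8\\8\\0\end{bmatrix},\ X_{.2.} = \begin{bmatrix}12\\4\\4\end{bmatrix},\ X_{.3.} = \begin{bmatrix}4\\12\\-4\end{bmatrix} \Rightarrow X_{...} = \begin{bmatrix}24\\24\\0\end{bmatrix},$$

so that $\frac{X_{...}}{rst} = \frac{1}{24}(24, 24, 0)' = (1, 1, 0)'$ with $r = 2$, $s = 3$, $t = 4$.

By using the first elements in all these vectors, we will carry out a two-way ANOVA and answer the first question. Since these are all observations on real scalar variables, we will utilize lower-case letters to indicate scalar quantities. Thus, we have the following values: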


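$$s_{tot} = \sum_{ijk}\Big(x_{ijk} - \frac{x_{...}}{rst}\Big)^2 = (-2-1)^2 + (3-1)^2 + \cdots + (-1-1)^2 = 136,$$

$$s_{row} = \sum_{ijk}\Big(\frac{x_{i..}}{st} - \frac{x_{...}}{rst}\Big)^2 = 12[(2-1)^2 + (0-1)^2] = 24,$$

$$s_{col} = \sum_{ijk}\Big(\frac{x_{.j.}}{rt} - \frac{x_{...}}{rst}\Big)^2 = 8\Big[(1-1)^2 + \Big(\frac{3}{2}-1\Big)^2 + \Big(\frac{1}{2}-1\Big)^2\Big] = 4,$$

$$s_{res} = \sum_{ijk}\Big(x_{ijk} - \frac{x_{ij.}}{t}\Big)^2 = (-2-2)^2 + \cdots + (-1+1)^2 = 104,$$

$$s_{int} = \sum_{ijk}\Big(\frac{x_{ij.}}{t} - \frac{x_{i..}}{st} - \frac{x_{.j.}}{rt} + \frac{x_{...}}{rst}\Big)^2 = 4\Big[0 + \frac{1}{4} + \frac{1}{4} + 0 + \frac{1}{4} + \frac{1}{4}\Big] = 4.$$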


All quantities have been calculated separately in order to verify the computations. We 
could have obtained the interaction sum of squares from the subtotal sum of squares minus 
the sum of squares due to rows and columns. Similarly, we could have obtained the residual 
sum of squares from the total sum of squares minus the subtotal sum of squares. We will 
set up the ANOVA table, where, as usual, df stands for degrees of freedom, SS means sum of squares and MS denotes mean squares:
ANOVA Table for a Two-Way Layout with Interaction

Variation due to    df     SS     MS             F-ratio
                    (1)    (2)    (3)=(2)/(1)
rows                 1      24    24             24/5.78
columns              2       4     2             2/5.78
interaction          2       4     2             2/5.78
subtotal             5      32
residuals           18     104     5.78
total               23     136


For testing the hypothesis of no interaction, the critical value at the 5% significance level is $F_{2,18,0.05} \approx 3.55$ (2 numerator and 18 denominator degrees of freedom). The observed value of this $F_{2,18}$ being $\frac{2}{5.78} \approx 0.35 < 3.55$, the hypothesis of no interaction is not rejected. Thus, we can test for the significance of the row and column effects. Consider the hypothesis $\alpha_1 = \alpha_2 = 0$. Then, under this hypothesis and the hypothesis of no interaction, the F-ratio for the row sum of squares is $24/5.78 \approx 4.15 < 4.41 = F_{1,18,0.05}$, the tabulated value of $F_{1,18}$ at $\alpha = 0.05$. Therefore, this hypothesis is not rejected. Now, consider the hypothesis $\beta_1 = \beta_2 = \beta_3 = 0$. Since under this hypothesis and the hypothesis of no interaction, the F-ratio for the column sum of squares is $\frac{2}{5.78} \approx 0.35 < 3.55 = F_{2,18,0.05}$, it is not rejected either. Thus, the data show no significant interaction between exercise routine and medication, and no significant effect of the exercise routines or the two combinations of medications in bringing the systolic pressures closer to the standard value of 120.
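As a cross-check, the interaction F-ratio of the ANOVA table can also be reached through the scalar route of Case (1) in Sect. 13.5.3, where $w$ is the residual sum of squares divided by the sum of the residual and interaction sums of squares. The sketch below uses only the sums of squares quoted in the table; the function name `f_statistic_p1` is a hypothetical helper introduced for this illustration.

```python
from fractions import Fraction


def f_statistic_p1(ss_res, ss_int, r, s, t):
    """F-statistic for no interaction when p = 1, via w and y = (1 - w)/w."""
    w = Fraction(ss_res, ss_res + ss_int)   # w = s_res / (s_res + s_int)
    y = (1 - w) / w                         # scalar type-2 beta variable
    df1 = (r - 1) * (s - 1)                 # numerator degrees of freedom
    df2 = r * s * (t - 1)                   # denominator degrees of freedom
    return Fraction(df2, df1) * y, df1, df2


if __name__ == "__main__":
    f_value, df1, df2 = f_statistic_p1(104, 4, r=2, s=3, t=4)
    # Same quantity computed as a ratio of mean squares from the ANOVA table.
    ms_ratio = Fraction(4, 2) / Fraction(104, 18)
    print(float(f_value), df1, df2, f_value == ms_ratio)
```

Both routes give the same number because $\frac{rs(t-1)}{(r-1)(s-1)}\cdot\frac{1-w}{w}$ is algebraically the interaction mean square divided by the residual mean square.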


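We now carry out the computations needed to perform a MANOVA on the full data. We employ our standard notation by denoting vectors and matrices by capital letters. The sum of squares and cross products matrices for the rows and columns are the following, respectively denoted by $S_{row}$ and $S_{col}$:

$$S_{row} = \sum_{i=1}^{2} st\Big(\frac{X_{i..}}{st} - \frac{X_{...}}{rst}\Big)\Big(\frac{X_{i..}}{st} - \frac{X_{...}}{rst}\Big)' = 12\left\{\begin{bmatrix}1\\-1\\0\end{bmatrix}[1,-1,0] + \begin{bmatrix}-1\\1\\0\end{bmatrix}[-1,1,0]\right\} = \begin{bmatrix}24&-24&0\\-24&24&0\\0&0&0\end{bmatrix},$$

$$S_{col} = \sum_{j=1}^{3} rt\Big(\frac{X_{.j.}}{rt} - \frac{X_{...}}{rst}\Big)\Big(\frac{X_{.j.}}{rt} - \frac{X_{...}}{rst}\Big)' = 8\left\{O + \frac{1}{4}\begin{bmatrix}1&-1&1\\-1&1&-1\\1&-1&1\end{bmatrix} + \frac{1}{4}\begin{bmatrix}1&-1&1\\-1&1&-1\\1&-1&1\end{bmatrix}\right\} = \begin{bmatrix}4&-4&4\\-4&4&-4\\4&-4&4\end{bmatrix},$$

$$S_{int} = \sum_{ij} t\Big(\frac{X_{ij.}}{t} - \frac{X_{i..}}{st} - \frac{X_{.j.}}{rt} + \frac{X_{...}}{rst}\Big)\Big(\frac{X_{ij.}}{t} - \frac{X_{i..}}{st} - \frac{X_{.j.}}{rt} + \frac{X_{...}}{rst}\Big)'$$
$$= 4\left\{O + \frac{1}{4}\begin{bmatrix}1&1&1\\1&1&1\\1&1&1\end{bmatrix} + \frac{1}{4}\begin{bmatrix}1&1&1\\1&1&1\\1&1&1\end{bmatrix} + O + \frac{1}{4}\begin{bmatrix}1&1&1\\1&1&1\\1&1&1\end{bmatrix} + \frac{1}{4}\begin{bmatrix}1&1&1\\1&1&1\\1&1&1\end{bmatrix}\right\} = \begin{bmatrix}4&4&4\\4&4&4\\4&4&4\end{bmatrix},$$

$$S_{sub} = \sum_{ij} t\Big(\frac{X_{ij.}}{t} - \frac{X_{...}}{rst}\Big)\Big(\frac{X_{ij.}}{t} - \frac{X_{...}}{rst}\Big)' = 4\left\{\begin{bmatrix}1&-1&0\\-1&1&0\\0&0&0\end{bmatrix} + \cdots + \begin{bmatrix}4&-2&2\\-2&1&-1\\2&-1&1\end{bmatrix}\right\} = \begin{bmatrix}32&-24&8\\-24&32&0\\8&0&8\end{bmatrix}.$$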

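We can verify the computations done so far as follows. The sum of squares and cross product matrices ought to be such that $S_{row} + S_{col} + S_{int} = S_{sub}$. These are

$$S_{row} + S_{col} + S_{int} = \begin{bmatrix}24&-24&0\\-24&24&0\\0&0&0\end{bmatrix} + \begin{bmatrix}4&-4&4\\-4&4&-4\\4&-4&4\end{bmatrix} + \begin{bmatrix}4&4&4\\4&4&4\\4&4&4\end{bmatrix} = \begin{bmatrix}32&-24&8\\-24&32&0\\8&0&8\end{bmatrix} = S_{sub}.$$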


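Hence the result is verified. Now, the total and residual sums of squares and cross product matrices are

$$S_{tot} = \sum_{ijk}\Big(X_{ijk} - \frac{X_{...}}{rst}\Big)\Big(X_{ijk} - \frac{X_{...}}{rst}\Big)' = \begin{bmatrix}9&0&-6\\0&0&0\\-6&0&4\end{bmatrix} + \cdots + \begin{bmatrix}4&-4&0\\-4&4&0\\0&0&0\end{bmatrix} = \begin{bmatrix}136&11&16\\11&112&6\\16&6&60\end{bmatrix}$$

and

$$S_{res} = \sum_{ijk}\Big(X_{ijk} - \frac{X_{ij.}}{t}\Big)\Big(X_{ijk} - \frac{X_{ij.}}{t}\Big)' = \begin{bmatrix}16&-4&-8\\-4&1&2\\-8&2&4\end{bmatrix} + \cdots + \begin{bmatrix}0&0&0\\0&1&1\\0&1&1\end{bmatrix} = \begin{bmatrix}104&35&8\\35&80&6\\8&6&52\end{bmatrix}.$$

Then,

$$S_{res} + S_{int} = \begin{bmatrix}104&35&8\\35&80&6\\8&6&52\end{bmatrix} + \begin{bmatrix}4&4&4\\4&4&4\\4&4&4\end{bmatrix} = \begin{bmatrix}108&39&12\\39&84&10\\12&10&56\end{bmatrix}.$$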


The above results are included in the following MANOVA table where df means degrees of freedom, SSP denotes a sum of squares and cross products matrix and MS is equal to SSP divided by the corresponding degrees of freedom:


MANOVA Table for a Two-Way Layout with Interaction

Variation due to    df     SSP        MS
                    (1)    (2)        (3)=(2)/(1)
rows                 1     S_row      S_row
columns              2     S_col      (1/2) S_col
interaction          2     S_int      (1/2) S_int
subtotal             5     S_sub
residuals           18     S_res      (1/18) S_res
total               23     S_tot
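Then, the $\lambda$-criterion is

$$\lambda = \frac{|S_{res}|^{\frac{rst}{2}}}{|S_{res} + S_{int}|^{\frac{rst}{2}}} \ \Rightarrow\ w = \lambda^{\frac{2}{rst}} = \frac{|S_{res}|}{|S_{res} + S_{int}|}.$$

The determinants are as follows:

$$|S_{res}| = 104\begin{vmatrix}80&6\\6&52\end{vmatrix} - 35\begin{vmatrix}35&6\\8&52\end{vmatrix} + 8\begin{vmatrix}35&80\\8&6\end{vmatrix} = 363436,$$

$$|S_{res} + S_{int}| = 108\begin{vmatrix}84&10\\10&56\end{vmatrix} - 39\begin{vmatrix}39&10\\12&56\end{vmatrix} + 12\begin{vmatrix}39&84\\12&10\end{vmatrix} = 409320.$$

Therefore,

$$w = \frac{363436}{409320} = 0.888 \ \Rightarrow\ \ln w = -0.118783 \ \Rightarrow\ -2\ln\lambda = 24(0.118783) = 2.8508.$$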
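The two determinants and the resulting statistic can be reproduced in a few lines. This is a minimal sketch using the matrices $S_{res}$ and $S_{int}$ quoted above; `det3`, `mat_add` and `interaction_test` are hypothetical helper names. It also shows the small effect of rounding $w$ to three decimals before taking logarithms, which is how the value 2.8508 arises.

```python
from math import log

S_RES = [[104, 35, 8], [35, 80, 6], [8, 6, 52]]
S_INT = [[4, 4, 4], [4, 4, 4], [4, 4, 4]]


def det3(m):
    """Determinant of a 3 x 3 matrix by cofactor expansion along the first row."""
    (a, b, c), (d, e, f), (g, h, i) = m
    return a * (e * i - f * h) - b * (d * i - f * g) + c * (d * h - e * g)


def mat_add(x, y):
    """Entrywise sum of two matrices of the same shape."""
    return [[u + v for u, v in zip(rx, ry)] for rx, ry in zip(x, y)]


def interaction_test(rst=24):
    """Return |S_res|, |S_res + S_int|, w, and -2 ln(lambda) = -rst * ln(w)."""
    num = det3(S_RES)
    den = det3(mat_add(S_RES, S_INT))
    w = num / den
    return num, den, w, -rst * log(w)


if __name__ == "__main__":
    num, den, w, stat = interaction_test()
    print(num, den, w, stat)
    # Statistic obtained when w is first rounded to 0.888, as in the text.
    print(-24 * log(round(w, 3)))
```

The unrounded statistic is slightly larger than 2.8508, a difference that is immaterial next to the critical value used in the sequel.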


We have explicit simple representations of the exact densities for the special cases $p = 1$, $p = 2$, $t = 2$, $t = 3$. However, our situation being $p = 3$, $t = 4$, they do not apply. A chisquare approximation is available for large values of $rst$, but our $rst$ is only equal to 24. In this instance, $-2\ln\lambda \to \chi^2_{p(r-1)(s-1)} \equiv \chi^2_6$ as $rst \to \infty$. However, since the observed value of $-2\ln\lambda = 2.8508$ happens to be much smaller than the critical value resulting from the asymptotic distribution, which is $\chi^2_{6,0.05} = 12.59$ in this case, we can still safely decide not to reject the hypothesis $H_o: \Gamma_{ij} = O$ for all $i$ and $j$, and go ahead and test for the main row and column effects, that is, the main effects of medical combinations Med-1 and Med-2 and the main effects of exercise routines Ex-1, Ex-2 and Ex-3. For testing the row effect, our hypothesis is $A_1 = A_2 = O$ and for testing the column effect, it is $B_1 = B_2 = B_3 = O$, given that $\Gamma_{ij} = O$ for all $i$ and $j$. The corresponding likelihood ratio criteria are respectively,


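$$\lambda_1 = \Big(\frac{|S_{res}|}{|S_{res} + S_{row}|}\Big)^{\frac{rst}{2}} \ \text{ and } \ \lambda_2 = \Big(\frac{|S_{res}|}{|S_{res} + S_{col}|}\Big)^{\frac{rst}{2}},$$

and we may utilize $w_j = \lambda_j^{\frac{2}{rst}}$, $j = 1, 2$. From previous calculations, we have

$$S_{res} + S_{row} = \begin{bmatrix}104&35&8\\35&80&6\\8&6&52\end{bmatrix} + \begin{bmatrix}24&-24&0\\-24&24&0\\0&0&0\end{bmatrix} = \begin{bmatrix}128&11&8\\11&104&6\\8&6&52\end{bmatrix},$$

$$S_{res} + S_{col} = \begin{bmatrix}104&35&8\\35&80&6\\8&6&52\end{bmatrix} + \begin{bmatrix}4&-4&4\\-4&4&-4\\4&-4&4\end{bmatrix} = \begin{bmatrix}108&31&12\\31&84&2\\12&2&56\end{bmatrix}.$$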
The required determinants are as follows:

$$|S_{res}| = 363436,\ \ |S_{res} + S_{row}| = 675724,\ \ |S_{res} + S_{col}| = 443176 \ \Rightarrow$$

$$w_1 = \frac{363436}{675724} = 0.5378468 \ \text{ and } \ w_2 = \frac{363436}{443176} = 0.8200714,$$

$$-2\ln\lambda_1 = -24\ln 0.5378468 = 24(0.62018) = 14.88,\quad \chi^2_{p(r-1),\alpha} = \chi^2_{3,0.05} = 7.81 < 14.88; \tag{i}$$

$$-2\ln\lambda_2 = -24\ln 0.8200714 = 24(0.19836) = 4.76,\quad \chi^2_{p(s-1),\alpha} = \chi^2_{6,0.05} = 12.59 > 4.76. \tag{ii}$$
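The two main-effect statistics in (i) and (ii) follow the same recipe, which can be wrapped in one small routine. The sketch below is an illustration only: the SSP matrices and the chisquare critical values 7.81 and 12.59 are the ones quoted in the text, while `likelihood_ratio_summary` and the other helper names are hypothetical.

```python
from math import log

S_RES = [[104, 35, 8], [35, 80, 6], [8, 6, 52]]
S_ROW = [[24, -24, 0], [-24, 24, 0], [0, 0, 0]]
S_COL = [[4, -4, 4], [-4, 4, -4], [4, -4, 4]]


def det3(m):
    """Determinant of a 3 x 3 matrix by cofactor expansion along the first row."""
    (a, b, c), (d, e, f), (g, h, i) = m
    return a * (e * i - f * h) - b * (d * i - f * g) + c * (d * h - e * g)


def mat_add(x, y):
    """Entrywise sum of two matrices of the same shape."""
    return [[u + v for u, v in zip(rx, ry)] for rx, ry in zip(x, y)]


def likelihood_ratio_summary(s_res, s_hyp, n_total, critical_value):
    """Return (w, -2 ln lambda, reject?) with w = |S_res| / |S_res + S_hyp|."""
    w = det3(s_res) / det3(mat_add(s_res, s_hyp))
    statistic = -n_total * log(w)          # -2 ln(lambda) = -rst * ln(w)
    return w, statistic, statistic >= critical_value


# Critical values are the tabulated chisquare points quoted in the text.
ROWS = likelihood_ratio_summary(S_RES, S_ROW, 24, 7.81)    # 3 degrees of freedom
COLS = likelihood_ratio_summary(S_RES, S_COL, 24, 12.59)   # 6 degrees of freedom

if __name__ == "__main__":
    print("rows   :", ROWS)
    print("columns:", COLS)
```

Large values of $-2\ln\lambda_j$ correspond to small values of $\lambda_j$, so the comparison `statistic >= critical_value` encodes the rejection rule of the likelihood ratio test under the chisquare approximation.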


When $rst \to \infty$, $-2\ln\lambda_1 \to \chi^2_{p(r-1)}$ and $-2\ln\lambda_2 \to \chi^2_{p(s-1)}$, referring to Exercises 13.5.9 and 13.5.10, respectively. These results follow from the asymptotic expansion provided in Sect. 12.5.2. Even though $rst = 24$ is not that large, we may use these chisquare approximations for making decisions as the exact densities of $w_1$ and $w_2$ do not fall into the special cases previously discussed. When making use of the likelihood ratio criterion, we reject the hypotheses $A_1 = A_2 = O$ and $B_1 = B_2 = B_3 = O$ for small values of $\lambda_1$ and $\lambda_2$, respectively, which translates into large values of the approximate chisquare values. It is seen from (i) that the observed value $-2\ln\lambda_1$ is larger than the tabulated critical value and hence we reject the hypothesis $A_1 = A_2 = O$ at the 5% level. However, the hypothesis $B_1 = B_2 = B_3 = O$ is not rejected since the observed value is less than the critical value. We may conclude that the present data do not show any evidence of interaction between the exercise routines and medication combinations, that the exercise routine does not contribute significantly to bringing the subjects' initial readings closer to the standard values (120, 80, 60), whereas there is a possibility that the medical combinations Med-1 and Med-2 are effective in significantly causing the subjects' initial readings to approach standard values.


Note 13.5.1. It may be noticed from the MANOVA table that the second stage analysis will involve one observation per cell in a two-way layout, that is, the $(i, j)$-th cell will contain only one observation vector $X_{ij}$ for the second stage analysis. Thus, $S_{sub} = S_{int} + S_{row} + S_{col}$ (the corresponding sum of squares in the real scalar case), and in this analysis with a single observation per cell, $S_{int}$ acts as the residual sum of squares and cross products matrix (the residual sum of squares in the real scalar case). Accordingly, "interaction" cannot be tested when there is only a single observation per cell.
Exercises


13.1. In the ANOVA table obtained in Example 13.5.1, prove that (1) the sum of squares due to interaction and the residual sum of squares, (2) the sum of squares due to rows and residual sum of squares, (3) the sum of squares due to columns and residual sum of squares, are independently distributed under the normality assumption for the error variables, that is, $e_{ijk} \overset{iid}{\sim} N_1(0, \sigma^2)$, $\sigma^2 > 0$.


13.2. In the MANOVA table obtained in Example 13.5.1, prove that (1) $S_{int}$ and $S_{res}$, (2) $S_{row}$ and $S_{res}$, (3) $S_{col}$ and $S_{res}$, are independently distributed Wishart matrices when $E_{ijk} \overset{iid}{\sim} N_p(O, \Sigma)$, $\Sigma > O$.


13.3. In a one-way layout, the following are the data on four treatments. (1) Carry out 
a complete ANOVA on the first component (including individual comparisons if the hy- 
pothesis of no interaction is not rejected). (2) Perform a full MANOVA on the full data. 


ve Г. mn Г] 
vem (3) E] D] B] mmm DE ES] 


13.4. Carry out a full one-way MANOVA on the following data: 


1 1 2 1 0 3 4 6 5 
Treatment-1 Ol,Il1l, l1|,|21/|,12[|,Treatmen-2 }2],/2]/,/5],]6], 
—1 1 —1 1 1 3 5 4 5 
1 —1 1 —2 
Treatment-3 1],/ 1], 11,110, 1 
—1 1 0 2 2 


13.5. The following are the data on a two-way layout where $A_{ij}$ denotes the data on the $i$-th row and $j$-th column cell. (1) Perform a complete ANOVA on the first component. (2) Carry out a full MANOVA on the full data. (3) Verify that $S_{row} + S_{col} + S_{int} = S_{sub}$ and $S_{sub} + S_{res} = S_{tot}$. (4) Evaluate the exact density of $w$.


2 1 2 =] 1 3 X d 1 —1 1 -1 

Ап= 121 0, Ар= 4 4 11, Ag= -2 12 1], 
-1 3 1 4 6 3 —1 0 1—2 1 —2 
з 42 3 1-1 1 = 2 3 30 

An = 2 -123,An= 0 1 -1 пам 3 2 —1 2. 
3 522 1 1 1 1 3 =3 24 
13.6. Carry out a complete MANOVA on the following data where $A_{ij}$ indicates the data in the $i$-th row and $j$-th column cell.


1 —1 1 1 1 43422 5354 5 
Anco y 444999855 55]995—5 249 45 

01 1 0-2 6 764 5 1 01 -1 -I 
A5 2 0 ep -142345234 48331] 12 1 5f 
13.7. Under the hypothesis $A_1 = \cdots = A_r = O$, prove that $U_1 = (S_{res} + S_{row})^{-\frac{1}{2}}\, S_{res}\, (S_{res} + S_{row})^{-\frac{1}{2}}$ is a real matrix-variate type-1 beta with the parameters $\big(\frac{rs(t-1)}{2}, \frac{r-1}{2}\big)$ for $r > p$, $rs(t-1) \ge p$, when the hypothesis $\Gamma_{ij} = O$ for all $i$ and $j$ is not rejected or assuming that $\Gamma_{ij} = O$. The determinant of $U_1$ appears in the likelihood ratio criterion in this case.


13.8. Under the hypothesis $B_1 = \cdots = B_s = O$ when the hypothesis $\Gamma_{ij} = O$ is not rejected, or assuming that $\Gamma_{ij} = O$, prove that $U_2 = (S_{res} + S_{col})^{-\frac{1}{2}}\, S_{res}\, (S_{res} + S_{col})^{-\frac{1}{2}}$ is a real matrix-variate type-1 beta random variable with the parameters $\big(\frac{rs(t-1)}{2}, \frac{s-1}{2}\big)$ for $s > p$, $rs(t-1) \ge p$. The determinant of $U_2$ appears in the likelihood ratio criterion for testing the main effect $B_j = O$, $j = 1, \ldots, s$.


13.9. Show that when $rst \to \infty$, $-2\ln\lambda_1 \to \chi^2_{p(r-1)}$, that is, $-2\ln\lambda_1$ asymptotically tends to a real scalar chisquare having $p(r-1)$ degrees of freedom, where $\lambda_1 = |U_1|^{\frac{rst}{2}}$ and $U_1$ is as defined in Exercise 13.7. [Hint: Look into the general $h$-th moment of $\lambda_1$ in this case, which can be evaluated by using the density of $U_1$.] Hence for large values of $rst$, one can use this approximate chisquare distribution for testing the hypothesis $A_1 = \cdots = A_r = O$.


13.10. Show that when $rst \to \infty$, $-2\ln\lambda_2 \to \chi^2_{p(s-1)}$, that is, $-2\ln\lambda_2$ asymptotically converges to a real scalar chisquare having $p(s-1)$ degrees of freedom, where $\lambda_2 = |U_2|^{\frac{rst}{2}}$ with $U_2$ as defined in Exercise 13.8. [Hint: Look at the $h$-th moment of $\lambda_2$.] For large values of $rst$, one can utilize this approximate chisquare distribution for testing the hypothesis $B_1 = \cdots = B_s = O$.
Chapter 14
Profile Analysis and Growth Curves


14.1. Introduction 


We will utilize the same notations as in the previous chapters. Lower-case letters $x, y, \ldots$ will denote real scalar variables, whether mathematical or random. Capital letters $X, Y, \ldots$ will be used to denote real matrix-variate mathematical or random variables, whether square or rectangular matrices are involved. A tilde will be placed on top of letters such as $\tilde{x}, \tilde{y}, \tilde{X}, \tilde{Y}$ to denote variables in the complex domain. Constant matrices will for instance be denoted by $A, B, C$. A tilde will not be used on constant matrices unless the point is to be stressed that the matrix is in the complex domain. The determinant of a square matrix $A$ will be denoted by $|A|$ or $\det(A)$ and, in the complex case, the absolute value or modulus of the determinant of $A$ will be denoted as $|\det(A)|$. When matrices are square, their order will be taken as $p \times p$, unless specified otherwise. When $A$ is a full rank matrix in the complex domain, then $AA^{*}$ is Hermitian positive definite where an asterisk designates the complex conjugate transpose of a matrix. Additionally, ${\rm d}X$ will indicate the wedge product of all the distinct differentials of the elements of the matrix $X$. Thus, letting the $p \times q$ matrix $X = (x_{ij})$ where the $x_{ij}$'s are distinct real scalar variables, ${\rm d}X = \wedge_{i=1}^{p}\wedge_{j=1}^{q}{\rm d}x_{ij}$. For the complex matrix $\tilde{X} = X_1 + iX_2$, $i = \sqrt{(-1)}$, where $X_1$ and $X_2$ are real, ${\rm d}\tilde{X} = {\rm d}X_1 \wedge {\rm d}X_2$.
14.1.1. Profiles 


The two topics that are treated in this chapter are profile analysis and growth curves. 
We first consider profile analysis. Only three types of profiles will be considered. As an 
example, take the amount of money spent at a specific grocery store by six families residing 
in a given neighborhood. Suppose that they all do their grocery shopping once a week 
every Saturday. Let us consider the sum spent on the average in the long run on four types 
of items: item (1): vegetables; item (2): baked goods; item (3): meat products; item (4): 
cleaning supplies. Let $\mu_{ij}$ denote the expected amount or the average amount spent in the long run by the $j$-th family on the $i$-th item, $j = 1, 2, 3, 4, 5, 6$, and $i = 1, 2, 3, 4$. The monetary values of these purchases are plotted as points in Fig. 14.1.1. In order to see the pattern, these points are joined by straight lines, which are not actually part of the graph. From the pattern or profile, we may note that families $F_1, F_2, F_3, F_4$ have parallel profiles, that the profiles of $F_1$ and $F_4$ coincide, and that $F_5$ has a constant profile or is horizontal with respect to the base line profile. The profile of $F_6$ does not fall into any category relative to the other profiles.


Figure 14.1.1 Profile plot for 6 families: amounts spent weekly on 4 items (horizontal axis: Item; plotted profiles: F1 & F4, F2, F3, F5, F6)


In this instance, we refer to the profiles of $F_1, F_2, F_3, F_4$ as "parallel profiles" where the profiles of $F_1$ and $F_4$ are "coincident profiles". $F_5$ is said to have a constant profile or "level profile". We now study these patterns in more detail, starting with parallel profiles.


14.2. Parallel Profiles 


Let $\mu_{ij}$ be the expected dollar amount of the purchase of the $j$-th family on the $i$-th item. In general, if the items are denoted by the real scalar variables $x_{1j}, \ldots, x_{pj}$, then $\mu_{ij} = E[x_{ij}]$, $i = 1, \ldots, p$ and $j = 1, \ldots, k$. Thus, we have $k$ $p$-variate populations. The items or variables may be quantitative or qualitative. The $p$ items may be $p$ different stimulants and the response may be the expected enhancement in the performance of a sportsman; the $p$ items could also be $p$ different animal feeds and the response could then be the expected weight increase, and so on. The $\mu_{ij}$'s, that is, the expected responses, may be quantitative, or qualitative but convertible to quantitative readings. If the items are various color combinations in a lady's dress, the response of the first observer $F_1$ with respect to a certain color combination may be "very pleasing", the response of the second observer $F_2$ may be "indifferent", etc. These assessments of color combinations can, for example, be quantified as readings on a 0 to 10 scale.


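Let us start with $k = 2$, that is, two $p$-variate populations, and let the responses be

$$M_1 = \begin{bmatrix}\mu_{11}\\ \mu_{21}\\ \vdots\\ \mu_{p1}\end{bmatrix} \ \text{ and } \ M_2 = \begin{bmatrix}\mu_{12}\\ \mu_{22}\\ \vdots\\ \mu_{p2}\end{bmatrix}.$$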

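Then, what is the meaning of parallel profiles? We can impose the conditions $\mu_{12} - \mu_{11} = \mu_{22} - \mu_{21}$, $\mu_{22} - \mu_{21} = \mu_{32} - \mu_{31}$, etc., which is tantamount to saying that

$$\mu_{j-1,2} - \mu_{j-1,1} = \mu_{j2} - \mu_{j1} \ \Rightarrow\ \mu_{j1} - \mu_{j-1,1} = \mu_{j2} - \mu_{j-1,2}, \ \text{ for } j = 2, \ldots, p. \tag{14.2.1}$$

The conditions specified in (14.2.1) can be expressed in matrix notation. Letting $C$ be the following $(p-1) \times p$ matrix:

$$C = \begin{bmatrix}-1&1&0&\cdots&0&0\\0&-1&1&\cdots&0&0\\ \vdots&&\ddots&\ddots&&\vdots\\0&0&0&\cdots&-1&1\end{bmatrix},\quad CM_1 = \begin{bmatrix}-\mu_{11}+\mu_{21}\\-\mu_{21}+\mu_{31}\\ \vdots\\-\mu_{p-1,1}+\mu_{p1}\end{bmatrix},\quad CM_2 = \begin{bmatrix}-\mu_{12}+\mu_{22}\\-\mu_{22}+\mu_{32}\\ \vdots\\-\mu_{p-1,2}+\mu_{p2}\end{bmatrix}. \tag{14.2.2}$$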
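Then, under (14.2.1), $CM_1 = CM_2$. Therefore, "parallel profiles" means $CM_1 = CM_2$, and we can test the hypothesis $CM_1 = CM_2$ against $CM_1 \ne CM_2$ for determining whether the profiles are parallel or not. In order to avoid any possible confusion, let us denote the two vectors as $X$ and $Y$, and let $M_1 = E[X]$ and $M_2 = E[Y]$, that is,

$$X = \begin{bmatrix}x_1\\x_2\\ \vdots\\x_p\end{bmatrix},\ Y = \begin{bmatrix}y_1\\y_2\\ \vdots\\y_p\end{bmatrix},\ E[X] = M_1 = \begin{bmatrix}\mu_{11}\\ \mu_{21}\\ \vdots\\ \mu_{p1}\end{bmatrix},\ E[Y] = M_2 = \begin{bmatrix}\mu_{12}\\ \mu_{22}\\ \vdots\\ \mu_{p2}\end{bmatrix}.$$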

For testing hypotheses, we need to assume some distributions for $X$ and $Y$. Let $X$ and $Y$ be independently distributed real Gaussian $p$-variate variables where $X \sim N_p(M_1, \Sigma)$ and $Y \sim N_p(M_2, \Sigma)$, $\Sigma > O$. Observe that the $(p-1) \times p$ matrix $C$ is of full rank $p - 1$. Then, on applying a result previously obtained in Chap. 3,

$$CX \sim N_{p-1}(CM_1, C\Sigma C') \ \text{ and } \ CY \sim N_{p-1}(CM_2, C\Sigma C'). \tag{14.2.4}$$
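Assume that we have a simple random sample of size $n_1$ from $X$ and a simple random sample of size $n_2$ from $Y$. Let $X_1, \ldots, X_{n_1}$ be the sample values from $X$ and let $Y_1, \ldots, Y_{n_2}$ be the sample values from $Y$. Let the sample averages be $\bar{X}$ and $\bar{Y}$, the sample matrices be the boldfaced $\mathbf{X}$ and $\mathbf{Y}$, the matrices of sample means be $\bar{\mathbf{X}}$ and $\bar{\mathbf{Y}}$, the deviation matrices be $\mathbf{X}_d$ and $\mathbf{Y}_d$ and the sample sum of products matrices be $S_1 = \mathbf{X}_d\mathbf{X}_d'$ and $S_2 = \mathbf{Y}_d\mathbf{Y}_d'$. Then, these are the following:

$$\mathbf{X} = [X_1, X_2, \ldots, X_{n_1}] = \begin{bmatrix}x_{11}&x_{12}&\cdots&x_{1n_1}\\x_{21}&x_{22}&\cdots&x_{2n_1}\\ \vdots&\vdots&\ddots&\vdots\\x_{p1}&x_{p2}&\cdots&x_{pn_1}\end{bmatrix},\quad \bar{X} = \begin{bmatrix}\sum_{j=1}^{n_1}x_{1j}/n_1\\ \sum_{j=1}^{n_1}x_{2j}/n_1\\ \vdots\\ \sum_{j=1}^{n_1}x_{pj}/n_1\end{bmatrix} = \begin{bmatrix}\bar{x}_1\\ \bar{x}_2\\ \vdots\\ \bar{x}_p\end{bmatrix},$$

$$\bar{\mathbf{X}} = [\bar{X}, \bar{X}, \ldots, \bar{X}],\quad \mathbf{X}_d = \mathbf{X} - \bar{\mathbf{X}},\quad S_1 = \mathbf{X}_d\mathbf{X}_d',$$

with similar expressions in the case of $Y$ for which the sum of squares and cross products matrix is denoted as $S_2 = \mathbf{Y}_d\mathbf{Y}_d'$. Note that $\bar{X} \sim N_p(M_1, \frac{1}{n_1}\Sigma)$, $\Sigma > O \Rightarrow C\bar{X} \sim N_{p-1}(CM_1, \frac{1}{n_1}C\Sigma C')$. As well, $C\bar{Y} \sim N_{p-1}(CM_2, \frac{1}{n_2}C\Sigma C')$. The sum of squares and cross product matrices of $CX$ and $CY$ are $CS_1C'$ and $CS_2C'$, respectively. Since $X$ and $Y$ are independently distributed, we have

$$C\bar{X} - C\bar{Y} \sim N_{p-1}\Big(C(M_1 - M_2), \Big(\frac{1}{n_1} + \frac{1}{n_2}\Big)C\Sigma C'\Big). \tag{14.2.5}$$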


An unbiased estimator for $\Sigma$ is $\hat{\Sigma} = \frac{1}{n_1 + n_2 - 2}(S_1 + S_2)$ and then, Hotelling's $T^2$ statistic is

$$T^2 = \Big(\frac{1}{n_1} + \frac{1}{n_2}\Big)^{-1}(\bar{X} - \bar{Y})'C'[C\hat{\Sigma}C']^{-1}C(\bar{X} - \bar{Y}). \tag{14.2.6}$$

The usual test is based on Hotelling's $T^2$ statistic. However, referring to Sects. 6.3.1 and 6.3.3, we are making use of a statistic based on a real type-2 beta random variable. Thus, for $n_1 + n_2 - p = (n_1 + n_2 - 2) + 1 - (p - 1)$, and under the hypothesis $H_{o1}: CM_1 = CM_2$,

$$w \equiv \Big[\frac{n_1 + n_2 - p}{p - 1}\Big]\Big(\frac{1}{n_1} + \frac{1}{n_2}\Big)^{-1}(\bar{X} - \bar{Y})'C'[C(S_1 + S_2)C']^{-1}C(\bar{X} - \bar{Y}) \sim F_{p-1,\, n_1+n_2-p}. \tag{14.2.7}$$

Accordingly, for testing the hypothesis $H_{o1}: CM_1 = CM_2$ versus $CM_1 \ne CM_2$, we reject $H_{o1}$ if the observed value of $w$ of (14.2.7) is greater than or equal to $F_{p-1,\, n_1+n_2-p,\, \alpha}$, the upper $100\,\alpha\%$ percentage point from an $F_{p-1,\, n_1+n_2-p}$ distribution.


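Example 14.2.1. The following are independent observed samples of sizes $n_1 = 4$ and $n_2 = 5$ from trivariate Gaussian populations sharing the same covariance matrix $\Sigma > O$. Test the hypothesis $H_{o1}: CM_1 = CM_2$ at the 5% significance level:

Observed sample-1: $X_1 = \begin{bmatrix}1\\0\\-1\end{bmatrix},\ X_2 = \begin{bmatrix}1\\1\\1\end{bmatrix},\ X_3 = \begin{bmatrix}1\\-1\\2\end{bmatrix},\ X_4 = \begin{bmatrix}1\\0\\2\end{bmatrix}$;

Observed sample-2: $Y_1 = \begin{bmatrix}0\\-1\\-1\end{bmatrix},\ Y_2 = \begin{bmatrix}1\\1\\2\end{bmatrix},\ Y_3 = \begin{bmatrix}-1\\2\\1\end{bmatrix},\ Y_4 = \begin{bmatrix}1\\2\\-2\end{bmatrix},\ Y_5 = \begin{bmatrix}-1\\1\\0\end{bmatrix}$.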


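Solution 14.2.1. Let us first evaluate the $2 \times 3$ matrix $C$ as well as $\bar{X}$, $\bar{Y}$, $\mathbf{X}_d$, $\mathbf{Y}_d$, $S_1$ and $S_2$:

$$C = \begin{bmatrix}-1&1&0\\0&-1&1\end{bmatrix},\ \bar{X} = \begin{bmatrix}1\\0\\1\end{bmatrix},\ \bar{Y} = \begin{bmatrix}0\\1\\0\end{bmatrix},\ \bar{X} - \bar{Y} = \begin{bmatrix}1\\-1\\1\end{bmatrix},$$

$$\mathbf{X}_d = \begin{bmatrix}0&0&0&0\\0&1&-1&0\\-2&0&1&1\end{bmatrix},\ \mathbf{Y}_d = \begin{bmatrix}0&1&-1&1&-1\\-2&0&1&1&0\\-1&2&1&-2&0\end{bmatrix},$$

$$S_1 = \begin{bmatrix}0&0&0\\0&2&-1\\0&-1&6\end{bmatrix},\ S_2 = \begin{bmatrix}4&0&-1\\0&6&1\\-1&1&10\end{bmatrix},\ S_1 + S_2 = \begin{bmatrix}4&0&-1\\0&8&0\\-1&0&16\end{bmatrix},\ C(S_1 + S_2)C' = \begin{bmatrix}12&-7\\-7&24\end{bmatrix},$$

$$[C(S_1 + S_2)C']^{-1} = \frac{1}{239}\begin{bmatrix}24&7\\7&12\end{bmatrix},\ C'[C(S_1 + S_2)C']^{-1}C = \frac{1}{239}\begin{bmatrix}24&-17&-7\\-17&22&-5\\-7&-5&12\end{bmatrix},\ \Big(\frac{1}{n_1} + \frac{1}{n_2}\Big)^{-1} = \Big(\frac{1}{4} + \frac{1}{5}\Big)^{-1} = \frac{20}{9}.$$

Therefore,

$$w = \Big(\frac{n_1 + n_2 - p}{p - 1}\Big)\Big(\frac{1}{n_1} + \frac{1}{n_2}\Big)^{-1}(\bar{X} - \bar{Y})'C'[C(S_1 + S_2)C']^{-1}C(\bar{X} - \bar{Y})$$
$$= \frac{6(20)}{2(9)(239)}\,[1, -1, 1]\begin{bmatrix}24&-17&-7\\-17&22&-5\\-7&-5&12\end{bmatrix}\begin{bmatrix}1\\-1\\1\end{bmatrix} = \Big(\frac{6}{2}\Big)\Big(\frac{20}{9}\Big)\Big(\frac{88}{239}\Big) = 2.45.$$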

From the tabulated values for $\alpha = 0.05$,
$$F_{p-1,\, n_1+n_2-p,\, \alpha} = F_{2,6,0.05} = 5.14 > 2.45.$$
Hence, the hypothesis $H_{o1}: CM_1 = CM_2$ is not rejected at the 5% significance level.
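The whole computation of Example 14.2.1 can be repeated from the raw observations with exact arithmetic. The following sketch is an illustration rather than a general-purpose routine: it follows (14.2.7) literally, uses a small Gauss-Jordan solver instead of an explicit matrix inverse, and all function names are hypothetical.

```python
from fractions import Fraction

SAMPLE_X = [(1, 0, -1), (1, 1, 1), (1, -1, 2), (1, 0, 2)]
SAMPLE_Y = [(0, -1, -1), (1, 1, 2), (-1, 2, 1), (1, 2, -2), (-1, 1, 0)]


def mean(sample):
    """Sample mean vector with exact rational entries."""
    n = len(sample)
    return [Fraction(sum(col), n) for col in zip(*sample)]


def sscp(sample):
    """Sum of squares and cross products matrix about the sample mean."""
    m = mean(sample)
    p = len(m)
    return [[sum((x[a] - m[a]) * (x[b] - m[b]) for x in sample)
             for b in range(p)] for a in range(p)]


def solve(a, b):
    """Solve a x = b exactly by Gauss-Jordan elimination (a nonsingular)."""
    n = len(a)
    aug = [[Fraction(v) for v in row] + [Fraction(bi)] for row, bi in zip(a, b)]
    for col in range(n):
        pivot = next(r for r in range(col, n) if aug[r][col] != 0)
        aug[col], aug[pivot] = aug[pivot], aug[col]
        piv = aug[col][col]
        aug[col] = [v / piv for v in aug[col]]
        for r in range(n):
            if r != col and aug[r][col] != 0:
                factor = aug[r][col]
                aug[r] = [vr - factor * vc for vr, vc in zip(aug[r], aug[col])]
    return [row[n] for row in aug]


def parallel_profile_statistic(xs, ys):
    """Observed w of (14.2.7) for testing C M1 = C M2."""
    n1, n2, p = len(xs), len(ys), len(xs[0])
    d = [a - b for a, b in zip(mean(xs), mean(ys))]
    s = [[u + v for u, v in zip(r1, r2)] for r1, r2 in zip(sscp(xs), sscp(ys))]
    c = [[-1 if j == i else 1 if j == i + 1 else 0 for j in range(p)]
         for i in range(p - 1)]
    cd = [sum(cij * dj for cij, dj in zip(row, d)) for row in c]
    cs = [[sum(c[i][k] * s[k][j] for k in range(p)) for j in range(p)]
          for i in range(p - 1)]
    csc = [[sum(cs[i][k] * c[j][k] for k in range(p)) for j in range(p - 1)]
           for i in range(p - 1)]
    quad = sum(u * v for u, v in zip(cd, solve(csc, cd)))
    return Fraction(n1 + n2 - p, p - 1) * Fraction(n1 * n2, n1 + n2) * quad


if __name__ == "__main__":
    w = parallel_profile_statistic(SAMPLE_X, SAMPLE_Y)
    print(w, float(w))
```

The observed value is then compared with the tabulated point $F_{2,6,0.05} = 5.14$ quoted in the solution.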


14.3. Coincident Profiles 


With the notations employed in Sect. 14.2, two profiles are coincident if $\mu_{i1} = \mu_{i2}$, $i = 1, \ldots, p$, which amounts to assuming $H_{o2}: M_1 = M_2$. If the $p \times 1$ real vectors $X$ and $Y$ are $X \sim N_p(M_1, \Sigma)$ and $Y \sim N_p(M_2, \Sigma)$, $\Sigma > O$, and the populations are independently distributed, this is the test for equality of mean value vectors in two independent Gaussian populations with common unknown covariance matrix, which was discussed in Chap. 6 where a Hotelling's $T^2$ is usually utilized for testing this hypothesis. We will use a statistic based on a real scalar type-2 beta variable, which has already been considered in Sects. 6.3.1 and 6.3.3. Given samples of sizes $n_1$ and $n_2$ from these normal populations, and letting $\bar{X}$ and $\bar{Y}$ be the sample means and $S_1$ and $S_2$ be the sample sum of squares and cross products matrices, the test statistic under the hypothesis $H_{o2}$, denoted by $w_1$, is the following:

$$w_1 = \Big[\frac{n_1 + n_2 - 1 - p}{p}\Big]\Big(\frac{1}{n_1} + \frac{1}{n_2}\Big)^{-1}(\bar{X} - \bar{Y})'(S_1 + S_2)^{-1}(\bar{X} - \bar{Y}) \sim F_{p,\, n_1+n_2-1-p}. \tag{14.3.1}$$

Thus, we reject the hypothesis of equality of population mean values or the hypothesis $H_{o2}: M_1 = M_2$, whenever the observed value of $w_1$ is greater than or equal to $F_{p,\, n_1+n_2-1-p,\, \alpha}$, the upper $100\,\alpha\%$ percentage point of an F-distribution having $p$ and $n_1 + n_2 - 1 - p$ degrees of freedom.


Example 14.3.1. With the data provided in Example 14.2.1 and under the same assumptions, test the hypothesis that the two profiles are coincident, that is, test $H_{o2}: M_1 = M_2$.


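Solution 14.3.1. From Solution 14.2.1, we have: $n_1 = 4$, $n_2 = 5$,

$$\bar{X} = \begin{bmatrix}1\\0\\1\end{bmatrix},\ \bar{Y} = \begin{bmatrix}0\\1\\0\end{bmatrix},\ S_1 = \begin{bmatrix}0&0&0\\0&2&-1\\0&-1&6\end{bmatrix},\ S_2 = \begin{bmatrix}4&0&-1\\0&6&1\\-1&1&10\end{bmatrix},$$

so that

$$S_1 + S_2 = \begin{bmatrix}4&0&-1\\0&8&0\\-1&0&16\end{bmatrix},\ (S_1 + S_2)^{-1} = \frac{1}{504}\begin{bmatrix}128&0&8\\0&63&0\\8&0&32\end{bmatrix},\ p = 3,\ n_1 + n_2 - 1 - p = 5.$$

Then, an observed value of the test statistic, again denoted by $w_1$, is the following:

$$w_1 = \Big(\frac{n_1 + n_2 - 1 - p}{p}\Big)\Big(\frac{1}{n_1} + \frac{1}{n_2}\Big)^{-1}(\bar{X} - \bar{Y})'(S_1 + S_2)^{-1}(\bar{X} - \bar{Y})$$
$$= \Big(\frac{5}{3}\Big)\Big(\frac{20}{9}\Big)\frac{1}{504}\,[1, -1, 1]\begin{bmatrix}128&0&8\\0&63&0\\8&0&32\end{bmatrix}\begin{bmatrix}1\\-1\\1\end{bmatrix} = \Big(\frac{5}{3}\Big)\Big(\frac{20}{9}\Big)\Big(\frac{239}{504}\Big) = \frac{23900}{13608} = 1.76.$$

The tabulated value of $F_{p,\, n_1+n_2-1-p,\, \alpha} = F_{3,5,0.05}$ being $5.41 > 1.76$, the hypothesis $M_1 = M_2$ against the natural alternative $M_1 \ne M_2$ is not rejected at the 5% level of significance.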
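The quadratic form in (14.3.1) can be evaluated without forming $(S_1 + S_2)^{-1}$ explicitly, by solving a linear system instead. The sketch below assumes the summary quantities of Solution 14.3.1 as inputs; `solve` and `coincident_profile_statistic` are hypothetical helper names.

```python
from fractions import Fraction


def solve(a, b):
    """Solve a x = b exactly by Gauss-Jordan elimination (a nonsingular)."""
    n = len(a)
    aug = [[Fraction(v) for v in row] + [Fraction(bi)] for row, bi in zip(a, b)]
    for col in range(n):
        pivot = next(r for r in range(col, n) if aug[r][col] != 0)
        aug[col], aug[pivot] = aug[pivot], aug[col]
        piv = aug[col][col]
        aug[col] = [v / piv for v in aug[col]]
        for r in range(n):
            if r != col and aug[r][col] != 0:
                factor = aug[r][col]
                aug[r] = [vr - factor * vc for vr, vc in zip(aug[r], aug[col])]
    return [row[n] for row in aug]


def coincident_profile_statistic(xbar, ybar, s_sum, n1, n2):
    """Observed w_1 of (14.3.1), where s_sum stands for S_1 + S_2."""
    p = len(xbar)
    d = [Fraction(x) - Fraction(y) for x, y in zip(xbar, ybar)]
    z = solve(s_sum, d)                           # z = (S_1 + S_2)^{-1} d
    quad = sum(di * zi for di, zi in zip(d, z))   # d' (S_1 + S_2)^{-1} d
    return Fraction(n1 + n2 - 1 - p, p) * Fraction(n1 * n2, n1 + n2) * quad


if __name__ == "__main__":
    s_sum = [[4, 0, -1], [0, 8, 0], [-1, 0, 16]]
    w1 = coincident_profile_statistic([1, 0, 1], [0, 1, 0], s_sum, 4, 5)
    print(w1, float(w1))
```

Solving the system rather than inverting the matrix is a design choice that keeps the arithmetic exact and avoids storing the inverse.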


14.3.1. Conditional hypothesis for coincident profiles 


If we already know that the two profiles are parallel, then the second profile is either above the first one or below the first one, and these two profiles are equal only if the sum of the elements in $M_1$ is equal to the sum of the elements in $M_2$. There should be caution in making use of this argument. We must know that the two profiles are assuredly parallel. If the hypothesis that two profiles are parallel is not rejected, this does not necessarily imply that the two profiles are parallel. It only entails that the departure from parallelism is not significant. Accordingly, a test for coincidence that is based on the sum of the elements is not justifiable, and the test being herein discussed is not a test for coincidence but only a test of the hypothesis that the sum of the elements in $M_1$ is equal to the sum of the elements in $M_2$. Observe that this equality can hold whether the profiles are coincident or not. The sum of the elements in $M_1$ is given by $J'M_1$ and the sum of the elements in $M_2$ is given by $J'M_2$ where $J' = (1, 1, \ldots, 1)$. Under independence and normality with $X \sim N_p(M_1, \Sigma)$ and $Y \sim N_p(M_2, \Sigma)$, $\Sigma > O$, we have $J'\bar{X} \sim N_1(J'M_1, \frac{1}{n_1}J'\Sigma J)$ and $J'\bar{Y} \sim N_1(J'M_2, \frac{1}{n_2}J'\Sigma J)$, and hence,

$$t^2 = (n_1 + n_2 - 2)\Big(\frac{n_1 n_2}{n_1 + n_2}\Big)[J'(\bar{X} - \bar{Y}) - J'(M_1 - M_2)]'\,[J'(S_1 + S_2)J]^{-1}\,[J'(\bar{X} - \bar{Y}) - J'(M_1 - M_2)] \tag{14.3.2}$$

where, under the hypothesis $H_{o3}: J'M_1 - J'M_2 = 0$, $t$ is a Student-t having $n_1 + n_2 - 2$ degrees of freedom and $t^2 \sim F_{1,\, n_1+n_2-2}$. Thus, we may utilize a Student-t statistic with $n_1 + n_2 - 2$ degrees of freedom or an F-statistic with 1 and $n_1 + n_2 - 2$ degrees of freedom in order to reach a decision with respect to the hypothesis of the equality of the sum of the elements in $M_1$ and $M_2$.


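Example 14.3.2. With the data provided in Example 14.2.1, test the hypothesis $J'M_1 = J'M_2$ or the equality of the sum of elements in $M_1$ and $M_2$.


Solution 14.3.2. In this case, $n_1 + n_2 - 2 = 7$, $\frac{n_1 n_2}{n_1 + n_2} = \frac{20}{9}$,

$$J'(\bar{X} - \bar{Y}) = [1, 1, 1]\begin{bmatrix}1\\-1\\1\end{bmatrix} = 1,\quad J'(S_1 + S_2)J = [1, 1, 1]\begin{bmatrix}4&0&-1\\0&8&0\\-1&0&16\end{bmatrix}\begin{bmatrix}1\\1\\1\end{bmatrix} = 26.$$

Hence, the observed value of $t^2$ in (14.3.2) is

$$t^2 = 7\Big(\frac{20}{9}\Big)\Big(\frac{1}{26}\Big) = \frac{140}{234} \approx 0.598 \ \text{ while } \ F_{1,7,0.05} = 5.59 > 0.598.$$

Thus, the hypothesis of equality of the sum of the elements in $M_1$ and $M_2$ is not rejected at the 5% significance level.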


14.4. Level Profiles 


A level profile occurs when the plot is horizontal with respect to the base line or when
all the elements in $M_1$ are equal. Thus, this hypothesis involves only one population. If
there are two populations, we can consider the hypothesis of the mean value vectors $M_1$
and $M_2$ being equal or test whether the sums of the elements in $M_1$ and $M_2$
are equal. When the null hypothesis is not rejected, the sum in $M_1$ may be different from
the sum in $M_2$. However, if we assume the profiles to be coincident, these two sums will
be equal. We may then create a conditional hypothesis to the effect that the profiles in
the two populations are level, given that the profiles are coincident. This is how one can
connect the two populations and test whether the profiles are level. Let the $p \times 1$ vector
$X_j \sim N_p(M, \Sigma)$, $\Sigma > O$, and let $X_1, \ldots, X_n$ be iid as $X_j$. In this instance, we have one
real $p$-variate Gaussian population and a simple random sample of size $n$ with $E[X_j] = M$
and $\mathrm{Cov}(X_j) = \Sigma$, $j = 1, \ldots, n$. Let


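$$M = \begin{bmatrix}\mu_1\\ \mu_2\\ \vdots\\ \mu_p\end{bmatrix},\quad X_j = \begin{bmatrix}x_{1j}\\ x_{2j}\\ \vdots\\ x_{pj}\end{bmatrix},\quad J = \begin{bmatrix}1\\ 1\\ \vdots\\ 1\end{bmatrix},\quad \bar{X} = \begin{bmatrix}\bar{x}_1\\ \bar{x}_2\\ \vdots\\ \bar{x}_p\end{bmatrix} = \begin{bmatrix}\sum_{j=1}^{n}x_{1j}/n\\ \sum_{j=1}^{n}x_{2j}/n\\ \vdots\\ \sum_{j=1}^{n}x_{pj}/n\end{bmatrix}.$$
The hypothesis of "level profile" means $H_{o3}: \mu_1 = \mu_2 = \cdots = \mu_p = \mu$ for some $\mu$. The
common value $\mu$ is estimated by $\hat{\mu} = \frac{1}{np}\sum_{i=1}^{p}\sum_{j=1}^{n}x_{ij}$, which is the maximum likelihood
estimator/estimate. Then, under the hypothesis, $M$ is such that $M' = [\hat{\mu}, \ldots, \hat{\mu}]$.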
Testing such a hypothesis has already been considered in Chap. 6, the test being based on 
the statistic 

$$u = n(\bar{X} - M)'S^{-1}(\bar{X} - M) \sim \text{type-2 beta} \tag{14.4.1}$$
with the parameters $(\frac{p-1}{2}, \frac{n-p+1}{2})$, under the hypothesis $H_{o3}$, referring more specifically
to the derivations related to Eq. (6.3.14). Thus under $H_{o3}$,
$$\frac{(n-p+1)}{(p-1)}\,u \sim F_{p-1,\,n-p+1} \tag{14.4.2}$$
where $F_{p-1,\,n-p+1}$ is a real scalar $F$-random variable with $p-1$ and $n-p+1$ degrees
of freedom. We reject $H_{o3}$ if the observed value of the $F_{p-1,\,n-p+1}$ as specified
in (14.4.2) is greater than or equal to $F_{p-1,\,n-p+1,\,\alpha}$, the upper $100\,\alpha\%$ percentage point of
an $F_{p-1,\,n-p+1}$ distribution. This test has already been illustrated in Chap. 6.


14.4.1. Level profile, two independent Gaussian populations 


We can make use of (14.4.1) and test for level profile in each of the two populations
separately. Let the populations and simple random samples from these populations be as
follows: $X_j \sim N_p(M_1, \Sigma)$, $\Sigma > O$, $j = 1, \ldots, n_1$, and $Y_j \sim N_p(M_2, \Sigma)$, $\Sigma > O$,
$j = 1, \ldots, n_2$, where $\Sigma$ is identical for both of these independent Gaussian populations,
and let
$$E[X_j] = M_1 = \begin{bmatrix}\mu_{11}\\ \vdots\\ \mu_{p1}\end{bmatrix},\ j = 1, \ldots, n_1,\qquad E[Y_j] = M_2 = \begin{bmatrix}\mu_{12}\\ \vdots\\ \mu_{p2}\end{bmatrix},\ j = 1, \ldots, n_2.$$
When population $X$ has a level profile, $\mu_{11} = \mu_{21} = \cdots = \mu_{p1} = \mu^{(1)}$ and when
population $Y$ has a level profile, $\mu_{12} = \mu_{22} = \cdots = \mu_{p2} = \mu^{(2)}$; $\mu^{(1)}$ need not be equal to
$\mu^{(2)}$. However, if we assume that the two populations have coincident profiles, then $\mu^{(1)} =
\mu^{(2)}$. In this instance, the two populations are identical, so that we can pool the samples
$X_1, \ldots, X_{n_1}$ and $Y_1, \ldots, Y_{n_2}$ and regard the combined sample as a simple random sample
of size $n_1 + n_2$ from the population $N_p(M, \Sigma)$, $\Sigma > O$. Thus, we can apply (14.4.1)
and (14.4.2) with $n$ replaced by $n_1 + n_2$, in order to reach a decision. Instead of letting
$\mu^{(1)} = \mu^{(2)} = \mu$ for some $\mu$, computing the estimate $\hat{\mu}$ of $\mu$ from the combined
sample and utilizing $M$, $M' = [\hat{\mu}, \ldots, \hat{\mu}]$, in (14.4.1), we can make use of the matrix $C$ as
defined in Sect. 14.2 and consider the transformed vector $CZ_j$, $Z_j \sim N_p(M, \Sigma)$, $\Sigma > O$,
$j = 1, \ldots, n_1 + n_2$, and use $C\bar{Z}$, $\bar{Z} = \frac{1}{n_1+n_2}[X_1 + \cdots + X_{n_1} + Y_1 + \cdots + Y_{n_2}]$,
in (14.4.1). One advantage in utilizing the matrix $C$ is that $CM = O$, and from (13.4.1),
this $M$ will vanish, that is, $\bar{X} - M$ in (14.4.1) will become $C\bar{Z}$. Then, the test statistic
under the hypothesis $H_{o3}: CM = O$, again denoted by $u$, becomes the following:


$$u = (n_1+n_2)(C\bar{Z} - O)'(CS_zC')^{-1}(C\bar{Z} - O) = (n_1+n_2)\,\bar{Z}'C'(CS_zC')^{-1}C\bar{Z} \sim \text{type-2 beta} \tag{14.4.3}$$
with the parameters $(\frac{p-1}{2}, \frac{n_1+n_2-p+1}{2})$, under the hypothesis $H_{o3}$, where $S_z$ is the sample
sum of squares and cross products matrix from the combined sample. Then under $H_{o3}$,
$$\frac{(n_1+n_2-p+1)}{p-1}\,u \sim F_{p-1,\,n_1+n_2-p+1}. \tag{14.4.4}$$
We reject the hypothesis that the profiles in the two populations are level, given that the
two populations have coincident profiles, if the observed value of the $F$ distributed statistic
given in (14.4.4) is greater than or equal to $F_{p-1,\,n_1+n_2-p+1,\,\alpha}$, the upper $100\,\alpha\%$ percentage
point of a real scalar $F_{p-1,\,n_1+n_2-p+1}$ distribution.
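
The steps leading to (14.4.3) and (14.4.4) can be sketched in Python as follows. This is only an illustration under stated assumptions: the helper names `solve` and `level_profile_f` are hypothetical, the contrast matrix $C$ is taken to have rows $(-1, 1, 0, \ldots)$, $(0, -1, 1, \ldots)$, so that $CZ_j$ is the vector of successive differences, and the toy data are made up. Exact rational arithmetic is used to avoid rounding.

```python
from fractions import Fraction


def solve(a, b):
    """Solve the linear system a x = b by Gauss-Jordan elimination (exact arithmetic)."""
    n = len(b)
    m = [[Fraction(v) for v in row] + [Fraction(rhs)] for row, rhs in zip(a, b)]
    for c in range(n):
        piv = next(r for r in range(c, n) if m[r][c] != 0)
        m[c], m[piv] = m[piv], m[c]
        pivot = m[c][c]
        m[c] = [v / pivot for v in m[c]]
        for r in range(n):
            if r != c:
                factor = m[r][c]
                m[r] = [v - factor * q for v, q in zip(m[r], m[c])]
    return [row[n] for row in m]


def level_profile_f(sample):
    """Observed F of (14.4.4) for a pooled sample given as a list of p-vectors."""
    n, p = len(sample), len(sample[0])
    # C Z_j: successive differences of the components of each observation.
    diffs = [[Fraction(z[i + 1] - z[i]) for i in range(p - 1)] for z in sample]
    mean = [sum(col) / n for col in zip(*diffs)]          # C Z-bar
    dev = [[v - mu for v, mu in zip(row, mean)] for row in diffs]
    # C S_z C': sum of products matrix of the transformed observations.
    csc = [[sum(d[i] * d[j] for d in dev) for j in range(p - 1)]
           for i in range(p - 1)]
    x = solve(csc, mean)                                   # (C S_z C')^{-1} C Z-bar
    u = n * sum(mu * xi for mu, xi in zip(mean, x))        # (14.4.3)
    return Fraction(n - p + 1, p - 1) * u, (p - 1, n - p + 1)


# Hypothetical pooled sample with p = 2 and n_1 + n_2 = 3.
f_obs, dof = level_profile_f([[1, 2], [2, 4], [3, 3]])
print(f_obs, dof)
```

For $p = 2$ the statistic reduces to the square of the paired Student-$t$ statistic on the differences, which offers a quick sanity check of the sketch.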


Example 14.4.1. With the data provided in Example 14.2.1, test the hypothesis of level 
profile, given that the two populations have coincident profiles. 
Solution 14.4.1. In this case, we pool the observed samples: $n_1 + n_2 = 4 + 5 = 9$.
Denoting the pooled sample by a boldfaced $\mathbf{Z}$, the matrix of pooled sample means by $\bar{\mathbf{Z}}$,
the deviation matrix by $\mathbf{Z}_d = \mathbf{Z} - \bar{\mathbf{Z}}$ and the sample sum of products matrix by $S_z$,
we have


(i. "bez ДЕ uc 
PE) О 0р A Go «82. Чу, 

cq. ORS: X 30 

| 5 5. сб. 4 =ф Ses © (5 шуа 
Zjcxc| 8040548 ш A Be d$ 3. 


—15 5 14 14—13 14 5 —22 —4 


504 —180 99 
747 = 9 —180 828 —180 | =S, C= | 
99 —180 1476 
1 1692 —1287 _1 188 —143 
—1287 2664 | 9| –143 296 |’ 


-] 10 
0 —1 1 


296 —153 
9 9 


= 1 " - 
u= Р (nı +n) ALOK (ORE Gi iat OZ. 
== 


= Ов) Ges E E us | | : | = 


—143 —45 188 4 


Since, under the null hypothesis $H_{o3}$, the statistic in (14.4.4) is distributed as
$F_{p-1,\,n_1+n_2-p+1} = F_{2,7}$ and, from the tabulated values for $\alpha = 0.05$, we have
$F_{2,7,0.05} = 4.74$ which is larger than 0.0197, the hypothesis $H_{o3}$
of level profile in both the populations, given that the population profiles are coincident, is
not rejected at the 5% level of significance.
14.5. Generalizations to k Populations


Let the $p \times 1$ real vectors $X_1, \ldots, X_k$ be independently distributed where $X_j \sim
N_p(M_j, \Sigma)$, $\Sigma > O$, $j = 1, \ldots, k$, $\Sigma$ being the same for all the populations. Consider
the $(p-1) \times p$ array $C$, as previously defined in Sect. 14.2, whose rank is $p-1$. Letting
$Z_j = CX_j$, $j = 1, \ldots, k$, we have $Z_j \sim N_{p-1}(CM_j, C\Sigma C')$, $(C\Sigma C') > O$, $j = 1, \ldots, k$.
Then, testing for "parallel profiles" in these $k$ populations is equivalent to testing the
hypothesis

$$H_{o1}: CM_1 = CM_2 = \cdots = CM_k. \tag{14.5.1}$$


Testing for "coincident profiles" in all the $k$ populations is tantamount to testing the
hypothesis
$$H_{o2}: M_1 = M_2 = \cdots = M_k. \tag{14.5.2}$$


Testing for "level profiles", given that the profiles are coincident, in all these $k$ populations,
is the same as testing the hypothesis


$$H_{o3}: CM_1 = CM_2 = \cdots = CM_k = O. \tag{14.5.3}$$


Note that $H_{o1}$, $H_{o2}$, $H_{o3}$ are all hypotheses on the equality of mean value vectors in
multivariate populations where, in $H_{o1}$ and $H_{o3}$, we have $(p-1)$-variate Gaussian populations
and, in $H_{o2}$, we have a $p$-variate Gaussian population. In all these cases, the population
covariance matrices are identical but unknown. Such tests were considered in Chap. 6 and
hence, further discussion is omitted. In all these tests, one point has to be kept in mind.
The units of measurement must be the same in all the populations, otherwise there is no
point in comparing mean value vectors. If qualitative variables are involved, they should
be converted to quantitative variables using the same scale, such as changing all of them
to points on a [0, 10] scale.


14.6. Growth Curves or Repeated Measures Designs 


Suppose, for instance, that the growth (measured in terms of height) of a plant seedling
(say, a variety of corn) is monitored. Let the height be 10 cm at the onset of the
observation period, that is, at time $t = 0$, and the measurements be taken weekly. After
one week ($t = 1$), suppose that the height is 12 cm. Then $10 = a_0$, and, for a linear
function, $12 = a_0 + a_1t = a_0 + a_1$ at $t = 1$, so that $a_0 = 10$ and $a_1 = 2$. We now
have the points (0, 10), (1, 12). Consider a straight line passing through these points. If
the third, fourth, etc. measurements taken at $t = 2$, $t = 3$, etc. weeks, fall more or less on
this line, then we can say that the expected growth is linear and we can consider a model
of the type $E[x] = a_0 + a_1t$ to represent the expected growth of this plant, where $E[\cdot]$
represents the expected value. If the third, fourth, etc. points are increasing away from the
straight line and the behavior is of a parabolic type, then we can put forward the model
$E[x] = a_0 + a_1t + a_2t^2$. If the behavior appears to be of a polynomial type, then we may
posit the model $E[x] = a_0 + a_1t + \cdots + a_{m-1}t^{m-1}$. However, we need not monitor the
growth at one time unit intervals. We may monitor it at convenient time units $t_1, t_2, \ldots$. A
polynomial type model leads to the following equations:


$$\begin{aligned}
E[x_1] &= a_0 + a_1t_1 + a_2t_1^2 + \cdots + a_{m-1}t_1^{m-1}\\
E[x_2] &= a_0 + a_1t_2 + a_2t_2^2 + \cdots + a_{m-1}t_2^{m-1}\\
&\ \ \vdots\\
E[x_p] &= a_0 + a_1t_p + a_2t_p^2 + \cdots + a_{m-1}t_p^{m-1}
\end{aligned}$$


where $E[x_1], \ldots, E[x_p]$ are expected observations at times $t_1, t_2, \ldots, t_p$, and $a_0$ is the
value of the expected observation at $t_j = 0$, $j = 1, \ldots, p$. Thus, in matrix notation, we
have a $p$-vector for the observations. Letting $X_1' = [x_1, x_2, \ldots, x_p]$,

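$$E[X_1] = E\begin{bmatrix}x_1\\ x_2\\ \vdots\\ x_p\end{bmatrix} = \begin{bmatrix}1 & t_1 & t_1^2 & \cdots & t_1^{m-1}\\ 1 & t_2 & t_2^2 & \cdots & t_2^{m-1}\\ \vdots & \vdots & \vdots & & \vdots\\ 1 & t_p & t_p^2 & \cdots & t_p^{m-1}\end{bmatrix}\begin{bmatrix}a_0\\ a_1\\ \vdots\\ a_{m-1}\end{bmatrix} \equiv TA_1, \tag{14.6.1}$$


where $X_1$ is $p \times 1$, $T$ is $p \times m$ and the parameter vector $A_1$ is $m \times 1$. In this instance, we have
taken observations on the same real scalar variable $x$ at $p$ different occasions $t_1, \ldots, t_p$. In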
this sense, it is a "repeated measures design" situation. We have monitored x over time in 
order to characterize the growth of this variable and, in this sense, it is a "growth curve" 
situation. This x may be the weight of a person under a certain weight reduction diet. The 
successive readings are then expected to decrease, so that it is a negative growth as well as 
a repeated measures design model. When x represents the spread area of a certain fungus, 
which is monitored over time, we have a growth curve model that depends upon the nature 
of spread. If x is the temperature reading of a feverish person under a certain medication, 
being recorded over time, we have a growth curve model with an expected negative growth. 
We can think of a myriad of such situations falling under repeated measures design or 
growth curve model. 
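
As a small illustration of the matrix form (14.6.1), the following Python sketch builds the $p \times m$ matrix $T$ from the observation times and evaluates $E[X] = TA$ for the seedling illustration ($a_0 = 10$, $a_1 = 2$). The function names `design_matrix` and `expected_growth` are hypothetical; the second call uses the occasions $t = 1/2, 1, 3/2, 2$ that appear in Example 14.6.1.

```python
from fractions import Fraction


def design_matrix(times, m):
    """Return the p x m matrix T of (14.6.1): row i is (1, t_i, t_i^2, ..., t_i^(m-1))."""
    return [[t ** k for k in range(m)] for t in times]


def expected_growth(times, coeffs):
    """Evaluate E[X] = T A for the coefficient vector A = (a_0, ..., a_{m-1})."""
    T = design_matrix(times, len(coeffs))
    return [sum(t_ik * a_k for t_ik, a_k in zip(row, coeffs)) for row in T]


# Seedling illustration from the text: 10 cm at t = 0 and 12 cm at t = 1.
a0 = 10
a1 = 12 - a0            # from 12 = a0 + a1 * 1
heights = expected_growth([0, 1, 2, 3], [a0, a1])
print(heights)          # heights implied by the linear model at t = 0, 1, 2, 3

# Quadratic design matrix (m = 3) at the occasions t = 1/2, 1, 3/2, 2.
T_quad = design_matrix([Fraction(k, 2) for k in (1, 2, 3, 4)], 3)
print(T_quad)
```

If later measurements depart markedly from the values in `heights`, a higher degree, that is, a larger $m$, would be considered.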
14.6.1. Growth curve model


The model specified in (14.6.1) can be expressed as follows:


$$E[X_1] = TA_1,\quad A_1 = \begin{bmatrix}a_{10}\\ a_{11}\\ \vdots\\ a_{1\,m-1}\end{bmatrix},\quad T = \begin{bmatrix}1 & t_1 & t_1^2 & \cdots & t_1^{m-1}\\ 1 & t_2 & t_2^2 & \cdots & t_2^{m-1}\\ \vdots & \vdots & \vdots & & \vdots\\ 1 & t_p & t_p^2 & \cdots & t_p^{m-1}\end{bmatrix}, \tag{14.6.2}$$


where we have a $p$-vector $X_1$ with expected value $TA_1$, $T$ being $p \times m$ and $A_1$, $m \times 1$.
If we are taking a simple random sample of size $n_1$ on this $p$-vector, we may denote
the sample values as $X_{11}, X_{12}, \ldots, X_{1n_1}$ and we may say that we have a simple random
sample of size $n_1$ from that $p$-variate population or from $n_1$ members belonging to a group
whose underlying distribution is that of $X_1$, and we may write the model as


$$E[X_{1j}] = TA_1,\quad j = 1, \ldots, n_1. \tag{14.6.3}$$


With reference to the weight measurements of a person subjected to a certain diet, if the
diet is administered to a group of $n_1$ individuals who are comparable in every respect such
as initial weight, age, physical condition, and so on, then the measurements on these $n_1$
individuals are the observations $X_{11}, \ldots, X_{1n_1}$. Letting the same diet be administered to
another group of $n_2$ persons whose characteristics are similar within the group but may be
different from those of the first group, their measurements will be denoted by $X_{2j}$, $j =
1, \ldots, n_2$. Consider $r$ such groups. If these groups are men belonging to a certain age
bracket and women of the same age category, $r = 2$. If there are 7 groups of women of
different age categories and 5 groups of men belonging to certain age brackets in the study,
$r$ will be equal to 12, and so on. Thus, there can be variations between groups but each
group ought to be homogeneous. If there are $r$ such groups and a simple random sample
of size $n_i$ is available from group $i$, $i = 1, \ldots, r$, we have the model


$$E[X_{ij}] = TA_i,\quad j = 1, \ldots, n_i,\ i = 1, \ldots, r. \tag{14.6.4}$$


We may then compare this situation with a one-way classification MANOVA (or ANOVA
when a single dependent variable is being monitored) situation involving $r$ groups with
the $i$-th group being of size $n_i$, so that $n_. = n_1 + n_2 + \cdots + n_r = n$, as per our notation
with respect to the one-way layout. There is one set of treatments and we wish to assess
the expected effects in these r groups. For the analysis of the data being so modeled, we do 
not explicitly require a subscript to represent the order p of the observation vectors (in this 
case, p corresponds to the number of occasions the measurements are made) and hence, 
our notation (14.6.4) does not involve a subscript representing the order of the vector X;;. 
The model (14.6.4) can be generalized in various ways. As a second set of treatments,
we may consider $s$ different diets being administered to $r$ distinct groups of $n$ individuals
each. "Diet" may exhibit another polynomial growth of the type $b_0 + b_1t + \cdots + b_{m_1}t^{m_1}$,
where $m_1$ need not be equal to $m - 1$. Then, within the framework of experimental designs,
we have a multivariate two-way layout with an equal number of observations per cell. In 
this manner, we can obtain multivariate generalizations of all standard designs where the 
effects are polynomial growth models, possibly of different degrees, with p, however, 
remaining unchanged, p being the number of occasions the measurements are made. We 
can also consider another type of generalization. In the example involving different types 
of diets, these various types may be the same diet but in different strictness levels. In 
this instance, the different diets are correlated among themselves. In a two-way layout, 
there is no correlation within one set of treatments; interaction may be present between 
the two sets of treatments. When different doses of the same diet constitute the second set of
treatments, correlation is obviously present within this second set of treatments. This leads
to multivariate factorial designs with growth curve model for the treatment effects. In all 
of the above considerations, we were only observing one item, namely weight. In another 
type of generalization, instead of one item, several items are measured at the same time, 
that is, there may be several dependent (response) variables instead of one. In this case, 
if $y_1, \ldots, y_k$ are the items measured on the same $p$ occasions, then each $y_j$ may have its
own distribution with covariance matrix $\Sigma_j$, $j = 1, \ldots, k$. Not only that, there may also
be joint dispersions, in which case, there will be covariance matrices $\Sigma_{ij}$ for all $i, j =
1, \ldots, k$. Thus, several types of generalizations are possible for the basic model specified
in (14.6.4). 


In order to analyze the data with the model (14.6.4), the following assumptions are 
required: 


(1) The $n_. = n$ vectors $X_{ij}$ are mutually independently distributed for all $i$ and $j$ with
common covariance matrix $\Sigma > O$.


(2) $E[X_{ij}] = TA_i$, $j = 1, \ldots, n_i$, for each $i$, $i = 1, \ldots, r$.
(3) $X_{ij} \sim N_p(TA_i, \Sigma)$, $\Sigma > O$, $j = 1, \ldots, n_i$, for each $i$, $i = 1, \ldots, r$.


The normality assumption is convenient for obtaining maximum likelihood estimates 
(MLE) and for testing hypotheses on the parameters A;, i = 1,...,r. Observe that the 
matrix T is known. 
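14.6.2. Maximum likelihood estimation
Under the normality assumption for the $X_{ij}$'s, their joint density, denoted by $L$, is
$$L = \frac{1}{(2\pi)^{\frac{n_.p}{2}}|\Sigma|^{\frac{n_.}{2}}}\,e^{-\frac{1}{2}\sum_{i,j}(X_{ij}-TA_i)'\Sigma^{-1}(X_{ij}-TA_i)}. \tag{14.6.5}$$
Note that
$$\sum_{i,j}(X_{ij}-TA_i)'\Sigma^{-1}(X_{ij}-TA_i) = \sum_{i,j}\mathrm{tr}[\Sigma^{-1}(X_{ij}-TA_i)(X_{ij}-TA_i)'].$$
Let $M_i = TA_i$. Then, differentiating $\ln L$ with respect to $M_i$ and equating the result to a
null vector yields the following (for vector/matrix derivatives, refer to Chap. 1):


$$\begin{aligned}
\frac{\partial}{\partial M_i}\sum_{i,j}(X_{ij}-M_i)'\Sigma^{-1}(X_{ij}-M_i) = O &\Rightarrow -2\sum_{j}\Sigma^{-1}(X_{ij}-M_i) = O\\
&\Rightarrow \sum_{j}(X_{ij}-M_i) = O \Rightarrow \hat{M}_i = \frac{1}{n_i}\sum_{j}X_{ij} = \bar{X}_i.
\end{aligned} \tag{14.6.6}$$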
j J 


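Thus, the MLE of $M_i$ is $\hat{M}_i = \bar{X}_i$. At $M_i = \hat{M}_i$,


$$\begin{aligned}
\ln L &= -\frac{n_.p}{2}\ln(2\pi) - \frac{n_.}{2}\ln|\Sigma| - \frac{1}{2}\mathrm{tr}\Big[\Sigma^{-1}\sum_{i,j}(X_{ij}-\bar{X}_i)(X_{ij}-\bar{X}_i)'\Big]\\
&= -\frac{n_.p}{2}\ln(2\pi) - \frac{n_.}{2}\ln|\Sigma| - \frac{1}{2}\mathrm{tr}[\Sigma^{-1}(S_1+\cdots+S_r)]
\end{aligned}$$
where $S_i$ is the $i$-th group sample sum of products matrix or sum of squares and cross
products matrix for $i = 1, \ldots, r$, see also Eq. (13.2.6). Letting $S = S_1 + \cdots + S_r$ and
proceeding as in Chap. 3 by differentiating $\ln L$ with respect to $\Sigma^{-1}$, equating the result to
a null matrix and simplifying, we obtain the MLE of $\Sigma$ as


$$\hat{\Sigma} = \frac{1}{n_.}(S_1+\cdots+S_r) = \frac{1}{n_.}S. \tag{14.6.7}$$


An unbiased estimator of $\Sigma$ is $\frac{1}{n_.-r}(S_1+\cdots+S_r) = \frac{S}{n_.-r}$. For evaluating the MLE of
the parameter vector $A_i$, we can proceed as follows: After substituting $\hat{\Sigma} = \frac{1}{n_.}S$ for $\Sigma$ in $\ln L$,
consider the following for a specific $i$: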
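$$\begin{aligned}
\frac{\partial}{\partial A_i}\ln L = O &\Rightarrow \frac{\partial}{\partial A_i}\sum_{i,j}(X_{ij}-TA_i)'\Big(\frac{S}{n_.}\Big)^{-1}(X_{ij}-TA_i) = O\\
&\Rightarrow \sum_{j}T'S^{-1}(X_{ij}-TA_i) = O\\
&\Rightarrow n_i\,T'S^{-1}\bar{X}_i - n_i\,T'S^{-1}TA_i = O\\
&\Rightarrow \hat{A}_i = (T'S^{-1}T)^{-1}T'S^{-1}\bar{X}_i.
\end{aligned} \tag{14.6.8}$$
Note that this equation holds if $S^{-1}$ is replaced by $\Sigma^{-1}$ or $\Gamma^{-1}$ for some real positive
definite matrix $\Gamma$. Thus, the maximum of the likelihood function under the general
model (14.6.4) is the following:


$$\max L = \frac{e^{-\frac{n_.p}{2}}}{(2\pi)^{\frac{n_.p}{2}}\,\big|\frac{1}{n_.}\sum_{i,j}(X_{ij}-\bar{X}_i)(X_{ij}-\bar{X}_i)'\big|^{\frac{n_.}{2}}}. \tag{14.6.9}$$
As well, observe that the matrix in the determinant of (14.6.9) is
$$\frac{1}{n_.}\sum_{i=1}^{r}\sum_{j=1}^{n_i}(X_{ij}-\bar{X}_i)(X_{ij}-\bar{X}_i)' = \frac{1}{n_.}\times(\text{the residual matrix in MANOVA})$$
in a one-way layout, referring to Sect. 13.2. As well, the residual matrix is Wishart distributed
with $n_. - r$ degrees of freedom and parameter matrix $\Sigma$, that is,


$$\sum_{i,j}(X_{ij}-\bar{X}_i)(X_{ij}-\bar{X}_i)' \sim W_p(n_.-r, \Sigma),\ \Sigma > O. \tag{14.6.10}$$


Now, consider the MLE of $A_i$:
$$\hat{A}_i = (T'S^{-1}T)^{-1}T'S^{-1}\bar{X}_i = C_1\bar{X}_i,\quad C_1 = (T'S^{-1}T)^{-1}T'S^{-1}. \tag{14.6.11}$$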


Observe that $n_.$ cancels out in $\hat{A}_i$. In a normal population, the sample mean and the
sample sum of products matrix are independently distributed. Hence, $\bar{X}_i$ and $S_i$ are
independently distributed, and thereby $\bar{X}_i$ and $S$ are independently distributed. Moreover,
$E[X_{ij}] = E[\bar{X}_i] = TA_i$ for each $i$, and we have


$$E[\hat{A}_i] = E[(T'S^{-1}T)^{-1}T'S^{-1}]\,E[\bar{X}_i] = E[(T'S^{-1}T)^{-1}T'S^{-1}TA_i] = A_i.$$


Thus, $\hat{A}_i$ is an unbiased estimator for $A_i$. So far, we have not made any specification on the
number $m$ or the structure of $T$. We can incorporate the structure of $T$ and the specification
on the degree of the polynomial in $T$ through a hypothesis. Let
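$$H_o: T = T_o = \begin{bmatrix}1 & t_1 & t_1^2 & \cdots & t_1^{m-1}\\ 1 & t_2 & t_2^2 & \cdots & t_2^{m-1}\\ \vdots & \vdots & \vdots & & \vdots\\ 1 & t_p & t_p^2 & \cdots & t_p^{m-1}\end{bmatrix} \tag{14.6.12}$$


for a specified $m$, that is, $H_o$ posits that it is a polynomial $T_oA_i$ of degree $m - 1$ for each
$i$ and this model fits. In this instance, the condition imposed through $H_o$ is that the degree
of the polynomial is a specified number $m - 1$ because $p$ is known as the order of the
population vector. Then, the factors $e^{-\frac{n_.p}{2}}$ and $(2\pi)^{\frac{n_.p}{2}}$ cancel out in the likelihood ratio
criterion $\lambda$, leaving


$$\lambda = \frac{\max L|_{H_o}}{\max L} = \frac{\big|\sum_{i,j}(X_{ij}-\bar{X}_i)(X_{ij}-\bar{X}_i)'\big|^{\frac{n_.}{2}}}{\big|\sum_{i,j}(X_{ij}-T_o\hat{A}_i)(X_{ij}-T_o\hat{A}_i)'\big|^{\frac{n_.}{2}}}. \tag{14.6.13}$$


Note that $\sum_{i,j}(X_{ij} - T_o\hat{A}_i) = \sum_{i,j}[(X_{ij} - \bar{X}_i) + (\bar{X}_i - T_o\hat{A}_i)]$, so that
$$\sum_{i,j}(X_{ij}-T_o\hat{A}_i)(X_{ij}-T_o\hat{A}_i)' = \sum_{i,j}(X_{ij}-\bar{X}_i)(X_{ij}-\bar{X}_i)' + \sum_{i,j}(\bar{X}_i-T_o\hat{A}_i)(\bar{X}_i-T_o\hat{A}_i)',$$
the cross-product terms vanishing since $\sum_{j}(X_{ij} - \bar{X}_i) = O$ for each $i$.


Since $\lambda$ under $H_o$ is a ratio of determinants, the parameter matrix $\Sigma$ can be eliminated, and we
can assume that the normal population has the identity matrix $I$ as its covariance matrix.
As well, $[X_{ij} - \bar{X}_i] = [(X_{ij} - TA_i) - (\bar{X}_i - TA_i)]$ and $[\bar{X}_i - T_o\hat{A}_i] = [(\bar{X}_i - T_oA_i) -
(T_o\hat{A}_i - T_oA_i)]$. We have already established that $\hat{A}_i$ is an unbiased estimator for $A_i$.
Hence, without any loss of generality, we can assume the population to be $N_p(O, I)$ under
$H_o$, and we can consider


$$\bar{X}_i - T_o(T_o'S^{-1}T_o)^{-1}T_o'S^{-1}\bar{X}_i = (I - B)\bar{X}_i,\qquad \bar{X}_i \sim N_p(O, \tfrac{1}{n_i}I) \Rightarrow \sqrt{n_i}\,\bar{X}_i \sim N_p(O, I)$$
where $B = T_o(T_o'S^{-1}T_o)^{-1}T_o'S^{-1}$. Now, since
$$B^2 = T_o(T_o'S^{-1}T_o)^{-1}T_o'S^{-1}T_o(T_o'S^{-1}T_o)^{-1}T_o'S^{-1} = B,$$


$B$ is idempotent of rank $m$ where $m$ is equal to the rank of $T_o$. For convenience, let
$p_1 = p - m$ and $p_2 = m$, so that $p_1 + p_2 = p$. Observe that $X_{ij} - \bar{X}_i$ and $\bar{X}_i$ are
independently distributed, and that, with $\mathbf{X}_i = [X_{i1}, \ldots, X_{in_i}]$ denoting the $p \times n_i$ sample
matrix of the $i$-th group, $\bar{X}_i = \frac{1}{n_i}\mathbf{X}_iJ$ and the matrix of deviations is $\mathbf{X}_i[I - \frac{1}{n_i}JJ']$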




where $J' = [1, 1, \ldots, 1]$ or $J$ is an $n_i \times 1$ vector of unities. Now, observing that
$[I - \frac{1}{n_i}JJ'][\frac{1}{n_i}JJ'] = O$ and that, under normality, $\sqrt{n_i}\,\bar{X}_i \sim N_p(O, I)$, it follows that
$U_1 = \sum_{i,j}(X_{ij} - \bar{X}_i)(X_{ij} - \bar{X}_i)'$ and $U_2 = \sum_{i}n_i[(I - B)\bar{X}_i][(I - B)\bar{X}_i]'$ are independently
distributed since $U_1$ and $\bar{X}_i$ are independently distributed. Accordingly,


$$w = \lambda^{\frac{2}{n_.}} = \frac{|U_1|}{|U_1 + U_2|}. \tag{14.6.14}$$


Since $I - B$ is idempotent and of rank $p - m = p_1$, we can perform an orthonormal
transformation on $\sqrt{n_i}\,\bar{X}_i$ and, equivalently, write $U_2$ as $U_2 \to \begin{bmatrix}Z & O\\ O & O\end{bmatrix}$ in distribution
where $Z = [Z_1, \ldots, Z_r][Z_1, \ldots, Z_r]'$, $Z_j \overset{iid}{\sim} N_{p_1}(O, I_{p_1})$. Then, $Z \sim W_{p_1}(r, I_{p_1})$ for
$r \ge p_1$. Moreover, we can reduce the $p \times p$ matrix $U_1$ to a $p_1 \times p_1$ submatrix in the
determinant. Let us consider the following partition of $U_1$ where $U_{11}$ is $p_1 \times p_1$ and $U_{22}$
is $p_2 \times p_2$, $p_1 + p_2 = p$:


$$U_1 = \begin{bmatrix}U_{11} & U_{12}\\ U_{21} & U_{22}\end{bmatrix},\qquad |U_1| = |U_{22}|\,|U_{11} - U_{12}U_{22}^{-1}U_{21}|,$$


which follows from results included in Chap. 1 on the determinants of partitioned matrices.
Then, $w$ of (14.6.14) is such that


$$w = \frac{|U_{11} - U_{12}U_{22}^{-1}U_{21}|}{|Z + U_{11} - U_{12}U_{22}^{-1}U_{21}|}.$$
Now, for fixed $U_{22}$, make the transformation $V_{12} = U_{12}U_{22}^{-\frac{1}{2}} \Rightarrow dV_{12} = |U_{22}|^{-\frac{p_1}{2}}\,dU_{12}$,
referring to Chap. 1. Let us take $p_2 \ge p_1$; if not, simply interchange $p_1$ and $p_2$ in the
following procedure. Then, make the transformation


$$S = V_{12}V_{12}' \Rightarrow dV_{12} = \frac{\pi^{\frac{p_1p_2}{2}}}{\Gamma_{p_1}(\frac{p_2}{2})}\,|S|^{\frac{p_2}{2}-\frac{p_1+1}{2}}\,dS,$$


the Jacobian being given in Sect. 4.3. Observe that $U_1$ has a Wishart density with $n_. - r$
degrees of freedom and parameter matrix $I$, where $n = n_. = n_1 + \cdots + n_r$. Letting the
density of $U_1$ be denoted by $f(U_1)$, we have
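$$\begin{aligned}
f(U_1)\,dU_1 &= c\,|U_1|^{\frac{n_.-r}{2}-\frac{p+1}{2}}\,e^{-\frac{1}{2}\mathrm{tr}(U_1)}\,dU_1\\
&= c\,|U_{22}|^{\frac{n_.-r}{2}-\frac{p+1}{2}}\,|U_{11}-U_{12}U_{22}^{-1}U_{21}|^{\frac{n_.-r}{2}-\frac{p+1}{2}}\,e^{-\frac{1}{2}[\mathrm{tr}(U_{11})+\mathrm{tr}(U_{22})]}\,dU_{11}\wedge dU_{22}\wedge dU_{12}\\
&= c\,|U_{22}|^{\frac{n_.-r}{2}-\frac{p+1}{2}+\frac{p_1}{2}}\,|U_{11}-V_{12}V_{12}'|^{\frac{n_.-r}{2}-\frac{p+1}{2}}\,e^{-\frac{1}{2}[\mathrm{tr}(U_{11})+\mathrm{tr}(U_{22})]}\,dU_{11}\wedge dU_{22}\wedge dV_{12}\\
&= c\,\frac{\pi^{\frac{p_1p_2}{2}}}{\Gamma_{p_1}(\frac{p_2}{2})}\,|S|^{\frac{p_2}{2}-\frac{p_1+1}{2}}\,|U_{22}|^{\frac{n_.-r}{2}-\frac{p+1}{2}+\frac{p_1}{2}}\,|U_{11}-S|^{\frac{n_.-r}{2}-\frac{p+1}{2}}\,e^{-\frac{1}{2}[\mathrm{tr}(U_{11})+\mathrm{tr}(U_{22})]}\,dU_{11}\wedge dU_{22}\wedge dS\\
&= c\,\frac{\pi^{\frac{p_1p_2}{2}}}{\Gamma_{p_1}(\frac{p_2}{2})}\,|S|^{\frac{p_2}{2}-\frac{p_1+1}{2}}\,|U_{22}|^{\frac{n_.-r}{2}-\frac{p+1}{2}+\frac{p_1}{2}}\,|V_{22}|^{\frac{n_.-r}{2}-\frac{p+1}{2}}\,e^{-\frac{1}{2}[\mathrm{tr}(S)+\mathrm{tr}(V_{22})+\mathrm{tr}(U_{22})]}\,dV_{22}\wedge dU_{22}\wedge dS,
\end{aligned}$$


where $V_{22} = U_{11} - S$ and $c$ is the normalizing constant. Now, on integrating out $U_{22}$ and $S$, we obtain the
marginal density of $V_{22}$, denoted by $f_1(V_{22})$, as


$$f_1(V_{22}) = c_2\,|V_{22}|^{\frac{n_.-r}{2}-\frac{p_2}{2}-\frac{p_1+1}{2}}\,e^{-\frac{1}{2}\mathrm{tr}(V_{22})} \tag{14.6.15}$$
where $c_2$ is the normalizing constant. This shows that $V_{22} \sim W_{p_1}(n_. - r - p_2, I_{p_1})$ for
$n_. - r - p_2 \ge p_1$ and that $V_{22}$ and $Z$ are independently distributed. Then,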


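$$w = \frac{|V_{22}|}{|V_{22} + Z|}$$


where $V_{22}$ and $Z$ are $p_1 \times p_1$ real matrices that are independently Wishart distributed with
common parameter matrix $I_{p_1}$, and degrees of freedom $n_. - r - p_2$ and $r$, respectively.
Then, $w = |W|$ where $W = (V_{22} + Z)^{-\frac{1}{2}}V_{22}(V_{22} + Z)^{-\frac{1}{2}}$ is a $p_1 \times p_1$ real matrix-variate
type-1 beta distributed matrix with the parameters $(\frac{n_.-r-p_2}{2}, \frac{r}{2})$. Therefore, for an arbitrary $h$,
the $h$-th moment of the determinant of $W$ is the following:


$$E[w^h] = c_3\,\frac{\Gamma_{p_1}(\frac{n_.-r-p_2}{2} + h)}{\Gamma_{p_1}(\frac{n_.-p_2}{2} + h)} \tag{14.6.16}$$
where $c_3$ is the normalizing constant such that when $h = 0$, $E[w^h] = 1$. Since the
likelihood ratio criterion is $\lambda = w^{\frac{n_.}{2}}$,

$$E[\lambda^h] = c_3\,\frac{\Gamma_{p_1}(\frac{n_.-r-p_2}{2} + \frac{n_.}{2}h)}{\Gamma_{p_1}(\frac{n_.-p_2}{2} + \frac{n_.}{2}h)}. \tag{14.6.17}$$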


As illustrated in earlier chapters, see for example Sect. 13.3, we can express the exact
density of $w$ in (14.6.16) as a G-function and the exact density of $\lambda$ in (14.6.17) as an
H-function. Letting the densities of $w$ and $\lambda$ be denoted by $f_2(w)$ and $f_3(\lambda)$, respectively,
we have


$$f_2(w) = c_3\,w^{-1}\,G^{p_1,0}_{p_1,p_1}\Big[w\,\Big|^{\frac{n_.-p_2}{2}-\frac{j-1}{2},\ j=1,\ldots,p_1}_{\frac{n_.-r-p_2}{2}-\frac{j-1}{2},\ j=1,\ldots,p_1}\Big] \tag{14.6.18}$$


$$f_3(\lambda) = c_3\,\lambda^{-1}\,H^{p_1,0}_{p_1,p_1}\Big[\lambda\,\Big|^{(\frac{n_.-p_2-j+1}{2},\,\frac{n_.}{2}),\ j=1,\ldots,p_1}_{(\frac{n_.-r-p_2-j+1}{2},\,\frac{n_.}{2}),\ j=1,\ldots,p_1}\Big] \tag{14.6.19}$$
for $0 < w < 1$ and $0 < \lambda < 1$, and zero elsewhere, where the G and H-functions
were defined earlier, for example in Chap. 13. For theoretical results on the G- and H-
functions as well as applications, the reader is, for instance, referred to Mathai (1993)
and Mathai, Saxena and Haubold (2010), respectively. We can express the exact density
functions $f_2(w)$ and $f_3(\lambda)$ in terms of elementary functions for special values of $p$, $m$ and
$n$. For moderate values of $n$, $p$ and $m$, numerical evaluations can be obtained by making
use of the G- and H-functions programs in Mathematica and MAPLE. For large values of
$n$, asymptotic results may be invoked.


Note 14.6.1. The test statistic $w$ can also be expressed as


$$w = \frac{1}{|I + V_{22}^{-\frac{1}{2}}ZV_{22}^{-\frac{1}{2}}|} = \frac{1}{|I + W_2|},\qquad W_2 = V_{22}^{-\frac{1}{2}}ZV_{22}^{-\frac{1}{2}},$$
where $W_2$ is a real matrix-variate type-2 beta random variable with the parameters
$(\frac{r}{2}, \frac{n_.-r-p_2}{2})$. As well, $V_{22} \sim W_{p_1}(n_. - r - p_2, I_{p_1})$, $Z \sim W_{p_1}(r, I_{p_1})$, and $V_{22}$ and $Z$
are independently distributed. While Roy's largest root test is based on the largest
eigenvalue of $W_2$, Hotelling's trace test relies on the trace of $W_2$.


14.6.3. Some special cases 


In certain particular cases, one can obtain exact densities in terms of elementary func- 
tions. Two such cases will now be considered. 


Case (1) $p_2 = m = p - 1 \Rightarrow p_1 = p - m = 1$, $n_. - r - p_2 = n_. - r - p + 1$. Since
$p_1 = 1$, there will be only one gamma ratio containing $h$, and
$$E[w^h] = c_3\,\frac{\Gamma(\frac{n_.-r-p+1}{2} + h)}{\Gamma(\frac{n_.-p+1}{2} + h)},\quad \Re(h) > -\frac{n_.-p-r+1}{2}, \tag{i}$$


where $c_3$ is the normalizing constant such that the right-hand side is 1 when $h = 0$. Observe
that (i) is the $h$-th moment of a real scalar type-1 beta random variable and hence, $w \sim$
real scalar type-1 beta $(\frac{n_.-r-p+1}{2}, \frac{r}{2})$. Accordingly, $y_1 = \frac{1-w}{w}$ is a real scalar type-2 beta
random variable with the parameters $(\frac{r}{2}, \frac{n_.-r-p+1}{2})$, so that


$$t_1 = \frac{n_.-r-p+1}{r}\,y_1 \sim F_{r,\,n_.-r-p+1}.$$
Thus, for large values of this $F$ statistic, we reject the null hypothesis that the growth curve
has the structure of $T_o$ of degree $m - 1$ as specified in (14.6.12) and that the growth curve
model fits. Equivalently, we reject $H_o$ when the observed value of $t_1$ is greater than or
equal to $F_{r,\,n_.-r-p+1,\,\alpha}$, such that $Pr\{F_{r,\,n_.-r-p+1} \ge F_{r,\,n_.-r-p+1,\,\alpha}\} = \alpha$ for a preselected
$\alpha$.


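Case (2) $p_2 = m = p - 2 \Rightarrow p_1 = 2$, which indicates that the $h$-th moment will contain
two gamma ratios involving $h$. More specifically,
$$E[w^h] = c_3\,\frac{\Gamma(\frac{n_.-r-p+2}{2} + h)\,\Gamma(\frac{n_.-r-p+2}{2} - \frac{1}{2} + h)}{\Gamma(\frac{n_.-p+2}{2} + h)\,\Gamma(\frac{n_.-p+2}{2} - \frac{1}{2} + h)},\quad \Re(h) > -\frac{n_.-r-p+1}{2}. \tag{ii}$$


Since the arguments of the gamma functions in (ii) differ by $\frac{1}{2}$, we can combine them by
applying the duplication formula for gamma functions, namely


$$\Gamma(z)\,\Gamma(z + \tfrac{1}{2}) = \pi^{\frac{1}{2}}\,2^{1-2z}\,\Gamma(2z). \tag{iii}$$


Consider $E[\sqrt{w}]^h$ so that $h$ becomes $\frac{h}{2}$. Now, by taking $z = \frac{n_.-r-p+1}{2} + \frac{h}{2}$ in the numerator
and $z = \frac{n_.-p+1}{2} + \frac{h}{2}$ in the denominator for the gamma functions containing $h$, and $h = 0$ in
the constant part, we have
$$E[\sqrt{w}]^h = c_{31}\,\frac{\Gamma(n_.-r-p+1+h)}{\Gamma(n_.-p+1+h)}$$
where $c_{31}$ is the corresponding normalizing constant. Thus, $\sqrt{w} \sim$ real scalar type-1 beta
with the parameters $(n_. - r - p + 1,\ r)$ for $n_. + 1 > r + p$, and


$$y_2 = \frac{1-\sqrt{w}}{\sqrt{w}} \sim \text{type-2 beta}\,(r,\ n_.-r-p+1),\qquad t_2 = \frac{n_.-r-p+1}{r}\,y_2 \sim F_{2r,\,2(n_.-r-p+1)}. \tag{iv}$$
Accordingly, we reject $H_o$ for large values of the $F$ statistic given in (iv) or equivalently,
for an observed value of $t_2$ that is greater than or equal to $F_{2r,\,2(n_.-r-p+1),\,\alpha}$, such that
$Pr\{F_{2r,\,2(n_.-r-p+1)} \ge F_{2r,\,2(n_.-r-p+1),\,\alpha}\} = \alpha$ for a given significance level $\alpha$.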
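
The reduction obtained through the duplication formula can be checked numerically. The following Python sketch, offered only as a sanity check, compares the normalized moment expression (ii) with $h$ replaced by $h/2$ against the $h$-th moment of a real scalar type-1 beta variable with the parameters $(n_. - r - p + 1,\ r)$. The function names are hypothetical, and the values $n_. = 11$, $r = 2$, $p = 4$ are those of Example 14.6.1.

```python
import math


def moment_root_w(h, n_dot, r, p):
    """E[(sqrt w)^h] from (ii) with h replaced by h/2, normalized to 1 at h = 0."""
    a = (n_dot - r - p + 2) / 2
    b = (n_dot - p + 2) / 2

    def ratio(s):
        return (math.gamma(a + s) * math.gamma(a - 0.5 + s)
                / (math.gamma(b + s) * math.gamma(b - 0.5 + s)))

    return ratio(h / 2) / ratio(0)


def moment_type1_beta(h, alpha, beta):
    """h-th moment of a real scalar type-1 beta(alpha, beta) random variable."""
    return (math.gamma(alpha + h) * math.gamma(alpha + beta)
            / (math.gamma(alpha) * math.gamma(alpha + beta + h)))


n_dot, r, p = 11, 2, 4
pairs = []
for h in (0.5, 1, 2, 3.5):
    lhs = moment_root_w(h, n_dot, r, p)
    rhs = moment_type1_beta(h, n_dot - r - p + 1, r)
    pairs.append((lhs, rhs))
    print(h, lhs, rhs)
```

Agreement of the two columns for several values of $h$ is consistent with $\sqrt{w}$ being type-1 beta distributed in this special case.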




14.6.4. Asymptotic distribution of the likelihood ratio criterion A 


Given the gamma structure appearing in (14.6.17), we can expand the gamma functions
for $\frac{n_.}{2}(1 + h) \to \infty$ in the part containing $h$ and $\frac{n_.}{2} \to \infty$ in the constant part wherein
$h = 0$, by using the first term in the asymptotic expansion of the gamma function or
Stirling's formula, namely


$$\Gamma(z + \delta) \approx \sqrt{2\pi}\,z^{z+\delta-\frac{1}{2}}\,e^{-z} \tag{14.6.20}$$


for $|z| \to \infty$, when $\delta$ is a bounded quantity. Then, on expanding each gamma function
in (14.6.17) by making use of (14.6.20), we have


$$E[\lambda^h] \to (1+h)^{-\frac{rp_1}{2}} \Rightarrow E[e^{h(-2\ln\lambda)}] \to (1-2h)^{-\frac{rp_1}{2}}\ \text{ for } 1 - 2h > 0,$$
which shows that whenever $n = n_. = n_1 + \cdots + n_r \to \infty$,
$$-2\ln\lambda \to \chi^2_{rp_1}, \tag{14.6.21}$$


where the $\chi^2$ is a real scalar chisquare random variable having $rp_1 = r(p - m)$ degrees of
freedom. Hence, for large values of $n = n_1 + \cdots + n_r$, we reject the null hypothesis $H_o$ that
the model specified by $T_o$ fits, whenever the observed $-2\ln\lambda \ge \chi^2_{r(p-m),\,\alpha}$ with
$Pr\{\chi^2_{r(p-m)} \ge \chi^2_{r(p-m),\,\alpha}\} = \alpha$ for a preselected $\alpha$.
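
Since $\lambda = w^{n_./2}$, the large-sample statistic is simply $-2\ln\lambda = -n_.\ln w$. The short Python sketch below traces this arithmetic; the function names are hypothetical, and the inputs $w = 0.322493$, $n_. = 11$, $r = 2$, $p = 4$, $m = 2$ are those arising in Example 14.6.1. With $n_.$ this small, the chisquare approximation is shown only to illustrate the computation, not as a recommended test.

```python
import math


def minus_two_log_lambda(w, n_dot):
    """-2 ln(lambda) with lambda = w^(n./2), that is, -n. * ln(w)."""
    return -n_dot * math.log(w)


def chi_square_dof(r, p, m):
    """Degrees of freedom r * p_1 = r * (p - m) of the limiting chisquare in (14.6.21)."""
    return r * (p - m)


# Inputs taken from Example 14.6.1 (an assumption of this sketch: the sample is
# far too small for the asymptotic result to be relied upon).
stat = minus_two_log_lambda(0.322493, 11)
dof = chi_square_dof(r=2, p=4, m=2)
print(round(stat, 3), dof)
```

The observed value would be compared with the upper percentage point $\chi^2_{r(p-m),\,\alpha}$ obtained from a table or a statistical library.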


Example 14.6.1. A certain exercise regimen is prescribed to a random sample of 5 men
and 6 women to stabilize their weights to within normal limits. Except for being of
different gender, these men and women form homogeneous groups with respect to all factors
that may influence weight, such as age and health conditions. The measurements are initial
weight minus weight on the reading date. The weights are recorded every 5th day and a
10-day period is taken as one time unit, the observations being made for 20 days or 2 time
units. Test the fit of (1) a linear model, (2) a second degree growth curve to the following
data, the columns of the matrix $A$ being the readings on men, denoted by $X_{1j}$, and the
columns of the matrix $B$, the readings on women, denoted by $Y_{2j}$:


$$A = \begin{bmatrix}1 & 0 & 0 & -1 & 0\\ 0 & 1 & -1 & 0 & 0\\ -1 & 0 & 1 & -1 & 1\\ 1 & 2 & 0 & 1 & 1\end{bmatrix},\qquad B = \begin{bmatrix}-1 & 1 & 0 & -1 & 1 & 0\\ 1 & 1 & 1 & 0 & 1 & 2\\ 0 & -1 & -1 & 1 & 1 & 0\\ -1 & -1 & 1 & 0 & 0 & 1\end{bmatrix}.$$
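Solution 14.6.1. Let the sample average vectors be denoted by $\bar{X}$ for men and $\bar{Y}$ for
women, the deviation matrices be denoted as $X_d$ and $Y_d$ and the sample sum of products
matrices, by $S_1$ and $S_2$ respectively. Then, these items are as follows:


$$\bar{X} = \begin{bmatrix}0\\ 0\\ 0\\ 1\end{bmatrix},\quad X_d = \begin{bmatrix}1 & 0 & 0 & -1 & 0\\ 0 & 1 & -1 & 0 & 0\\ -1 & 0 & 1 & -1 & 1\\ 0 & 1 & -1 & 0 & 0\end{bmatrix},\quad S_1 = \begin{bmatrix}2 & 0 & 0 & 0\\ 0 & 2 & -1 & 2\\ 0 & -1 & 4 & -1\\ 0 & 2 & -1 & 2\end{bmatrix},$$
$$\bar{Y} = \begin{bmatrix}0\\ 1\\ 0\\ 0\end{bmatrix},\quad Y_d = \begin{bmatrix}-1 & 1 & 0 & -1 & 1 & 0\\ 0 & 0 & 0 & -1 & 0 & 1\\ 0 & -1 & -1 & 1 & 1 & 0\\ -1 & -1 & 1 & 0 & 0 & 1\end{bmatrix},\quad S_2 = \begin{bmatrix}4 & 1 & -1 & 0\\ 1 & 2 & -1 & 1\\ -1 & -1 & 4 & 0\\ 0 & 1 & 0 & 4\end{bmatrix}.$$


Let $S = S_1 + S_2$. Consider the determinant of $S$ and its inverse $S^{-1}$, the latter being obtained
as the matrix of cofactors divided by the determinant since $S$ is a symmetric matrix. Thus,


$$S = S_1 + S_2 = \begin{bmatrix}6 & 1 & -1 & 0\\ 1 & 4 & -2 & 3\\ -1 & -2 & 8 & -1\\ 0 & 3 & -1 & 6\end{bmatrix},\quad |S| = 580,$$
$$\mathrm{Cof}(S) = \begin{bmatrix}104 & -38 & 6 & 20\\ -38 & 276 & 48 & -130\\ 6 & 48 & 84 & -10\\ 20 & -130 & -10 & 160\end{bmatrix},\quad S^{-1} = \frac{1}{580}\begin{bmatrix}104 & -38 & 6 & 20\\ -38 & 276 & 48 & -130\\ 6 & 48 & 84 & -10\\ 20 & -130 & -10 & 160\end{bmatrix}.$$


Take the growth matrices for the linear growth and quadratic growth as $T_1$ and $T_2$ where


$$T_1 = \begin{bmatrix}1 & 1/2\\ 1 & 1\\ 1 & 3/2\\ 1 & 2\end{bmatrix},\qquad T_2 = \begin{bmatrix}1 & 1/2 & 1/4\\ 1 & 1 & 1\\ 1 & 3/2 & 9/4\\ 1 & 2 & 4\end{bmatrix}.$$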
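Now, let us compute the various quantities that are required to answer the questions:


$$T_1'S^{-1} = \frac{1}{580}\begin{bmatrix}92 & 156 & 128 & 40\\ 63 & 69 & 157 & 185\end{bmatrix},\quad T_1'S^{-1}\bar{X} = \frac{1}{580}\begin{bmatrix}40\\ 185\end{bmatrix},\quad T_1'S^{-1}\bar{Y} = \frac{1}{580}\begin{bmatrix}156\\ 69\end{bmatrix},$$
$$T_1'S^{-1}T_1 = \frac{1}{580}\begin{bmatrix}416 & 474\\ 474 & 706\end{bmatrix},\quad (T_1'S^{-1}T_1)^{-1} = \frac{580}{69020}\begin{bmatrix}706 & -474\\ -474 & 416\end{bmatrix},$$
$$T_1(T_1'S^{-1}T_1)^{-1}T_1'S^{-1} = \frac{1}{69020}\begin{bmatrix}26390 & 54810 & 18270 & -30450\\ 17690 & 32190 & 20590 & -1450\\ 8990 & 9570 & 22910 & 27550\\ 290 & -13050 & 25230 & 56550\end{bmatrix}.$$


We may now evaluate the estimates of the parameter vectors $A_1$ and $A_2$, denoted by a hat:
$$\hat{A}_1 = (T_1'S^{-1}T_1)^{-1}T_1'S^{-1}\bar{X} = \frac{1}{69020}\begin{bmatrix}-59450\\ 58000\end{bmatrix} = \begin{bmatrix}-0.861345\\ 0.840336\end{bmatrix} = \begin{bmatrix}\hat{a}_{01}\\ \hat{a}_{11}\end{bmatrix},$$
$$\hat{A}_2 = (T_1'S^{-1}T_1)^{-1}T_1'S^{-1}\bar{Y} = \frac{1}{69020}\begin{bmatrix}77430\\ -45240\end{bmatrix} = \begin{bmatrix}1.12185\\ -0.655462\end{bmatrix} = \begin{bmatrix}\hat{a}_{02}\\ \hat{a}_{12}\end{bmatrix}.$$


If the hypothesis of linear growth were not rejected, then the models for men and women,
respectively denoted by $f_1(t)$ and $f_2(t)$, would be the following:


Men: $f_1(t) = -0.861345 + 0.840336\,t$    Women: $f_2(t) = 1.12185 - 0.655462\,t$.


As well,


$$T_1\hat{A}_1 = T_1(T_1'S^{-1}T_1)^{-1}T_1'S^{-1}\bar{X} = \frac{1}{69020}\begin{bmatrix}-30450\\ -1450\\ 27550\\ 56550\end{bmatrix} = \begin{bmatrix}-0.441176\\ -0.0210084\\ 0.39916\\ 0.819328\end{bmatrix},$$
$$T_1\hat{A}_2 = T_1(T_1'S^{-1}T_1)^{-1}T_1'S^{-1}\bar{Y} = \frac{1}{69020}\begin{bmatrix}54810\\ 32190\\ 9570\\ -13050\end{bmatrix} = \begin{bmatrix}0.794118\\ 0.466387\\ 0.138655\\ -0.189076\end{bmatrix}.$$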

Let us now compute the sum of products matrix under the linear growth model. The matrix
$X - T_1\hat{A}_1$ is such that the vector $T_1\hat{A}_1$ is subtracted from each column of the observation
matrix $X$. Letting $G_1 = X - T_1\hat{A}_1$, so that
$G_1G_1' = \sum_{j=1}^{n_1}(X_{1j} - T_1\hat{A}_1)(X_{1j} - T_1\hat{A}_1)'$, we have


$$G_1 = \begin{bmatrix}1.44118 & 0.441176 & 0.441176 & -0.558824 & 0.441176\\ 0.0210084 & 1.02101 & -0.978992 & 0.0210084 & 0.0210084\\ -1.39916 & -0.39916 & 0.60084 & -1.39916 & 0.60084\\ 0.180672 & 1.18067 & -0.819328 & 0.180672 & 0.180672\end{bmatrix},$$
$$G_1G_1' = \begin{bmatrix}2.97318 & 0.0463421 & -0.880499 & 0.398542\\ 0.0463421 & 2.00221 & -1.04193 & 2.01898\\ -0.880499 & -1.04193 & 4.79664 & -1.36059\\ 0.398542 & 2.01898 & -1.36059 & 2.16321\end{bmatrix}.$$


We now evaluate the same matrices for the second sample, denoting the deviation matrix
by $G_2$ and the sum of products matrix by $G_2G_2'$, where $G_2$ is obtained by subtracting $T_1\hat{A}_2$
from each column of the observation matrix $Y$:


$$G_2 = Y - T_1\hat{A}_2 = \begin{bmatrix}-1.79412 & 0.205882 & -0.794118 & -1.79412 & 0.205882 & -0.794118\\ 0.533613 & 0.533613 & 0.533613 & -0.466387 & 0.533613 & 1.53361\\ -0.138655 & -1.13866 & -1.13866 & 0.861345 & 0.861345 & -0.138655\\ -0.810924 & -0.810924 & 1.18908 & 0.189076 & 0.189076 & 1.18908\end{bmatrix},$$
$$G_2G_2' = \begin{bmatrix}7.78374 & -1.54251 & -0.339348 & -0.90089\\ -1.54251 & 3.70846 & -1.44393 & 1.60536\\ -0.339348 & -1.44393 & 4.11535 & -0.157298\\ -0.90089 & 1.60536 & -0.157298 & 4.2145\end{bmatrix}.$$
Thus,


$$G_1G_1' + G_2G_2' = \begin{bmatrix}10.7569 & -1.49617 & -1.21985 & -0.502348\\ -1.49617 & 5.71067 & -2.48586 & 3.62434\\ -1.21985 & -2.48586 & 8.91199 & -1.51788\\ -0.502348 & 3.62434 & -1.51788 & 6.37771\end{bmatrix},\quad |G_1G_1' + G_2G_2'| = 1798.49,$$


so that


$$w = \frac{|S|}{|G_1G_1' + G_2G_2'|} = \frac{580}{1798.49} = 0.322493.$$
In this case, $n_{.} - r = 11 - 2 = 9$, $m = 2$, $p = 4$, $p - m = p_1 = 2$ and $p_2 = 2$. This
situation is our special Case (2). Then,
\[
t_2 = \frac{1-\sqrt{w}}{\sqrt{w}} = \frac{1 - 0.5679}{0.5679} = 0.7609,
\qquad
\frac{n_{.} - r - p + 1}{r}\,t_2 = \frac{6}{2}\,t_2 = 3\,t_2 = 2.2826,
\]
which is the observed value of an $F_{2r,\,2(n_{.}-r-p+1)} = F_{4,12}$ random variable under the null
hypothesis. From an F-table, at the 5% level, $F_{4,12,0.05} = 3.26 > 2.2826$. In the case of
the F-test, we reject the hypothesis for large observed values of the statistic. Accordingly,
we do not reject $H_o$ that $m = 2$ or that the growth model is linear. Naturally, the null
hypothesis would not be rejected either at a lower level of significance.
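The arithmetic of this F-test can be reproduced in a few lines. The sketch below only restates the quantities quoted in the text; the critical value 3.26 is the tabulated figure cited above, not something the code derives, and the variable names are illustrative.

```python
import math

w = 580 / 1798.49            # |S1 + S2| / |G1 G1' + G2 G2'|
r, p, n_dot = 2, 4, 11
nu = n_dot - r - p + 1       # 11 - 2 - 4 + 1 = 6

t2 = (1 - math.sqrt(w)) / math.sqrt(w)
f_obs = (nu / r) * t2        # referred to an F distribution with (2r, 2*nu) = (4, 12) d.f.

f_crit = 3.26                # tabulated 5% point quoted in the text
reject = f_obs > f_crit
print(round(t2, 4), round(f_obs, 4), reject)
```

Working with the unrounded value of $\sqrt{w}$ gives a statistic that differs from 2.2826 only in the fourth decimal place, so the conclusion is unaffected.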


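We now tackle the second part of the problem wherein $m = 3$ and $m - 1 = 2$, that
is, the growth model is assumed to be a polynomial of degree 2 under the null hypothesis.


Again, we take the time intervals to be $t_1 = \frac{1}{2}$, $t_2 = 1$, $t_3 = \frac{3}{2}$ and $t_4 = 2$, so that
\[
T_2 = \begin{bmatrix} 1 & t_1 & t_1^2\\ 1 & t_2 & t_2^2\\ 1 & t_3 & t_3^2\\ 1 & t_4 & t_4^2 \end{bmatrix}
= \begin{bmatrix} 1 & 1/2 & 1/4\\ 1 & 1 & 1\\ 1 & 3/2 & 9/4\\ 1 & 2 & 4 \end{bmatrix}.
\]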


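We can make use of some of the computations from the first part of the problem, that is,
the calculations related to $T_1$. Thus, we have
\[
T_2'S^{-1} = \frac{1}{580}\begin{bmatrix} 92 & 156 & 128 & 40\\ 63 & 69 & 157 & 185\\ 81.5 & -145.5 & 198.5 & 492.5 \end{bmatrix},
\]
\[
T_2'S^{-1}T_2 = \frac{1}{580}\begin{bmatrix} 416 & 474 & 627\\ 474 & 706 & 1178\\ 627 & 1178 & 2291.5 \end{bmatrix},
\qquad |580 \times T_2'S^{-1}T_2| = 3532200,
\]
\[
\mathrm{Cof}(580\,T_2'S^{-1}T_2) = \begin{bmatrix} 230115 & -347565 & 115710\\ -347565 & 560135 & -192850\\ 115710 & -192850 & 69020 \end{bmatrix},
\]
\[
(T_2'S^{-1}T_2)^{-1} = \frac{580}{3532200}\begin{bmatrix} 230115 & -347565 & 115710\\ -347565 & 560135 & -192850\\ 115710 & -192850 & 69020 \end{bmatrix},
\]
\[
T_2'S^{-1}\bar{X} = \frac{1}{580}\begin{bmatrix} 40\\ 185\\ 492.5 \end{bmatrix},
\qquad
T_2'S^{-1}\bar{Y} = \frac{1}{580}\begin{bmatrix} 156\\ 69\\ -145.5 \end{bmatrix},
\]
and
\[
\hat{A}_1 = (T_2'S^{-1}T_2)^{-1}T_2'S^{-1}\bar{X}
= \frac{1}{3532200}\begin{bmatrix} 1892250\\ -5256250\\ 2943500 \end{bmatrix}
= \begin{bmatrix} 0.535714\\ -1.4881\\ 0.833333 \end{bmatrix}
= \begin{bmatrix} \hat{a}_{01}\\ \hat{a}_{11}\\ \hat{a}_{21} \end{bmatrix},
\]
\[
\hat{A}_2 = (T_2'S^{-1}T_2)^{-1}T_2'S^{-1}\bar{Y}
= \frac{1}{3532200}\begin{bmatrix} -4919850\\ 12488850\\ -5298300 \end{bmatrix}
= \begin{bmatrix} -1.39286\\ 3.53571\\ -1.5 \end{bmatrix}
= \begin{bmatrix} \hat{a}_{02}\\ \hat{a}_{12}\\ \hat{a}_{22} \end{bmatrix}.
\]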

If the hypothesis of a second degree growth model was not rejected, the model for men,
denoted by $f_1(t)$, and that for women, denoted by $f_2(t)$, would be the following:


Men: $f_1(t) = 0.535714 - 1.4881\,t + 0.833333\,t^2$,
Women: $f_2(t) = -1.39286 + 3.53571\,t - 1.5\,t^2$.


Employing notations paralleling those utilized in the first part, we have
\[
T_2\hat{A}_1 = T_2(T_2'S^{-1}T_2)^{-1}T_2'S^{-1}\bar{X}
= \frac{1}{3532200}\begin{bmatrix} 0\\ -420500\\ 630750\\ 3153750 \end{bmatrix}
= \begin{bmatrix} 0\\ -0.119048\\ 0.178571\\ 0.892857 \end{bmatrix},
\]
\[
T_2\hat{A}_2 = T_2(T_2'S^{-1}T_2)^{-1}T_2'S^{-1}\bar{Y}
= \frac{1}{3532200}\begin{bmatrix} 0\\ 2270700\\ 1892250\\ -1135350 \end{bmatrix}
= \begin{bmatrix} 0\\ 0.642857\\ 0.535714\\ -0.321429 \end{bmatrix},
\]
as well as the following sample deviation matrices and sample sum of products matrices:
\[
G_1 = Y - T_2\hat{A}_1 =
\begin{bmatrix}
-1 & 1 & 0 & -1 & 1 & 0\\
1.11905 & 1.11905 & 1.11905 & 0.119048 & 1.11905 & 2.11905\\
-0.178571 & -1.17857 & -1.17857 & 0.821429 & 0.821429 & -0.178571\\
-1.89286 & -1.89286 & 0.107143 & -0.892857 & -0.892857 & 0.107143
\end{bmatrix},
\]
\[
G_1G_1' =
\begin{bmatrix}
4 & 1 & -1 & 0\\
1 & 9.51361 & -2.19898 & -4.9949\\
-1 & -2.19898 & 4.19133 & 0.95663\\
0 & -4.9949 & 0.95663 & 8.78316
\end{bmatrix},
\]
\[
G_2 = Y - T_2\hat{A}_2 =
\begin{bmatrix}
-1 & 1 & 0 & -1 & 1 & 0\\
0.357143 & 0.357143 & 0.357143 & -0.642857 & 0.357143 & 1.35714\\
-0.535714 & -1.53571 & -1.53571 & 0.464286 & 0.464286 & -0.535714\\
-0.678571 & -0.678571 & 1.32143 & 0.321429 & 0.321429 & 1.32143
\end{bmatrix},
\]
\[
G_2G_2' =
\begin{bmatrix}
4 & 1 & -1 & 0\\
1 & 2.76531 & -2.14796 & 1.68878\\
-1 & -2.14796 & 5.72194 & -1.03316\\
0 & 1.68878 & -1.03316 & 4.6199
\end{bmatrix},
\]
\[
G_1G_1' + G_2G_2' =
\begin{bmatrix}
8 & 2 & -2 & 0\\
2 & 12.2789 & -4.34694 & -3.30612\\
-2 & -4.34694 & 9.91326 & -0.0765339\\
0 & -3.30612 & -0.0765339 & 13.4031
\end{bmatrix}
\]
and
\[
|G_1G_1' + G_2G_2'| = 9462.7797.
\]

Thus, for this part, the statistic $w$ obtained from the $\lambda$-criterion is the following:
\[
w = \frac{|S_1 + S_2|}{|G_1G_1' + G_2G_2'|} = \frac{580}{9462.7797} = 0.0612928.
\]
In this case, $m = 3$, $p = 4$, $r = 2$, $p_1 = 1$, $p_2 = 3$ and $n_{.} - r - p + 1 = 11 - 2 - 4 + 1 = 6$,
\[
\frac{1-w}{w}\,\frac{n_{.} - r - p + 1}{r} = \frac{6}{2}\,(15.3151) = 45.945,
\]
and $F_{2,6,0.01} = 10.92$.
When implementing F-tests, we reject the null hypothesis for large observed values of the
statistic. Since $45.945 > F_{2,6,0.01}$, the hypothesis that a second degree polynomial fits is
rejected at the 1% significance level and, of course, at any higher level. Let us consider
the asymptotic results. Since $n_{.} = 11$ is not large, the results ought to be interpreted with
some caution. In the case of the linear model ($m = 2$), $r\,p_1 = 2(2) = 4$ and the tabulated
critical values are $\chi^2_{4,0.05} = 9.49$ and $\chi^2_{4,0.01} = 13.28$. With this chisquare test, we
reject $H_o$ for large observed values of the statistic. In the linear case, $w = 0.3225$ so that
$-2\ln\lambda = -2(n_{.}/2)\ln w = -11\ln w = -11[-1.1317] = 12.45$. Hence, according to the
approximate asymptotic chisquare test, the hypothesis of a linear fit should be rejected at
the 5% significance level but not at the 1% significance level. In the second degree
polynomial growth curve case, $p_1 = 1$ so that $r\,p_1 = 2$, and the tabulated values are $\chi^2_{2,0.05} = 5.99$
and $\chi^2_{2,0.01} = 9.21$. Since $-2\ln\lambda = -11\ln w = -11\ln 0.0613 = -11[-2.79] = 30.71$,
according to the approximate asymptotic chisquare test, the hypothesis of a second degree
polynomial fit ought to be rejected at both the 5% and 1% levels, which corroborates the
conclusion obtained with the F test. In sum, there is some evidence of a linear fit but no
indication whatsoever of a quadratic fit.
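The two asymptotic statistics are simple to recompute. The sketch below evaluates $-n_{.}\ln w$ for both growth models and compares each with the 5% chisquare points 9.49 and 5.99 quoted in the text; those critical values are taken as given, not computed, and the names are illustrative.

```python
import math

n_dot = 11
w_linear = 0.322493       # w under the linear growth model
w_quadratic = 0.0612928   # w under the second degree growth model

stat_linear = -n_dot * math.log(w_linear)        # -2 ln(lambda), 4 degrees of freedom
stat_quadratic = -n_dot * math.log(w_quadratic)  # -2 ln(lambda), 2 degrees of freedom

# 5% critical values as tabulated in the text
reject_linear_5pct = stat_linear > 9.49
reject_quadratic_5pct = stat_quadratic > 5.99
print(round(stat_linear, 2), round(stat_quadratic, 2))
```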


14.6.5. A general structure 


Potthoff and Roy (1964) considered a general format whereby several hypotheses can
be tested at the same time by making use of the structure. When applied to the model
specified in (14.6.4), the Potthoff-Roy procedure yields the following format:
\[
E[X_{11}, X_{12}, \ldots, X_{1n_1}] = [TA_1, TA_1, \ldots, TA_1] = T(A_1, \ldots, A_1) = TA_1J_1' \tag{14.6.22}
\]
where $J_1$ is an $n_1 \times 1$ vector of unities. The left and right-hand sides of this equation are
$p \times n_1$ matrices where $T$ is $p \times m$ and the $m \times 1$ vector $A_1$ is repeated $n_1$ times. The sample
matrix of all $r$ groups of $n_1 + \cdots + n_r = n_{.} = n$ individuals, denoted by a boldfaced $\mathbf{X}$, is
the following:
\[
E[\mathbf{X}] = [TA_1J_1', \ldots, TA_rJ_r'] = T[A_1J_1', \ldots, A_rJ_r'] = TAJ_{(r)}, \quad A = [A_1, \ldots, A_r] \tag{14.6.23}
\]
where $J_{(r)}$ is an $m \times (n_1 + \cdots + n_r) = m \times n$ matrix whose first $n_1$ columns all have unities in
the first row (or $J_1'$) and zeros as the other elements or, in the first $n_1 \times n_1$ diagonal block, the
first row elements are all unities, in the next $n_2 \times n_2$ diagonal block, the first row elements
are all unities and the rest, all zeros, and so on, up to the last $n_r \times n_r$ diagonal block wherein
the first row elements are all unities and the remaining elements are zeros, all non-diagonal blocks
being null matrices. We can also pre-multiply $\mathbf{X}$ by a $q \times p$ constant matrix $Q$ such that $Q$
is of full rank $q < p$. Then $QX_{ij} \sim N_q(QTA_i, Q\Sigma Q')$, $j = 1, \ldots, n_i$, $i = 1, \ldots, r$. In
this instance, we are making repeated measurements at $q$ points $t_1, \ldots, t_q$, instead of the
$p$ points previously considered, and the model is the following:
\[
E[Q\mathbf{X}] = QTAJ_{(r)}. \tag{14.6.24}
\]


This model enables one to test all types of hypotheses of the form
\[
CAV = O \tag{14.6.25}
\]
where $C$ and $V$ are given matrices, $C$ is $s \times m$, $A$ is the $m \times r$ matrix of parameters and $V$ is
$r \times t$ for some $s$ and $t$. For example, let $C = I_m$ and $V$ be an $r \times (r - 1)$ matrix where the first
$r - 1$ rows form an identity matrix and the last row is $(-1, -1, \ldots, -1)$. Then, $CAV = O$ gives
$A_1 - A_r = O$, $A_2 - A_r = O, \ldots, A_{r-1} - A_r = O$, or equivalently, $A_1 = \cdots = A_r$.
For testing the equality of the models $TA_1 = TA_2 = \cdots = TA_r$, one need not apply the
above procedure. One can directly employ the multivariate one-way layout analysis with
$r - 1$ degrees of freedom associated with the sum of squares and cross products matrix due
to the general nature of the model.
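The effect of this choice of $C$ and $V$ can be illustrated numerically. In the sketch below, the sizes $m = 3$ and $r = 4$ and the randomly generated parameter matrix are hypothetical, chosen only to show that the columns of $CAV$ are the differences $A_j - A_r$.

```python
import numpy as np

m, r = 3, 4                          # hypothetical dimensions
rng = np.random.default_rng(0)
A = rng.normal(size=(m, r))          # columns play the role of A_1, ..., A_r

C = np.eye(m)                        # C = I_m
# V is r x (r-1): an identity matrix stacked on a final row of -1's.
V = np.vstack([np.eye(r - 1), -np.ones((1, r - 1))])

CAV = C @ A @ V                      # column j equals A_j - A_r
expected = A[:, :-1] - A[:, [-1]]
print(np.allclose(CAV, expected))
```

Setting $CAV = O$ therefore forces every $A_j$ to coincide with $A_r$.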


Exercises 14 


14.1. The following are observed random samples of sizes $n_1 = 4$ and $n_2 = 5$ from
two independent real Gaussian populations, $N_3(M_1, \Sigma)$ and $N_3(M_2, \Sigma)$, sharing the same
covariance matrix $\Sigma > O$ and having the mean value vectors $M_1$ and $M_2$, respectively.
Test the hypothesis at the 5% significance level that the mean values $M_1$ and $M_2$ have (1):
parallel profiles; (2): coincident profiles:


1241 2-1 1 —2 0 
Sample 1: | -1 1 2 2 |, Sample2: | 1 1 4 #1 3 
13 1 3 4 23 01 


14.2. The following are observed samples of sizes 5 and 6 from two independent real
Gaussian populations with the same covariance matrix. The columns of the matrices
represent the observations. Test the hypothesis at the 5% level of significance that the expected
values or population mean value vectors have (1): parallel profiles; (2): coincident profiles:


1-1 2 30 213-1 1] 0 
Sample 1: E м La , Sample 2: : : 2 2 2 м . 
1 


3 =) —3 0 —1 2-2 4 — 

1 2-12 145 3-1 0 
14.3. In Exercise 14.1, determine whether the profile is level (1): in the first population;
(2): in the second population; (3): in both the populations.


14.4. Repeat Exercise 14.3 with the sample of observations provided in Exercise 14.2.


14.5. Two breeds of 12 cows having identical characteristics are given a particular type
of feed and their weights are monitored for four weeks. The observations are the weight
readings on the date minus the initial weight. The weight readings are made at the end of
every week for four successive weeks. The columns in A represent successive observations
on 5 cows of breed-1 and the columns of B, successive observations on 7 cows of breed-2.
Test the hypothesis at the 5% significance level that the breed effect is a polynomial growth
curve of (1): degree 1, (2): degree 2 for each breed:


01 01 3 1011030 
xc 12 124 {2113232 
1124234?’ “3 7 25 2 4 4|* 

[2621 6 42273 6 4 
Chapter 15


Cluster Analysis and Correspondence Analysis


15.1. Introduction 


We will employ the same notations as in the previous chapters. Lower-case letters
$x, y, \ldots$ will denote real scalar variables, whether mathematical or random. Capital letters
$X, Y, \ldots$ will be used to denote real matrix-variate mathematical or random variables,
whether square or rectangular matrices are involved. A tilde will be placed on top of letters
such as $\tilde{x}, \tilde{y}, \tilde{X}, \tilde{Y}$ to denote variables in the complex domain. Constant matrices will for
instance be denoted by $A, B, C$. A tilde will not be used on constant matrices unless the
point is to be stressed that the matrix is in the complex domain. The determinant of a
square matrix $A$ will be denoted by $|A|$ or $\det(A)$ and, in the complex case, the absolute
value or modulus of the determinant of $A$ will be denoted as $|\det(A)|$. When matrices
are square, their order will be taken as $p \times p$, unless specified otherwise. When $A$ is a
full rank matrix in the complex domain, then $AA^*$ is Hermitian positive definite where
an asterisk designates the complex conjugate transpose of a matrix. Additionally, $dX$ will
indicate the wedge product of all the distinct differentials of the elements of the matrix $X$.
Thus, letting the $p \times q$ matrix $X = (x_{ij})$ where the $x_{ij}$'s are distinct real scalar variables,
$dX = \wedge_{i=1}^{p}\wedge_{j=1}^{q}\,dx_{ij}$. For the complex matrix $\tilde{X} = X_1 + iX_2$, $i = \sqrt{-1}$, where $X_1$
and $X_2$ are real, $d\tilde{X} = dX_1 \wedge dX_2$.


15.1.1. Clusters 


A cluster means a group or a cloud of items close together with reference to one or 
more characteristics. For instance, in a countryside, there are villages which are clusters of 
houses. In a city, there are clusters of high-rise buildings or clusters of apartment blocks. If 
we have 2-dimensional data points marked on a sheet of paper, then there may be several 
places where the points are grouped together in large crowds, at other places the points 
may be bunched together in smaller clumps and somewhere else, there may be singleton 
points. In a classification problem, we have a number of preassigned populations and we
want to assign a point at hand to one of those populations. This cannot be achieved in the
context of cluster analysis as we do not know beforehand how many clusters there are in 
the data at hand or which data point belongs to which cluster. Cluster analysis is akin to 
pattern recognition whereas classification is a sort of taxonomy. Suppose that a new plant 
is to be classified as belonging to one of the known species of plants; if it does not fall into 
any of the known species, then we have a member from a new species. In cluster analysis, 
we are, in a manner of speaking, going to create various ‘species’. To start with, we have 
only a cloud of items and we do not know how many categories or clusters there exist. 


Cluster analysis techniques are widely utilized in many fields such as psychiatry, so- 
ciology, anthropology, archeology, medicine, criminology, engineering and geology, to 
mention only a few areas. If real scalar variables are to be classified as belonging to a 
certain category, one way of achieving this is to ascertain their joint dispersion or joint 
variation as measured in terms of scale-free covariance or correlation. Those variables that 
are similarly correlated may be grouped together. 


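We will consider the problem of cluster analysis involving $n$ points $X_1, \ldots, X_n$ where
each $X_j$ is a real $p$-dimensional vector, that is, we have a $p \times n$ data matrix
\[
X = [X_1, X_2, \ldots, X_n] =
\begin{bmatrix}
x_{11} & x_{12} & \cdots & x_{1n}\\
x_{21} & x_{22} & \cdots & x_{2n}\\
\vdots & \vdots & \ddots & \vdots\\
x_{p1} & x_{p2} & \cdots & x_{pn}
\end{bmatrix}. \tag{15.1.1}
\]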


15.1.2. Distance measures 


Two real $p$-vectors are close together if the "distance" between them is small. Many
types of distance measures can be defined. Let $X_r$ and $X_s$ be two real $p$-vectors. These are
the $r$-th and $s$-th members or columns in the data matrix (15.1.1). Then, the following are
some distance measures:
\[
d_m(X_r, X_s) = \Big(\sum_{i=1}^{p}|x_{ir} - x_{is}|^m\Big)^{1/m};
\]
for $m = 2$, we have the Euclidean distance $d_2(X_r, X_s) = (\sum_{i=1}^{p}|x_{ir} - x_{is}|^2)^{1/2}$, or, denoting
$d_2^2$ as $d^2$, we have
\[
d^2(X_r, X_s) = \sum_{i=1}^{p}(x_{ir} - x_{is})^2, \tag{15.1.2}
\]
where the absolute value sign can be replaced by parentheses since we are dealing
with real elements. We will utilize this convenient quantity $d^2$ for comparing observation
vectors. There may be joint variation or covariances among the coordinates in each
of the vectors, in which case, $\mathrm{Cov}(X_j) = \Sigma_j > O$. If all the $X_j$'s, $j = 1, \ldots, n$,
have the same covariance matrix, then $\mathrm{Cov}(X_j) = \Sigma$, $j = 1, \ldots, n$, and a statistician
might wish to consider the generalized distance between $X_r$ and $X_s$, or its square,
$d_g^2(X_r, X_s) = (X_r - X_s)'\Sigma^{-1}(X_r - X_s)$, the subscript $g$ designating the generalized
distance. Since $\Sigma$ is unknown, we may wish to estimate it. However, if there are clusters,
it may not be appropriate to make use of the entire data set of all $n$ points, since the joint
variation or the covariance within each cluster is likely to be different. And as we do not
know beforehand whether clusters are present, securing a proper estimate of $\Sigma$ proves
problematic. As a result, this problem is usually circumvented by resorting to the
ordinary Euclidean distance instead of the generalized distance.
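The three distance measures just described can be written down directly. The following is a minimal sketch; the function names and the two test vectors are illustrative, and the covariance matrix passed to the generalized distance is an arbitrary identity matrix, for which the generalized and Euclidean squared distances coincide.

```python
import numpy as np

def minkowski(x, y, m):
    """d_m(x, y) = (sum_i |x_i - y_i|^m)^(1/m)."""
    x, y = np.asarray(x, float), np.asarray(y, float)
    return float(np.sum(np.abs(x - y) ** m) ** (1.0 / m))

def squared_euclidean(x, y):
    """d^2(x, y) = (x - y)'(x - y)."""
    diff = np.asarray(x, float) - np.asarray(y, float)
    return float(diff @ diff)

def squared_generalized(x, y, sigma):
    """d_g^2(x, y) = (x - y)' Sigma^{-1} (x - y), for a positive definite Sigma."""
    diff = np.asarray(x, float) - np.asarray(y, float)
    return float(diff @ np.linalg.solve(sigma, diff))

x_r = [1.0, 0.0, -1.0]   # hypothetical vectors
x_s = [2.0, 1.0, 3.0]
print(minkowski(x_r, x_s, 1), squared_euclidean(x_r, x_s),
      squared_generalized(x_r, x_s, np.eye(3)))
```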


Let us examine the effect of scaling a vector. If the unit of measurement in one vector is
changed, what will be the effect on the squared distance? Consider the following vectors:
\[
X_1 = \begin{bmatrix} -1\\ 0\\ -2 \end{bmatrix} \text{ and } X_2 = \begin{bmatrix} -3\\ 2\\ 4 \end{bmatrix}
\;\Rightarrow\; d^2(X_1, X_2) = (X_1 - X_2)'(X_1 - X_2)
\]
\[
= [(-1) - (-3)]^2 + [(0) - (2)]^2 + [(-2) - (4)]^2 = 44.
\]
The squared distances between the vectors when (1) $X_1$ is multiplied by 2; (2) $X_2$ is
multiplied by 2; (3) $X_1$ and $X_2$ are each multiplied by 2, are
\[
d^2(2X_1, X_2) = (-2 + 3)^2 + (0 - 2)^2 + (-4 - 4)^2 = 69,
\]
\[
d^2(X_1, 2X_2) = (-1 + 6)^2 + (0 - 4)^2 + (-2 - 8)^2 = 141,
\]
\[
d^2(2X_1, 2X_2) = 4[(X_1 - X_2)'(X_1 - X_2)] = 4 \times 44 = 176.
\]
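These four squared distances are easy to confirm numerically. The short sketch below uses the two vectors of the example; the helper name `d2` is illustrative.

```python
import numpy as np

def d2(x, y):
    """Squared Euclidean distance (x - y)'(x - y)."""
    diff = x - y
    return int(diff @ diff)

X1 = np.array([-1, 0, -2])
X2 = np.array([-3, 2, 4])

results = {
    "d2(X1, X2)": d2(X1, X2),
    "d2(2X1, X2)": d2(2 * X1, X2),
    "d2(X1, 2X2)": d2(X1, 2 * X2),
    "d2(2X1, 2X2)": d2(2 * X1, 2 * X2),
}
print(results)
```

Only the joint rescaling of both vectors multiplies the squared distance by a constant factor (here 4).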


Note that they are fully distorted as $69 \ne 4(44)$ and $141 \ne 4(44)$. Thus, the scaling of
individual vectors can fully alter the nature of the clusters when there are clusters in the
original data. As well, members of the original clusters need not be members of the same
clusters in the scaled data and the number of clusters may also change. Accordingly, it is
indeed inadvisable to make use of the generalized distance. Nor is re-scaling the individual
vectors a good idea if we are seeking clusters. Accordingly, the recommended procedure
consists of utilizing the original data without modifying them. It may also happen that the
components in each $p$-vector are recorded in different units of measurement. How, then,
can one eliminate the location and scale effects on the components in each vector? This can be
achieved by standardizing them individually, that is, by subtracting the average value of
the components from the components of each vector and dividing the result by the sample
standard deviation. Let us see what happens in the case of our numerical example. Letting
$\bar{x}_1$ and $\bar{x}_2$ be the averages of the components in $X_1$ and $X_2$, and $s_1^2$ and $s_2^2$ be the associated
sums of products, we have
\[
\bar{x}_1 = \tfrac{1}{3}[(-1) + (0) + (-2)] = -1, \qquad \bar{x}_2 = \tfrac{1}{3}[(-3) + (2) + (4)] = 1,
\]
\[
s_1^2 = \sum_{i=1}^{p}(x_{i1} - \bar{x}_1)^2 = [(-1) - (-1)]^2 + [(0) - (-1)]^2 + [(-2) - (-1)]^2 = 2,
\]
\[
s_2^2 = \sum_{i=1}^{p}(x_{i2} - \bar{x}_2)^2 = 26.
\]
Thus, the standardized vectors $X_1$ and $X_2$, denoted by $Y_1$ and $Y_2$, are the following:
\[
Y_1 = \sqrt{\tfrac{3}{2}}\begin{bmatrix} 0\\ 1\\ -1 \end{bmatrix}, \qquad
Y_2 = \sqrt{\tfrac{3}{26}}\begin{bmatrix} -4\\ 1\\ 3 \end{bmatrix},
\]
and $d^2(Y_1, Y_2) = (Y_1 - Y_2)'(Y_1 - Y_2) = 7.6641$. However, $Y_1$ and $Y_2$ are very distorted and
the distance between $X_1$ and $X_2$ is also modified. Hence, such procedures will change the
clustering aspect as well, with new clusters possibly differing from the original clusters.
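The standardization can be checked with a short computation. The sketch below assumes that the sample standard deviation is taken with divisor $p = 3$, that is, $\sqrt{s_j^2/3}$; this convention is an assumption, adopted because it reproduces the value 7.6641 quoted above.

```python
import numpy as np

def standardize(x):
    """Center the components of x and divide by sqrt(sum of squares / p)."""
    x = np.asarray(x, float)
    centered = x - x.mean()
    return centered / np.sqrt(np.mean(centered ** 2))   # divisor p (assumed)

X1 = np.array([-1.0, 0.0, -2.0])
X2 = np.array([-3.0, 2.0, 4.0])

Y1, Y2 = standardize(X1), standardize(X2)
d2_standardized = float((Y1 - Y2) @ (Y1 - Y2))
print(round(d2_standardized, 4))
```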


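Let us consider the matrix of squared distances, denoted by $D$:
\[
D = \begin{bmatrix}
0 & d_{12}^2 & \cdots & d_{1n}^2\\
d_{21}^2 & 0 & \cdots & d_{2n}^2\\
\vdots & \vdots & \ddots & \vdots\\
d_{n1}^2 & d_{n2}^2 & \cdots & 0
\end{bmatrix}, \qquad D = D'. \tag{15.1.3}
\]
For example, letting
\[
X_1 = \begin{bmatrix} 1\\ 0\\ -1 \end{bmatrix},\;
X_2 = \begin{bmatrix} 2\\ 1\\ 3 \end{bmatrix},\;
X_3 = \begin{bmatrix} 0\\ 1\\ 2 \end{bmatrix} \text{ and }
X_4 = \begin{bmatrix} 3\\ 1\\ 2 \end{bmatrix},
\]
we have $d_{12}^2 = (1 - 2)^2 + (0 - 1)^2 + (-1 - 3)^2 = 18$, $d_{13}^2 = 11$, $d_{14}^2 = 14$, $d_{23}^2 = 5$,
$d_{24}^2 = 2$, $d_{34}^2 = 9$, so that
\[
D = \begin{bmatrix}
0 & 18 & 11 & 14\\
18 & 0 & 5 & 2\\
11 & 5 & 0 & 9\\
14 & 2 & 9 & 0
\end{bmatrix}.
\]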

The question of interest is the following: Given a set of n vectors of order p, how can one 
determine the number of clusters and then, classify them into these clusters? 


15.2. Different Methods of Clustering 


The main methods are hierarchical in nature, the other ones being non-hierarchical. 
We will begin with non-hierarchical techniques. In this category, the most popular one 
involves optimization or partitioning. 


15.2.1. Optimization or partitioning 


With this approach, we have to come up with two numbers: k, a probable number of 
clusters, and r, the maximum separation between the members of each prospective cluster. 
Based on the distances or on the dissimilarity matrix, D, one should be able to determine 
the likely number of clusters, that is, k. Then, one has to find a set of k vectors among the 
n given vectors, which will be taken as seed members or starting members within the k 
potential clusters. Several methods have been proposed for determining this k, including 
the following: 


1. Examine the closeness of the original vectors as indicated by the dissimilarity matrix
$D$ and, to start with, decide on an initial number for $k$ and the likely distance between
members within a cluster denoted by $r$.


2. Examine the original data points or original $p$-vectors and, based on the comparative
magnitudes of the components of the observed $p$-vectors, ascertain whether there is any
grouping possible and predict a value for each of $k$ and $r$.


3. Evaluate the sample sum of products matrix $S$ from the original data matrix. Compute
the two main principal components associated with this $S$. Substitute $X_j$, the $j$-th
observation vector, in the two principal components. This provides a pair of numbers or one
point in a two-dimensional space. Compute $n$ such points for $j = 1, \ldots, n$. Plot these
points. From the graph, assess the clustering pattern, the number $k$ of possible clusters,
estimates for $r$, the maximum distance between two members within a cluster as well as
the minimum distance between the clusters.


4. Choose any number $k$, select $k$ vectors at random from the set of $n$ vectors; then,
preselect a number $r$ and use it as a measure of maximum separation between vectors.


5. Take any number $k$ and select as seed vectors the first $k$ vectors whose separation is at
least two units among the set of $n$ vectors.


6. Look at the farthest points. Select $k$ of them that are separated by at least $r$ units for
preselected values of $k$ and $r$.


If the dissimilarity matrix $D$ is utilized, then the separation number $r$ must be measured
in $d_{ij}^2$ units, whereas $r$ should be in $d_{ij}$ units if the actual distances $d_{ij}$ are used. After the
seed vectors are selected, the remaining $n - k$ points are to be associated to these seed
points to form clusters. Assign the vectors closest to each of the seed vectors and form the
initial $k$ clusters of two or more vectors. For example, if there are three closest members
at equal distance to a seed vector, then that cluster comprises 4 members, including the
seed vector. Then, compute the centroids of all initial clusters. The centroid of a cluster is
the simple average of the vectors included in that cluster. Thus, the centroid is a $p$-vector.
Then, measure the distances of all the points from each
centroid, and incorporate all points within the distance of $r$ from a centroid to that cluster.
This process will create the second stage of $k$ clusters. Now, evaluate the centroid of each
of these $k$ clusters. Again, repeat the process of computing the distances of all points from
each centroid. If a member in a cluster is found to be closer to the centroid of another
cluster than to its own cluster's centroid, then redirect that vector to the cluster to which
it belongs. Rearrange all vectors in such a manner, assigning each one to a cluster whose
centroid is the closest. Note that the number $k$ can increase or decrease in the course of
this process. Continue the procedure until no more improvement is possible. At this stage,
that final $k$ is the number of clusters in the data and the final members in each cluster are
set. This procedure is also called the $k$-means approach.
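The centroid-and-reassignment loop at the core of this procedure can be sketched as follows. This is a simplified variant and not the exact procedure described above: $k$ is held fixed, the separation number $r$ is not used, and every point is simply reassigned to its nearest centroid until the assignments stop changing. The function name and the toy data are hypothetical.

```python
import numpy as np

def k_means(points, seeds, max_iter=100):
    """points: n x p array; seeds: indices of the k starting (seed) vectors."""
    points = np.asarray(points, float)
    centroids = points[list(seeds)].copy()
    labels = None
    for _ in range(max_iter):
        # squared Euclidean distance from every point to every centroid
        d2 = ((points[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
        new_labels = d2.argmin(axis=1)
        if labels is not None and np.array_equal(new_labels, labels):
            break                               # no point changed cluster
        labels = new_labels
        for c in range(len(centroids)):
            members = points[labels == c]
            if len(members):                    # keep old centroid if a cluster empties
                centroids[c] = members.mean(axis=0)
    return labels, centroids

# Hypothetical toy data: two well-separated groups in the plane.
pts = np.array([[0, 0], [0, 1], [1, 0], [10, 10], [10, 11], [11, 10]])
labels, centroids = k_means(pts, seeds=[0, 3])
print(labels, centroids)
```

Starting from a different pair of seeds may lead to a different final partition, which is the sensitivity to seed vectors that characterizes this approach.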


This k-means approach has a serious shortcoming: if one starts with a different set of 
seed vectors, then it is possible to end up with a different set of final clusters. On the other 
hand, this method has the appreciable advantage that it allows a member provisionally 
assigned to a cluster to be moved to another cluster where it really belongs, that is, it 
allows the transfer of points. The following example should clarify the procedure. 


Example 15.2.1. Ten volunteers are given an exercise routine in an experiment that
monitors systolic pressure, diastolic pressure and heart beat. These are measured after
adhering to the exercise routine for four weeks. The data entries are systolic pressure minus
120 (SP), diastolic pressure minus 80 (DP) and heart beat minus 60 (HB), where 120, 80
and 60 are taken as the standard readings of systolic pressure, diastolic pressure and heart
beat, respectively. Carry out a cluster analysis of the data. The data matrix is the following
where (1), ..., (10) represent the data vectors $A_1, \ldots, A_{10}$ for the 10 volunteers, the first
row represents SP, the second row, DP, and the third, HB:
\[
\begin{array}{lrrrrrrrrrr}
 & (1) & (2) & (3) & (4) & (5) & (6) & (7) & (8) & (9) & (10)\\
\text{SP:} & 0 & 1 & 1 & 2 & 3 & 4 & 6 & 8 & 5 & 10\\
\text{DP:} & 1 & 0 & -1 & 3 & 2 & 3 & 8 & 10 & 6 & 8\\
\text{HB:} & -1 & -1 & -1 & -2 & 5 & 2 & 7 & 8 & 9 & 4
\end{array}
\]


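Solution 15.2.1. Let us compute the dissimilarity matrix $D$:
\[
D = \begin{array}{r|rrrrrrrrrr}
 & (1) & (2) & (3) & (4) & (5) & (6) & (7) & (8) & (9) & (10)\\
\hline
(1) & 0 & 2 & 5 & 9 & 46 & 29 & 149 & 226 & 150 & 174\\
(2) & 2 & 0 & 1 & 11 & 44 & 27 & 153 & 230 & 152 & 170\\
(3) & 5 & 1 & 0 & 18 & 49 & 34 & 170 & 251 & 165 & 187\\
(4) & 9 & 11 & 18 & 0 & 51 & 20 & 122 & 185 & 139 & 125\\
(5) & 46 & 44 & 49 & 51 & 0 & 11 & 49 & 98 & 36 & 86\\
(6) & 29 & 27 & 34 & 20 & 11 & 0 & 54 & 101 & 59 & 65\\
(7) & 149 & 153 & 170 & 122 & 49 & 54 & 0 & 9 & 9 & 25\\
(8) & 226 & 230 & 251 & 185 & 98 & 101 & 9 & 0 & 26 & 24\\
(9) & 150 & 152 & 165 & 139 & 36 & 59 & 9 & 26 & 0 & 54\\
(10) & 174 & 170 & 187 & 125 & 86 & 65 & 25 & 24 & 54 & 0
\end{array}
\]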

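The data matrix suggests the possibility of three clusters. Accordingly, we may begin with
the vectors $A_2$, $A_5$ and $A_8$ as seed vectors and take the separation width as $r = 15$ units.
From $D$, we find $d_{23}^2 = 1$, the smallest number, and hence $A_2$ and $A_3$ form the cluster
$\{A_2, A_3\}$. Note that $d_{56}^2 = 11$, so that $A_5$ and $A_6$ form the cluster $\{A_5, A_6\}$. Since $d_{78}^2 = 9$,
$A_7$ and $A_8$ form a cluster: $\{A_7, A_8\}$. Now, consider the centroids. Letting $C_{11}$, $C_{21}$ and
$C_{31}$ denote the centroids, $C_{11} = \frac{1}{2}(A_2 + A_3)$, $C_{21} = \frac{1}{2}(A_5 + A_6)$ and $C_{31} = \frac{1}{2}(A_7 + A_8)$,
that is,
\[
C_{11} = \begin{bmatrix} 1\\ -1/2\\ -1 \end{bmatrix},\quad
C_{21} = \begin{bmatrix} 7/2\\ 5/2\\ 7/2 \end{bmatrix} \text{ and }
C_{31} = \begin{bmatrix} 7\\ 9\\ 15/2 \end{bmatrix}.
\]
Let us calculate the distances of $A_1, \ldots, A_{10}$ from $C_{11}$, $C_{21}$, $C_{31}$:
\[
\begin{array}{llll}
d^2(C_{11}, A_1) = \frac{13}{4}, & d^2(C_{11}, A_2) = \frac{1}{4}, & d^2(C_{11}, A_3) = \frac{1}{4}, & d^2(C_{11}, A_4) = \frac{57}{4},\\[2pt]
d^2(C_{11}, A_5) = \frac{185}{4}, & d^2(C_{11}, A_6) = \frac{121}{4}, & d^2(C_{11}, A_7) = \frac{645}{4}, & d^2(C_{11}, A_8) = \frac{961}{4},\\[2pt]
d^2(C_{11}, A_9) = \frac{633}{4}, & d^2(C_{11}, A_{10}) = \frac{713}{4}, & d^2(C_{21}, A_1) = \frac{139}{4}, & d^2(C_{21}, A_2) = \frac{131}{4},\\[2pt]
d^2(C_{21}, A_3) = \frac{155}{4}, & d^2(C_{21}, A_4) = \frac{131}{4}, & d^2(C_{21}, A_5) = \frac{11}{4}, & d^2(C_{21}, A_6) = \frac{11}{4},\\[2pt]
d^2(C_{21}, A_7) = \frac{195}{4}, & d^2(C_{21}, A_8) = \frac{387}{4}, & d^2(C_{21}, A_9) = \frac{179}{4}, & d^2(C_{21}, A_{10}) = \frac{291}{4},\\[2pt]
d^2(C_{31}, A_1) = \frac{741}{4}, & d^2(C_{31}, A_2) = \frac{757}{4}, & d^2(C_{31}, A_3) = \frac{833}{4}, & d^2(C_{31}, A_4) = \frac{605}{4},\\[2pt]
d^2(C_{31}, A_5) = \frac{285}{4}, & d^2(C_{31}, A_6) = \frac{301}{4}, & d^2(C_{31}, A_7) = \frac{9}{4}, & d^2(C_{31}, A_8) = \frac{9}{4},\\[2pt]
d^2(C_{31}, A_9) = \frac{61}{4}, & d^2(C_{31}, A_{10}) = \frac{89}{4}. & &
\end{array}
\]
We include all the points located within 15 units of distance to the nearest cluster. Then, the
second set of clusters is the following: Cluster 1: $\{A_1, A_2, A_3, A_4\}$, Cluster 2: $\{A_5, A_6\}$,
Cluster 3: $\{A_7, A_8\}$. Note that $A_9$ is quite close to Cluster 3. We may either include it in
Cluster 3 or treat it as a singleton. Since the next stage calculations do not change the
composition of the clusters, we may take the final clusters as $\{A_1, A_2, A_3, A_4\}$, $\{A_5, A_6\}$,
$\{A_7, A_8, A_9\}$ and $\{A_{10}\}$ where Cluster 4 consists of a single element. This completes the
computations.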
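The centroids and the squared distances from them can be computed exactly with rational arithmetic. The sketch below uses the data of the example and the separation width $r = 15$; the helper names are illustrative.

```python
from fractions import Fraction

# Data vectors A_1, ..., A_10 as (SP, DP, HB)
A = {1: (0, 1, -1), 2: (1, 0, -1), 3: (1, -1, -1), 4: (2, 3, -2), 5: (3, 2, 5),
     6: (4, 3, 2), 7: (6, 8, 7), 8: (8, 10, 8), 9: (5, 6, 9), 10: (10, 8, 4)}

def centroid(indices):
    """Componentwise average of the selected vectors, as exact fractions."""
    n = len(indices)
    return tuple(Fraction(sum(A[j][i] for j in indices), n) for i in range(3))

def d2(c, x):
    """Squared Euclidean distance between a centroid c and a vector x."""
    return sum((ci - xi) ** 2 for ci, xi in zip(c, x))

centroids = {"C11": centroid([2, 3]), "C21": centroid([5, 6]), "C31": centroid([7, 8])}

r = 15
within = {name: [j for j in A if d2(c, A[j]) <= r] for name, c in centroids.items()}
print(within)
print(d2(centroids["C31"], A[9]))   # A_9 lies just outside the width r
```

With these values, $A_4$ falls within 15 units of $C_{11}$ while $A_9$ lies slightly beyond 15 units from $C_{31}$, which is why it may be either attached to Cluster 3 or treated as a singleton.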


Let us examine the principal components of the sample sum of products matrix and
plot the points to see whether any cluster can be detected. The sample matrix denoted by
$X$ and the sample average, denoted by $\bar{X}$, are the following:
\[
X = \begin{bmatrix}
0 & 1 & 1 & 2 & 3 & 4 & 6 & 8 & 5 & 10\\
1 & 0 & -1 & 3 & 2 & 3 & 8 & 10 & 6 & 8\\
-1 & -1 & -1 & -2 & 5 & 2 & 7 & 8 & 9 & 4
\end{bmatrix}, \qquad
\bar{X} = \begin{bmatrix} 4\\ 4\\ 3 \end{bmatrix}.
\]
Let the matrix of sample averages be $\bar{\mathbf{X}} = [\bar{X}, \bar{X}, \ldots, \bar{X}]$ and the deviation matrix be
$X_d = X - \bar{\mathbf{X}}$. Then,
\[
X_d = \begin{bmatrix}
-4 & -3 & -3 & -2 & -1 & 0 & 2 & 4 & 1 & 6\\
-3 & -4 & -5 & -1 & -2 & -1 & 4 & 6 & 2 & 4\\
-4 & -4 & -4 & -5 & 2 & -1 & 4 & 5 & 6 & 1
\end{bmatrix}
\]
and the sample sum of products matrix is $S = X_dX_d'$, that is,
\[
S = \begin{bmatrix} 96 & 101 & 88\\ 101 & 128 & 112\\ 88 & 112 & 156 \end{bmatrix}.
\]
The eigenvalues of $S$ are $\lambda_1 = 330.440$, $\lambda_2 = 40.522$ and $\lambda_3 = 9.039$. An eigenvector
corresponding to $\lambda_1 = 330.440$ and an eigenvector corresponding to $\lambda_2 = 40.522$,
respectively denoted by $U_1$ and $U_2$, are the following:
\[
U_1 = \begin{bmatrix} 0.782\\ 0.943\\ 1.000 \end{bmatrix} \text{ and }
U_2 = \begin{bmatrix} -0.676\\ -0.500\\ 1.000 \end{bmatrix}.
\]
Then the first two principal components are $U_1'Y$ and $U_2'Y$ with $Y' = [y_1, y_2, y_3]$. We
substitute our sample points $A_1, \ldots, A_{10}$ to obtain 10 pairs of numbers. For example,
\[
U_1'A_1 = [0.782, 0.943, 1]\begin{bmatrix} 0\\ 1\\ -1 \end{bmatrix} = -0.057, \qquad
U_2'A_1 = [-0.676, -0.5, 1]\begin{bmatrix} 0\\ 1\\ -1 \end{bmatrix} = -1.5
\]
and hence the first pair of numbers or the first point is $P_1 : (-0.057, -1.500)$.
Similar calculations yield the remaining 9 points as: $P_2 : (-0.218, -1.676)$, $P_3 :
(-1.161, -1.176)$, $P_4 : (2.393, -4.852)$, $P_5 : (9.232, 1.972)$, $P_6 : (7.957, -2.204)$, $P_7 :
(19.236, -1.056)$, $P_8 : (23.686, -2.408)$, $P_9 : (18.568, 2.620)$, $P_{10} : (19.364, -6.760)$.
It is seen that these points, which are plotted in Fig. 15.2.2, form the same clusters as
the original points shown in Fig. 15.2.1, that is, Cluster 1: $\{A_1, A_2, A_3, A_4\}$; Cluster 2:
$\{A_5, A_6\}$; Cluster 3: $\{A_7, A_8, A_9\}$; Cluster 4: $\{A_{10}\}$.
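The sum of products matrix, its eigenvalues and the ten projected points can be recomputed as follows. The sketch uses the eigenvectors as quoted in the text (scaled so that the last component is 1) to form the points $P_j$; `numpy.linalg.eigvalsh` is used only to confirm the eigenvalues, and the variable names are illustrative.

```python
import numpy as np

X = np.array([[0, 1, 1, 2, 3, 4, 6, 8, 5, 10],
              [1, 0, -1, 3, 2, 3, 8, 10, 6, 8],
              [-1, -1, -1, -2, 5, 2, 7, 8, 9, 4]], dtype=float)

Xd = X - X.mean(axis=1, keepdims=True)     # deviation matrix X_d
S = Xd @ Xd.T                              # sample sum of products matrix
eigvals = np.linalg.eigvalsh(S)[::-1]      # eigenvalues in decreasing order

# Eigenvectors as quoted in the text
U1 = np.array([0.782, 0.943, 1.000])
U2 = np.array([-0.676, -0.500, 1.000])

# Row j holds the point P_j = (U1'A_j, U2'A_j)
points = np.column_stack([U1 @ X, U2 @ X])
print(np.round(eigvals, 3))
print(np.round(points, 3))
```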


Other non-hierarchical methods are currently in use. We will mention these procedures 
later, after discussing the main hierarchical technique known as single linkage or nearest 
neighbor method. 


15.3. Hierarchical Methods of Clustering 


Hierarchical procedures are of two categories. In one of them, we begin with all the n 
data points as n different clusters of one element each. Then, by applying certain rules, we 
start combining these single-member clusters into larger clusters, the process being halted 
when a desired number of clusters are obtained. If the process is continued, we ultimately 
end up with a single cluster containing all of the n points. In the second category, we 
initially consider one cluster that comprises the n elements. We then start splitting this 
cluster into two clusters by making use of some criteria. Next, one or both of these sub- 
clusters are divided again by applying the same criteria. If the process is continued, we
finally end up with n clusters of one element each. The process is halted when a desired
number of clusters are obtained. In all these procedures, one cannot objectively determine
when to stop the process or how many distinct clusters are present. We have to specify
some stopping rules as a means of selecting a suitable number of clusters.


Figure 15.2.1 The original 10 data points
Figure 15.2.2 Second versus first principal component evaluated at the $A_j$'s


15.3.1. Single linkage or nearest neighbor method 


In this single linkage procedure, we begin by assuming that there are $n$ clusters
consisting of one item each. We then combine these clusters by applying a minimum distance
rule. At the initial stage, we have only one element in each 'cluster', but at the following
steps, each cluster will potentially contain several items and hence, the rule is stated for
general clusters. Consider two clusters $A$ and $B$ whose elements are denoted by $X_i$ and
$Y_j$, that is, $X_i \in A$ and $Y_j \in B$, the $X_i$'s and $Y_j$'s being $p$-vectors belonging to the data
set at hand. In the minimum distance rule, we define the distance between two clusters,
denoted by $d(A, B)$, as follows:
\[
d(A, B) = \min\{d(X_i, Y_j), \text{ for all } X_i \in A,\ Y_j \in B\}. \tag{15.3.1}
\]


This distance is measured in the units of the definition of the distance being utilized. We 
will illustrate the single linkage hierarchical procedure by making use of the data set pro- 
vided in Example 15.2.1 and its associated dissimilarity matrix D. We will utilize the dis- 
similarity matrix D to represent various "distances". Since the matrix D will be repeatedly 
referred to at every stage, it is duplicated next for ready reference: 


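\[
D = \begin{array}{r|rrrrrrrrrr}
 & (1) & (2) & (3) & (4) & (5) & (6) & (7) & (8) & (9) & (10)\\
\hline
(1) & 0 & 2 & 5 & 9 & 46 & 29 & 149 & 226 & 150 & 174\\
(2) & 2 & 0 & 1 & 11 & 44 & 27 & 153 & 230 & 152 & 170\\
(3) & 5 & 1 & 0 & 18 & 49 & 34 & 170 & 251 & 165 & 187\\
(4) & 9 & 11 & 18 & 0 & 51 & 20 & 122 & 185 & 139 & 125\\
(5) & 46 & 44 & 49 & 51 & 0 & 11 & 49 & 98 & 36 & 86\\
(6) & 29 & 27 & 34 & 20 & 11 & 0 & 54 & 101 & 59 & 65\\
(7) & 149 & 153 & 170 & 122 & 49 & 54 & 0 & 9 & 9 & 25\\
(8) & 226 & 230 & 251 & 185 & 98 & 101 & 9 & 0 & 26 & 24\\
(9) & 150 & 152 & 165 & 139 & 36 & 59 & 9 & 26 & 0 & 54\\
(10) & 174 & 170 & 187 & 125 & 86 & 65 & 25 & 24 & 54 & 0
\end{array}
\]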


To start with, we have 10 clusters $\{A_j\}$, $j = 1, \ldots, 10$. At the initial stage, each cluster has
one element. Then $d(A, B)$ as defined in (15.3.1) is the smallest distance (dissimilarity)
appearing in $D$, that is, 1, which occurs between the elements corresponding to $A_2$ and
$A_3$. These two clusters of one vector each are combined and replaced by $B_1$, by taking
the smaller entries in each column of the combined representation of the dissimilarity
measures corresponding to $A_2$ and $A_3$. For illustration, we now list the dissimilarity measures
corresponding to the original $A_2$ and $A_3$ and the new $B_1$ as rows:
\[
\begin{array}{lrcrrrrrrr}
A_2: & (2) & [0\;\;1] & (11) & (44) & (27) & (153) & (230) & (152) & (170)\\
A_3: & 5 & [1\;\;0] & 18 & 49 & 34 & 170 & 251 & 165 & 187\\
B_1: & 2 & [0] & 11 & 44 & 27 & 153 & 230 & 152 & 170
\end{array}
\]

The rows representing $A_2$ and $A_3$ are combined and replaced by $B_1$ as shown above. The
second and third columns in $D$ are combined into one column, namely, the $B_1$ column.
The elements in $B_1$ are the smaller elements in each column of $A_2$ and $A_3$. The bracketed
elements in $A_2$ and $A_3$, namely [0, 1] and [1, 0], which lie at the intersections of the
two rows and columns, are combined into one element [0] in $B_1$, the updated dissimilarity
matrix having one fewer row and one fewer column. The other smaller elements in the two
original rows, which make up $B_1$, are displayed in parentheses. This process will be
repeated at each stage. At the first stage of the procedure, we end up with 9 clusters:
$C_1 = \{A_2, A_3\}$, $\{A_j\}$, $j = 1, 4, \ldots, 10$, the resulting configuration of the
dissimilarity matrix, denoted by $D_1$, being

$$
D_1 = \begin{array}{c|rrrrrrrrr}
 & A_1 & B_1 & A_4 & A_5 & A_6 & A_7 & A_8 & A_9 & A_{10} \\
\hline
A_1 & 0 & 2 & 9 & 46 & 29 & 149 & 226 & 150 & 174 \\
B_1 & 2 & 0 & 11 & 44 & 27 & 153 & 230 & 152 & 170 \\
A_4 & 9 & 11 & 0 & 51 & 20 & 122 & 185 & 139 & 125 \\
A_5 & 46 & 44 & 51 & 0 & 11 & 49 & 98 & 36 & 86 \\
A_6 & 29 & 27 & 20 & 11 & 0 & 54 & 101 & 59 & 65 \\
A_7 & 149 & 153 & 122 & 49 & 54 & 0 & 9 & 9 & 25 \\
A_8 & 226 & 230 & 185 & 98 & 101 & 9 & 0 & 26 & 24 \\
A_9 & 150 & 152 & 139 & 36 & 59 & 9 & 26 & 0 & 54 \\
A_{10} & 174 & 170 & 125 & 86 & 65 & 25 & 24 & 54 & 0
\end{array}
$$

Now, the next smallest dissimilarity is 2, which occurs at $(A_1, B_1)$. Thus, the rows
(columns) corresponding to $A_1$ and $B_1$ are combined into one row (column) $B_2$. The
original rows corresponding to $A_1$ and $B_1$ and the new row corresponding to $B_2$ are the
following:

    A_1:  [0  2]  (9)   46    29   (149)  (226)  (150)   174
    B_1:  [2  0]   11  (44)  (27)   153    230    152   (170)
    B_2:  [0]       9   44    27    149    226    150    170

The new configuration, denoted by $D_2$, is the following:

$$
D_2 = \begin{array}{c|rrrrrrrr}
 & B_2 & A_4 & A_5 & A_6 & A_7 & A_8 & A_9 & A_{10} \\
\hline
B_2 & 0 & 9 & 44 & 27 & 149 & 226 & 150 & 170 \\
A_4 & 9 & 0 & 51 & 20 & 122 & 185 & 139 & 125 \\
A_5 & 44 & 51 & 0 & 11 & 49 & 98 & 36 & 86 \\
A_6 & 27 & 20 & 11 & 0 & 54 & 101 & 59 & 65 \\
A_7 & 149 & 122 & 49 & 54 & 0 & 9 & 9 & 25 \\
A_8 & 226 & 185 & 98 & 101 & 9 & 0 & 26 & 24 \\
A_9 & 150 & 139 & 36 & 59 & 9 & 26 & 0 & 54 \\
A_{10} & 170 & 125 & 86 & 65 & 25 & 24 & 54 & 0
\end{array},
$$

the resulting clusters being $C_2 = \{A_1, A_2, A_3\}$, $\{A_j\}$, $j = 4, \ldots, 10$. The next smallest
dissimilarity is 9, which occurs at $(B_2, A_4)$. Hence these are combined, that is, the first
two columns (rows) are merged as explained. The combined row, denoted by $B_3$, is the
following, its transpose becoming the first column:

$$B_3 = [0, 44, 20, 122, 185, 139, 125],$$
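and the new configuration, denoted by $D_3$, is the following:

$$
D_3 = \begin{array}{c|rrrrrrr}
 & B_3 & A_5 & A_6 & A_7 & A_8 & A_9 & A_{10} \\
\hline
B_3 & 0 & 44 & 20 & 122 & 185 & 139 & 125 \\
A_5 & 44 & 0 & 11 & 49 & 98 & 36 & 86 \\
A_6 & 20 & 11 & 0 & 54 & 101 & 59 & 65 \\
A_7 & 122 & 49 & 54 & 0 & 9 & 9 & 25 \\
A_8 & 185 & 98 & 101 & 9 & 0 & 26 & 24 \\
A_9 & 139 & 36 & 59 & 9 & 26 & 0 & 54 \\
A_{10} & 125 & 86 & 65 & 25 & 24 & 54 & 0
\end{array}
$$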


At this stage, the clusters are $C_2 = \{A_1, A_2, A_3, A_4\}$, $\{A_j\}$, $j = 5, \ldots, 10$. The next
smallest number is 9, which occurs at $(A_7, A_8)$ and $(A_7, A_9)$. Accordingly, we combine
$A_7$, $A_8$ and $A_9$, and the resulting configuration is the following, where the resultant of the
replacement rows (columns) is denoted by $B_4$:

$$
D_4 = \begin{array}{c|rrrrr}
 & B_3 & A_5 & A_6 & B_4 & A_{10} \\
\hline
B_3 & 0 & 44 & 20 & 122 & 125 \\
A_5 & 44 & 0 & 11 & 36 & 86 \\
A_6 & 20 & 11 & 0 & 54 & 65 \\
B_4 & 122 & 36 & 54 & 0 & 24 \\
A_{10} & 125 & 86 & 65 & 24 & 0
\end{array},
$$


the clusters being $C_2 = \{A_1, A_2, A_3, A_4\}$, $C_3 = \{A_7, A_8, A_9\}$, $\{A_i\}$, $i = 5, 6, 10$. The
next smallest dissimilarity measure is 11 at $(A_5, A_6)$. Combining these, the replacement
row is $B_5 = [20, 0, 36, 65]$, and the new configuration, denoted by $D_5$, is as follows:

$$
D_5 = \begin{array}{c|rrrr}
 & B_3 & B_5 & B_4 & A_{10} \\
\hline
B_3 & 0 & 20 & 122 & 125 \\
B_5 & 20 & 0 & 36 & 65 \\
B_4 & 122 & 36 & 0 & 24 \\
A_{10} & 125 & 65 & 24 & 0
\end{array},
$$

the resulting clusters being $C_2 = \{A_1, A_2, A_3, A_4\}$, $C_3 = \{A_7, A_8, A_9\}$, $C_4 = \{A_5, A_6\}$,
$C_5 = \{A_{10}\}$.

We may stop at this stage since the clusters obtained from the other methods coincide with
$C_2, C_3, C_4, C_5$. Were the procedure continued, the smallest remaining dissimilarity,
namely 20 at $(B_3, B_5)$, would combine $C_4$ with $C_2$; $C_5$ would then join $C_3$ at
dissimilarity 24, and the final stage, at dissimilarity 36, would result in a single cluster
encompassing all 10 points.
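
The stages described above can be retraced numerically. The following Python sketch is offered only as an illustration and is not part of the original procedure's presentation: the function name `single_linkage` and its return format are our own choices, and ties are resolved by merging one pair at a time (the first pair found), whereas the text merges $A_7$, $A_8$ and $A_9$ in a single step. Under these assumptions, the same four clusters are obtained.

```python
# Dissimilarity matrix D of Example 15.2.1, as reproduced in Sect. 15.3.1.
D = [
    [0, 2, 5, 9, 46, 29, 149, 226, 150, 174],
    [2, 0, 1, 11, 44, 27, 153, 230, 152, 170],
    [5, 1, 0, 18, 49, 34, 170, 251, 165, 187],
    [9, 11, 18, 0, 51, 20, 122, 185, 139, 125],
    [46, 44, 49, 51, 0, 11, 49, 98, 36, 86],
    [29, 27, 34, 20, 11, 0, 54, 101, 59, 65],
    [149, 153, 170, 122, 49, 54, 0, 9, 9, 25],
    [226, 230, 251, 185, 98, 101, 9, 0, 26, 24],
    [150, 152, 165, 139, 36, 59, 9, 26, 0, 54],
    [174, 170, 187, 125, 86, 65, 25, 24, 54, 0],
]


def single_linkage(dist, n_clusters):
    """Merge the two closest clusters repeatedly until n_clusters remain.

    Returns the clusters (sets of 1-based point labels) and the merge history
    as (dissimilarity, first cluster, second cluster) triples.
    """
    n_clusters = max(1, n_clusters)
    clusters = [{i + 1} for i in range(len(dist))]
    d = [row[:] for row in dist]  # work on a copy of the matrix
    history = []
    while len(clusters) > n_clusters:
        # Smallest off-diagonal entry; the first pair found wins ties.
        best = None
        for a in range(len(d)):
            for b in range(a + 1, len(d)):
                if best is None or d[a][b] < best[0]:
                    best = (d[a][b], a, b)
        level, a, b = best
        history.append((level, sorted(clusters[a]), sorted(clusters[b])))
        # Minimum rule (15.3.1): entrywise minimum of the two merged rows.
        merged = [min(x, y) for x, y in zip(d[a], d[b])]
        merged[a] = 0
        d[a] = merged
        for k in range(len(d)):
            d[k][a] = merged[k]
        # Remove row b and column b, which are now absorbed into a.
        del d[b]
        for row in d:
            del row[b]
        clusters[a] |= clusters[b]
        del clusters[b]
    return clusters, history


clusters, history = single_linkage(D, 4)
print(clusters)
print([level for level, _, _ in history])
```

Stopping at four clusters reproduces $C_2$, $C_3$, $C_4$ and $C_5$; calling `single_linkage(D, 1)` continues the merging until a single cluster remains.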


15.3.2. Average linking as a modified distance measure


An alternative distance measure involving all the items in pairs of clusters is considered 
in this subsection. As one proceeds from any stage to the next one in a hierarchical proce- 
dure, a decision is based on the next smallest distance between two clusters. At the initial 
stage, this does not pose any problem since the dissimilarity matrix D is available and each 
cluster contains only a single element. However, further on in the process, as there are sev- 
eral elements in the clusters, a more suitable definition of "distance" is required in order to 
proceed to the next stage. Several types of methods have been proposed in the literature. 
One such procedure is the average linkage method under which the distance between two
clusters A and B, denoted again by d(A, B), is defined as follows:

$$d(A, B) = \frac{1}{n_1 n_2}\sum_{j=1}^{n_2}\sum_{i=1}^{n_1} d(X_i, Y_j) \quad \text{for all } X_i \in A,\ Y_j \in B, \tag{15.3.2}$$

where the $X_i$'s and $Y_j$'s are all p-vectors from the given set of data points, $n_1$ and $n_2$
being the numbers of elements in A and B, respectively. In this case, the rule being applied
is that two clusters having the smallest distance, as measured in terms of (15.3.2), are
combined before initiating the next stage.
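
As a small illustration of (15.3.2), the sketch below evaluates the average linkage distance between the clusters $\{A_1, A_2, A_3, A_4\}$ and $\{A_5, A_6\}$ of Sect. 15.3.1, using the corresponding entries of the dissimilarity matrix D. The names `average_linkage`, `pair_dissimilarity` and `lookup` are hypothetical helpers introduced for this example only.

```python
def average_linkage(d, cluster_a, cluster_b):
    """Average of d(x, y) over all x in cluster_a and y in cluster_b, as in (15.3.2)."""
    total = sum(d(x, y) for x in cluster_a for y in cluster_b)
    return total / (len(cluster_a) * len(cluster_b))


# Entries of D linking points 1-4 to points 5-6 (rows (1)-(4), columns (5)-(6)).
pair_dissimilarity = {
    (1, 5): 46, (1, 6): 29,
    (2, 5): 44, (2, 6): 27,
    (3, 5): 49, (3, 6): 34,
    (4, 5): 51, (4, 6): 20,
}


def lookup(i, j):
    return pair_dissimilarity[(i, j)]


average = average_linkage(lookup, [1, 2, 3, 4], [5, 6])
minimum = min(pair_dissimilarity.values())
print(average, minimum)  # 37.5 20
```

For these two clusters, the average linkage distance is 37.5, whereas the single linkage rule (15.3.1) gives 20, the smallest of the eight pairwise dissimilarities.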


15.3.3. The centroid method 


In a hierarchical procedure, another way of determining the distance between two clusters
before proceeding to the next stage is referred to as the centroid method, under which the
distance between clusters A and B is defined as the Euclidean distance between their
centroids:

$$d(A, B) = d(\bar{X}, \bar{Y}) \quad \text{with} \quad \bar{X} = \frac{1}{n_1}\sum_{i=1}^{n_1} X_i \ \text{ and } \ \bar{Y} = \frac{1}{n_2}\sum_{j=1}^{n_2} Y_j, \tag{15.3.3}$$

where $\bar{X}$ is the centroid of the cluster A and $\bar{Y}$ is the centroid of the cluster B, $X_i \in A$,
$i = 1, \ldots, n_1$, $Y_j \in B$, $j = 1, \ldots, n_2$. In this case, the process involves combining
two clusters with the smallest d(A, B) as specified in (15.3.3) into a single cluster. After
combining them, or equivalently, after taking the union of A and B, the centroid of the
combined cluster, denoted by $\bar{Z}$, is

$$\bar{Z} = \frac{1}{n_1 + n_2}\sum_{r=1}^{n_1+n_2} Z_r = \frac{n_1\bar{X} + n_2\bar{Y}}{n_1 + n_2},$$

where the $Z_r$'s are the original vectors that were included in A or B.
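
To make the centroid rule concrete, here is a minimal Python sketch. The two clusters A and B are hypothetical two-dimensional points chosen for illustration, not data from the text, and the helper names `centroid` and `centroid_distance` are our own.

```python
import math


def centroid(points):
    """Componentwise mean of a list of p-vectors."""
    n = len(points)
    return [sum(p[k] for p in points) / n for k in range(len(points[0]))]


def centroid_distance(cluster_a, cluster_b):
    """Euclidean distance between the two centroids, as in (15.3.3)."""
    return math.dist(centroid(cluster_a), centroid(cluster_b))


# Hypothetical clusters, used only to illustrate the computation.
A = [[0.0, 0.0], [2.0, 0.0]]
B = [[4.0, 4.0], [4.0, 6.0], [7.0, 5.0]]

x_bar, y_bar = centroid(A), centroid(B)
n1, n2 = len(A), len(B)
# Centroid of the union, obtained from the two centroids weighted by cluster size.
z_bar = [(n1 * x + n2 * y) / (n1 + n2) for x, y in zip(x_bar, y_bar)]

print(centroid_distance(A, B))   # sqrt(41)
print(z_bar, centroid(A + B))    # the two computations of the merged centroid agree
```

The weighted form of the merged centroid means that only the cluster sizes and centroids need to be stored from one stage to the next.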
15.3.4. The median method

A main shortcoming of the centroid method of joining two clusters is that if $n_1$ is
very large compared to $n_2$, then $\bar{Z}$ is likely to be closer to $\bar{X}$, and vice versa. In order to
avoid this type of imbalance, a method based on the median is suggested, under which the
median of the combined clusters A and B is defined as

$$\text{Median}_{A \cup B} = \frac{1}{2}(\bar{X} + \bar{Y}) \quad \text{with } X_i \in A \text{ and } Y_j \in B, \tag{15.3.4}$$

for all i, j and r. In this process, the clusters A and B for which $\text{Median}_{A \cup B}$ is the smallest
are combined to form the next cluster whose elements are the $Z_r$'s, $Z_r \in A \cup B$.

15.3.5. The residual sum of products method


From the one-way MANOVA layout, the residual or within group (within cluster) sums
of products for clusters A, B and $A \cup B$, denoted by $R_A$, $R_B$ and $R_{A \cup B}$, are the following:

$$R_A = \sum_{i=1}^{n_1}(X_i - \bar{X})'(X_i - \bar{X}), \quad R_B = \sum_{j=1}^{n_2}(Y_j - \bar{Y})'(Y_j - \bar{Y}),$$

$$R_{A \cup B} = \sum_{r=1}^{n_1+n_2}(Z_r - \bar{Z})'(Z_r - \bar{Z}), \quad Z_r \in A \cup B, \quad \bar{Z} = \frac{n_1\bar{X} + n_2\bar{Y}}{n_1 + n_2}.$$

Once those sums of squares have been evaluated, we compute the quantity

$$T_{A \cup B} = R_{A \cup B} - (R_A + R_B), \tag{15.3.5}$$

which can be interpreted as the increase in residual sum of products due to the process of
merging the clusters A and B. Then, the procedure consists of combining those clusters A
and B for which $T_{A \cup B}$ as defined in (15.3.5) is the minimum. This method is also called
Ward's method.
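
The quantity in (15.3.5) can be evaluated directly from its definition, as in the following sketch. The clusters are hypothetical two-dimensional points and the function names are ours. The last lines check the result against the well-known closed form $T_{A \cup B} = \frac{n_1 n_2}{n_1 + n_2}(\bar{X} - \bar{Y})'(\bar{X} - \bar{Y})$, which is not derived in the text and is quoted here only as a cross-check.

```python
def centroid(points):
    n = len(points)
    return [sum(p[k] for p in points) / n for k in range(len(points[0]))]


def within_ss(points):
    """Residual sum of squares of a cluster about its own centroid."""
    c = centroid(points)
    return sum((p[k] - c[k]) ** 2 for p in points for k in range(len(c)))


def ward_increase(cluster_a, cluster_b):
    """T = R_{A u B} - (R_A + R_B), as in (15.3.5)."""
    return within_ss(cluster_a + cluster_b) - (within_ss(cluster_a) + within_ss(cluster_b))


# Hypothetical clusters, used only to illustrate the computation.
A = [[0.0, 0.0], [2.0, 0.0]]
B = [[4.0, 4.0], [4.0, 6.0], [7.0, 5.0]]

t_direct = ward_increase(A, B)

# Cross-check with the closed form n1*n2/(n1+n2) * ||x_bar - y_bar||^2.
n1, n2 = len(A), len(B)
gap = sum((x - y) ** 2 for x, y in zip(centroid(A), centroid(B)))
t_closed = n1 * n2 / (n1 + n2) * gap

print(t_direct, t_closed)
```

In an agglomerative pass, `ward_increase` would be evaluated for every pair of current clusters and the pair with the smallest value merged.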

There exist other methods for combining clusters such as the flexible beta method, and 
several comparative studies point out the merits and drawbacks of the various methods. 

In the hierarchical procedures considered in Sect. 15.3, we begin with the n data points
as n distinct clusters of one element each. Then, by applying certain "minimum distance"
methods, "distance" being defined in different ways, we combine the clusters one by one.
We may also consider a hierarchical procedure wherein the n data points are treated as one
cluster of n elements. At this stage, by making use of some rules, we break up this cluster
into two clusters. Then, one of these or both are split again into two clusters by applying
the same rule. We continue the process and stop it when it is determined that there is a
sufficient number of clusters. If the process is not halted at a certain stage, we will end
up with n clusters, each containing a single element or point. We will not elaborate
further on such procedures.


15.3.6. Other criteria for partitioning or optimization 


In Sect. 15.2, we considered a non-hierarchical procedure known as the k-means
method, which is the most popular in this area. After discussing this, we described the most
widely utilized hierarchical procedure in Sect. 15.3. We will now examine other non-hierarchical
procedures in common use. Some of these are connected with the MANOVA
or multivariate analysis of variance of a one-way classification. In a multivariate one-way
layout, let $X_{ij}$ be the j-th vector in the i-th group or i-th cluster, all vectors being
p-vectors or p × 1 real vectors. Let there be k groups (k clusters) of sizes $n_1, \ldots, n_k$ with
$n_1 + n_2 + \cdots + n_k = n_. = n$, that is, the cluster sizes are $n_1, \ldots, n_k$, respectively. Let
the residual sum of products or sum of squares and cross products matrix be denoted by
U, which is p × p. This matrix U is also called the within group or within cluster variation
matrix. Let the between groups or between clusters variation matrix be V. In this setup, U
and V are the following:

$$U = \sum_{i=1}^{k}\sum_{j=1}^{n_i}(X_{ij} - \bar{X}_i)(X_{ij} - \bar{X}_i)', \quad \bar{X}_i = \frac{1}{n_i}\sum_{j=1}^{n_i} X_{ij}, \tag{15.3.6}$$

$$V = \sum_{i=1}^{k}\sum_{j=1}^{n_i}(\bar{X}_i - \bar{X})(\bar{X}_i - \bar{X})' = \sum_{i=1}^{k} n_i(\bar{X}_i - \bar{X})(\bar{X}_i - \bar{X})', \quad \bar{X} = \frac{1}{n_.}\sum_{i=1}^{k}\sum_{j=1}^{n_i} X_{ij}. \tag{15.3.7}$$


Then, under the hypothesis that the group effects or cluster effects are the same, and under
the normality assumption on the $X_{ij}$'s, U and V are independently distributed Wishart
matrices with $n_. - k$ and $k - 1$ degrees of freedom, respectively, where $\Sigma > O$ is the
parameter matrix in the Wishart densities as well as the common covariance matrix of
the $X_{ij}$'s, referring to Chap. 5. Thus, $W_1 = (U + V)^{-\frac{1}{2}}U(U + V)^{-\frac{1}{2}}$ is a real matrix-variate
type-1 beta random variable with the parameters $(\frac{n_. - k}{2}, \frac{k - 1}{2})$, $W_2 = U^{-\frac{1}{2}}VU^{-\frac{1}{2}}$
is a real matrix-variate type-2 beta random variable with the parameters $(\frac{k - 1}{2}, \frac{n_. - k}{2})$ and
$W_3 = U + V$ follows a real Wishart distribution having $n_. - 1$ degrees of freedom and
parameter matrix $\Sigma > O$, again referring to Chap. 5. Observe that both U and V are real
positive definite matrices, so that all of their eigenvalues are positive. The likelihood ratio
criterion $\lambda$ for testing the hypothesis that the group effects are the same is the following:

$$\lambda^{\frac{2}{n_.}} = \frac{|U|}{|U + V|} = \frac{1}{|I + U^{-\frac{1}{2}}VU^{-\frac{1}{2}}|} = \frac{1}{|I + W_2|}. \tag{15.3.8}$$


We are aiming to have the within cluster variation small and the between cluster variation
large, which means, in some sense, that U will be small and V will be large, in which case
$\lambda$ as given in (15.3.8) will be small. This also means that the trace of U must be small and
the trace of $W_2$ must be large. Accordingly, a few criteria for merging clusters are based on
tr(U), |U| and tr($W_2$). The following are some commonly utilized criteria for combining
clusters:


(1) Minimizing tr(U); 
(2) Minimizing |U |; 
(3) Maximizing tr(W2). 
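
For a given partition, the three criteria can be computed from (15.3.6) and (15.3.7). The sketch below does so for a hypothetical two-cluster partition of six bivariate points (p = 2, so that determinants and inverses are explicit). All names are ours, and the identity tr($W_2$) = tr($U^{-1}V$), which follows from tr(PQ) = tr(QP), is used to avoid computing $U^{-1/2}$.

```python
def mean_vector(points):
    n = len(points)
    return [sum(p[k] for p in points) / n for k in range(len(points[0]))]


def scatter(points, center, weight=1.0):
    """Sum over the points of weight * (x - center)(x - center)', as a p x p list."""
    p = len(center)
    s = [[0.0] * p for _ in range(p)]
    for x in points:
        dev = [x[k] - center[k] for k in range(p)]
        for a in range(p):
            for b in range(p):
                s[a][b] += weight * dev[a] * dev[b]
    return s


def add(m1, m2):
    return [[x + y for x, y in zip(r1, r2)] for r1, r2 in zip(m1, m2)]


def within_between(clusters):
    """Within (U) and between (V) variation matrices of (15.3.6) and (15.3.7)."""
    p = len(clusters[0][0])
    grand = mean_vector([x for c in clusters for x in c])
    U = [[0.0] * p for _ in range(p)]
    V = [[0.0] * p for _ in range(p)]
    for c in clusters:
        center = mean_vector(c)
        U = add(U, scatter(c, center))
        V = add(V, scatter([center], grand, weight=len(c)))
    return U, V


def det2(m):
    return m[0][0] * m[1][1] - m[0][1] * m[1][0]


# Hypothetical partition of six bivariate points into two clusters.
clusters = [
    [[0, 0], [2, 0], [1, 3]],
    [[4, 4], [4, 6], [7, 5]],
]
U, V = within_between(clusters)
tr_U = U[0][0] + U[1][1]
det_U = det2(U)
inv_U = [[U[1][1] / det_U, -U[0][1] / det_U],
         [-U[1][0] / det_U, U[0][0] / det_U]]
tr_W2 = sum(inv_U[a][b] * V[b][a] for a in range(2) for b in range(2))
ratio = det_U / det2(add(U, V))  # |U| / |U + V|, the right-hand side of (15.3.8)

print(tr_U, det_U, tr_W2, ratio)
```

Competing partitions of the same data can then be compared by recomputing these quantities, smaller tr(U) or |U|, or larger tr($W_2$), being preferred.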


These criteria are applied as follows: One of the n observation vectors is moved to a
selected cluster if tr(U) is a minimum (|U| is a minimum and tr($W_2$) is a maximum for the
other criteria). Then, tr(U) is evaluated after moving the observation vectors one by one
to the selected cluster and, each time, tr(U) is noted; the vector for which tr(U) attains
a minimum value belongs to the selected cluster, that is, it is combined with the selected
cluster. Observe that

$$
\begin{aligned}
\mathrm{tr}(U) &= \mathrm{tr}\Big(\sum_{i=1}^{k}\sum_{j=1}^{n_i}(X_{ij} - \bar{X}_i)(X_{ij} - \bar{X}_i)'\Big)
= \sum_{i=1}^{k}\sum_{j=1}^{n_i}(X_{ij} - \bar{X}_i)'(X_{ij} - \bar{X}_i) \\
&= \sum_{j=1}^{n_1}(X_{1j} - \bar{X}_1)'(X_{1j} - \bar{X}_1) + \cdots + \sum_{j=1}^{n_k}(X_{kj} - \bar{X}_k)'(X_{kj} - \bar{X}_k),
\end{aligned} \tag{15.3.9}
$$

owing to the property that, for two matrices P and Q, tr(PQ) = tr(QP) as long as
PQ and QP are defined. As well, observe that since $(X_{ij} - \bar{X}_i)'(X_{ij} - \bar{X}_i)$ is a scalar
quantity for every i and j, it is equal to its trace. How does this criterion work in practice?
Consider moving a member from the s-th cluster to the selected cluster, namely, the r-th
cluster. The original centroids are $\bar{X}_r$ and $\bar{X}_s$, and when one element is added to the r-th
cluster from the s-th cluster, both centroids will respectively change to, say, $\bar{X}_{r+1}$ and
$\bar{X}_{s-1}$. Compute the updated sums of squares in the new r-th and s-th clusters. Then, add
up all the sums of squares in all the clusters and obtain a new tr(U). Carry out this process
for every member in every other cluster and compute tr(U) each time. Take the smallest
value of tr(U) thus calculated, including the original value of tr(U) before considering
transferring any point. That vector for which tr(U) is minimum really belongs to the r-th
cluster and so, is included in it. Repeat the process until no more improvement can be
made, at which point no more transfer of points is necessary.


Simplification of the computations of tr(U) 


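As will be explained, computing tr(U) can be simplified. Consider the new sum
of squares in the r-th cluster. Let the new and old sums of squares be denoted by
$(\text{New})_r$, $(\text{New})_s$ and $(\text{Old})_r$, $(\text{Old})_s$, respectively. Let the vector transferred from the s-th
cluster to the r-th cluster be denoted by Y. For notational convenience, r and s also denote
the numbers of vectors initially belonging to the r-th and s-th clusters. Then,

$$
\begin{aligned}
(\text{New})_r &= \sum_{j=1}^{r}(X_{rj} - \bar{X}_{r+1})'(X_{rj} - \bar{X}_{r+1}) + (Y - \bar{X}_{r+1})'(Y - \bar{X}_{r+1}) \\
&= \sum_{j=1}^{r}(X_{rj} - \bar{X}_r + \bar{X}_r - \bar{X}_{r+1})'(X_{rj} - \bar{X}_r + \bar{X}_r - \bar{X}_{r+1}) + (Y - \bar{X}_{r+1})'(Y - \bar{X}_{r+1}) \\
&= \sum_{j=1}^{r}\big[(X_{rj} - \bar{X}_r)'(X_{rj} - \bar{X}_r) + (\bar{X}_r - \bar{X}_{r+1})'(\bar{X}_r - \bar{X}_{r+1})\big] + (Y - \bar{X}_{r+1})'(Y - \bar{X}_{r+1}) \\
&= (\text{Old})_r + r(\bar{X}_r - \bar{X}_{r+1})'(\bar{X}_r - \bar{X}_{r+1}) + (Y - \bar{X}_{r+1})'(Y - \bar{X}_{r+1}),
\end{aligned}
$$

the cross product terms vanishing since $\sum_{j=1}^{r}(X_{rj} - \bar{X}_r) = O$. The difference between the
new sum of squares and the old one is

$$\delta_1 = r(\bar{X}_r - \bar{X}_{r+1})'(\bar{X}_r - \bar{X}_{r+1}) + (Y - \bar{X}_{r+1})'(Y - \bar{X}_{r+1}).$$

Noting that

$$\bar{X}_r - \bar{X}_{r+1} = \bar{X}_r - \frac{r\bar{X}_r + Y}{r + 1} = \frac{\bar{X}_r - Y}{r + 1} \quad \text{and} \quad Y - \bar{X}_{r+1} = \frac{r}{r + 1}(Y - \bar{X}_r),$$

$\delta_1$ simplifies to

$$\delta_1 = \frac{r}{r + 1}(Y - \bar{X}_r)'(Y - \bar{X}_r).$$

A similar procedure can be used for the s-th cluster. In that case, the new sum of
squares can be written as

$$
\begin{aligned}
(\text{New})_s &= \sum_{j=1}^{s-1}(X_{sj} - \bar{X}_{s-1})'(X_{sj} - \bar{X}_{s-1}) \\
&= \sum_{j=1}^{s}(X_{sj} - \bar{X}_{s-1})'(X_{sj} - \bar{X}_{s-1}) - (Y - \bar{X}_{s-1})'(Y - \bar{X}_{s-1}),
\end{aligned}
$$

where the first sum runs over the vectors remaining in the s-th cluster and the second one
over all of its original vectors, including Y. Then, proceeding as in the case of the r-th
cluster and denoting the difference between the new and the old sums of squares as $\delta_2$, we have

$$\delta_2 = -\frac{s}{s - 1}(Y - \bar{X}_s)'(Y - \bar{X}_s), \quad s > 1,$$

so that the sum of the differences between the new and old sums of squares, denoted by $\delta$,
is the following:

$$\delta = \delta_1 + \delta_2 = \frac{r}{r + 1}(Y - \bar{X}_r)'(Y - \bar{X}_r) - \frac{s}{s - 1}(Y - \bar{X}_s)'(Y - \bar{X}_s) \tag{15.3.10}$$

for s > 1, where $\bar{X}_r$ and $\bar{X}_s$ are the original centroids of the r-th and s-th clusters, respectively.
As such, computing $\delta$ is very simple. Evaluate the quantity specified in (15.3.10)
for all the points outside the r-th cluster and look for the minimum of $\delta$, including the
original value of $\delta = 0$. If the minimum occurs at a point $Y_1$ outside of the r-th cluster,
then transfer that point to the r-th cluster. Continue the process for every vector in the s-th
cluster and then, for r = 1, ..., k, assuming there are k clusters, until $\delta = 0$. In the end,
all the clusters are stabilized, and k may take on another value.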
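
The shortcut (15.3.10) can be checked against a direct recomputation of the sums of squares. In the following sketch, the two clusters are hypothetical bivariate points and the function names are ours; the candidate vector Y is the first point of the s-th cluster.

```python
def centroid(points):
    n = len(points)
    return [sum(p[k] for p in points) / n for k in range(len(points[0]))]


def sum_sq(points):
    """Sum of squared deviations of the points from their own centroid."""
    c = centroid(points)
    return sum((x[k] - c[k]) ** 2 for x in points for k in range(len(c)))


def sq_dist(u, v):
    return sum((a - b) ** 2 for a, b in zip(u, v))


def delta(y, cluster_r, cluster_s):
    """Change in tr(U) when y is moved from cluster_s to cluster_r, as in (15.3.10)."""
    r, s = len(cluster_r), len(cluster_s)
    if s <= 1:
        raise ValueError("the s-th cluster must contain more than one vector")
    return (r / (r + 1)) * sq_dist(y, centroid(cluster_r)) \
        - (s / (s - 1)) * sq_dist(y, centroid(cluster_s))


# Hypothetical clusters, used only to illustrate the computation.
cluster_r = [[0, 0], [2, 0], [1, 3]]
cluster_s = [[4, 4], [4, 6], [7, 5]]
y = cluster_s[0]

shortcut = delta(y, cluster_r, cluster_s)

old_total = sum_sq(cluster_r) + sum_sq(cluster_s)
new_total = sum_sq(cluster_r + [y]) + sum_sq(cluster_s[1:])
direct = new_total - old_total

print(shortcut, direct)
```

Here both values are positive, so that moving this particular vector would increase tr(U) and the point is left in its original cluster.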

Among the three statistics tr(U), |U| and tr($W_2$), tr(U) is the easiest to compute, as
was just explained. However, if we consider a non-singular transformation, other than an
orthonormal transformation, then |U| and tr($W_2$) are invariant, but tr(U) is not.

We have discussed one hierarchical methodology, the single linkage or nearest neighbor
method, and one non-hierarchical procedure, the k-means method. These seem
to be the most widely utilized. We also mentioned other hierarchical and non-hierarchical
methods without going into the details. None of these procedures is a well-defined
mathematical procedure: none can uniquely determine the clusters if there are
some clusters in the multivariate data at hand, and none can uniquely determine
the number of clusters. The advantages and shortcomings of the various methods
will not be discussed so as not to confound the reader.

Exercises 15


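15.1. For the p × 1 vectors $X_1, \ldots, X_n$, let the dissimilarity measures be (1) $d_{ij}^{(1)} =
\sum_{k=1}^{p}|x_{ik} - x_{jk}|$, (2) $d_{ij}^{(2)} = \sum_{k=1}^{p}(x_{ik} - x_{jk})^2$, $X_i' = [x_{i1}, \ldots, x_{ip}]$. Compute the
matrices (1) $(d_{ij}^{(1)})$, (2) $(d_{ij}^{(2)})$, for the following vectors: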
—1 1 2 
Xi=]|-—-1|, X =| 1|,Х»=| 2), X= 
2 2 —1 — 

15.2. Nine test runs T-1, ..., T-9 are done to test the breaking strengths of three
alloys. The following data are the deviations from the respective expected strengths:

        T-1 T-2 T-3 T-4 T-5 T-6 T-7 T-8 T-9


Alloy-1 0 el 1 2 —1 2 5 4 5 
Alloy-2 1 1 1 1 3 4 7 —4 8 
Alloy-3 —1 0 1 2 2 3 8 4 —7 


Carry out a cluster analysis by applying the following methods: (1) The single linkage 
or nearest neighbor method; (2) The average linkage method; (3) The centroid method; 
(4) The residual sum of products method. 


15.3. Using the data provided in Exercise 15.2, carry out a cluster analysis by utilizing the
following methods: (1) Partitioning or optimization; (2) Minimization of tr(U); (3) Minimization
of |U|; (4) Maximization of tr($W_2$) where U and $W_2$ are given in Sect. 15.3.6.


15.4. Compare the results from the different methods in (1) Exercise 15.2; (2) Exer- 
cise 15.3, and make your observations. 


15.5. Compare the results from the different methods in Exercises 15.2 and 15.3, and 
comment on the similarities and differences. 


15.4. Correspondence Analysis 


If the data at hand are classified according to two attributes, these characteristics may
be of the same type, that is, both quantitative or both qualitative, or of different types, and
whatever the types may be, we may construct a two-way contingency table. In a contingency
table, the entries in the cells are frequencies or the number of times various combinations
of the attributes appear. Correspondence Analysis is a process of identifying,
quantifying, separating and plotting associations among the characteristics and relationships
among the various levels. In a two-way contingency table, we identify, separate and
plot associations between the two characteristics and attempt to identify relationships between
row and column labels.

15.4.1. Two-way contingency table


Consider the following example. A random sample of 100 persons from a certain township
are classified according to their educational level and their liberal disposition. In the
frequency Table 15.4.1, the $A_i$'s represent their dispositions and the $B_j$'s, their educational
levels, with $A_1$ = tolerant, $A_2$ = indifferent, $A_3$ = intolerant, $B_1$ = primary school education
level, $B_2$ = high school education level, $B_3$ = bachelor's degree education level,
$B_4$ = master's and higher degree education level.


Table 15.4.1: A two-way contingency table

             B1     B2     B3     B4     Total
    A1        6     14     16      4        40
    A2       17      5      8     10        40
    A3        7      6      6      1        20
    Total    30     25     30     15       100


There are 6 persons having a tolerant disposition and primary school level of education. 
There is one person with an intolerant disposition and a master’s degree or a higher level 
of education, and so on. The marginal sums are also provided in the table. For example, the 
total number of persons having a primary school level of education is 30, the total number 
of persons having an intolerant disposition is 20, and so on. The corresponding relative 
frequencies (a given frequency divided by 100, the total frequency) are as follows (Table 
15.4.2): 


Table 15.4.2: Relative frequencies $f_{ij}$ in the two-way contingency table

             B1            B2            B3            B4            Total
    A1       0.06 (f11)    0.14 (f12)    0.16 (f13)    0.04 (f14)    0.40 (f1.)
    A2       0.17 (f21)    0.05 (f22)    0.08 (f23)    0.10 (f24)    0.40 (f2.)
    A3       0.07 (f31)    0.06 (f32)    0.06 (f33)    0.01 (f34)    0.20 (f3.)
    Total    0.30 (f.1)    0.25 (f.2)    0.30 (f.3)    0.15 (f.4)    1.00 (f..)


The relative frequencies are denoted in parentheses by $f_{ij}$ where the summation with
respect to a subscript is designated by a dot, that is, $f_{i.} = \sum_j f_{ij}$, $f_{.j} = \sum_i f_{ij}$ and
$f_{..} = \sum_{ij} f_{ij}$. Note that $f_{..} = 1$. In a general notation, a two-way contingency table
and the corresponding relative frequencies are displayed as follows (Table 15.4.3):

Table 15.4.3: A two-way contingency table and a table of relative frequencies

             B1    B2    ...   Bs    Total                B1    B2    ...   Bs    Total
    A1       n11   n12   ...   n1s   n1.          A1      f11   f12   ...   f1s   f1.
    A2       n21   n22   ...   n2s   n2.          A2      f21   f22   ...   f2s   f2.
    ...      ...   ...   ...   ...   ...          ...     ...   ...   ...   ...   ...
    Ar       nr1   nr2   ...   nrs   nr.          Ar      fr1   fr2   ...   frs   fr.
    Total    n.1   n.2   ...   n.s   n.. = n      Total   f.1   f.2   ...   f.s   f.. = 1


Letting the true probability of the occurrence of an observation in the (i, j)-th cell be
$p_{ij}$, the following is the table of true probabilities:

Table 15.4.4: True probabilities $p_{ij}$ in a two-way contingency table

             B1    B2    ...   Bs    Total
    A1       p11   p12   ...   p1s   p1.
    A2       p21   p22   ...   p2s   p2.
    ...      ...   ...   ...   ...   ...
    Ar       pr1   pr2   ...   prs   pr.
    Total    p.1   p.2   ...   p.s   p.. = 1


These are multinomial probabilities and, in this case, the $n_{ij}$'s become multinomial
variables. An estimate of $p_{ij}$, denoted by $\hat{p}_{ij}$, is $\hat{p}_{ij} = f_{ij}$, the corresponding relative
frequency. The marginal sums in Table 15.4.4 can be interpreted as follows: $p_{1.}$ = the
probability of finding an item in the first row or the probability that an event will have the
attribute $A_1$; $p_{.j}$ = the probability that an event will have the characteristic $B_j$, and so on.
Thus,

$$\hat{p}_{ij} = f_{ij} = \frac{n_{ij}}{n}, \quad \hat{p}_{i.} = \frac{n_{i.}}{n}, \quad \hat{p}_{.j} = \frac{n_{.j}}{n}, \quad i = 1, \ldots, r, \ j = 1, \ldots, s.$$

If $A_i$ and $B_j$ are respectively interpreted as the event that an observation will belong to
the i-th row or the event of the occurrence of the characteristic $A_i$, and the event that an
observation will belong to the j-th column or the event of the occurrence of the attribute
$B_j$, and if we let $p_{i.} = P(A_i)$ and $p_{.j} = P(B_j)$, then $p_{ij} = P(A_i \cap B_j)$, where $P(A_i)$
is the probability of the event $A_i$, $P(B_j)$ is the probability of the event $B_j$, and $(A_i \cap B_j)$
is the intersection or joint occurrence of the events $A_i$ and $B_j$. If $A_i$ and $B_j$ are
independent events, $P(A_i \cap B_j) = P(A_i)P(B_j)$ or $p_{ij} = p_{i.}p_{.j}$, the product of the
marginal probabilities or the marginal totals in the table of probabilities. That is,

$$P(A_i \cap B_j) = P(A_i)P(B_j) \Rightarrow p_{ij} = p_{i.}p_{.j} \Rightarrow \hat{p}_{ij} = \Big(\frac{n_{i.}}{n}\Big)\Big(\frac{n_{.j}}{n}\Big) = \frac{n_{i.}n_{.j}}{n^2} \tag{15.4.1}$$


for all i and j. In a multinomial distribution, the expected frequency in the (i, j)-th cell
is $np_{ij}$ where n is the total frequency. Then, the expected frequency, denoted by $E[\cdot]$, the
maximum likelihood estimate (MLE) of the expected frequency, denoted by $\hat{E}[\cdot]$, and the
MLE of the expected frequency under the hypothesis $H_o$ of independence of events $A_i$ and
$B_j$, are the following:

$$E[n_{ij}] = np_{ij}, \quad \hat{E}[n_{ij}] = n\hat{p}_{ij} = n\Big(\frac{n_{ij}}{n}\Big) = n_{ij}, \quad \hat{E}[n_{ij}|H_o] = n\hat{p}_{i.}\hat{p}_{.j} = n\Big(\frac{n_{i.}}{n}\Big)\Big(\frac{n_{.j}}{n}\Big) = \frac{n_{i.}n_{.j}}{n}. \tag{15.4.2}$$

Now, referring to our numerical example and the first row of Table 15.4.1, the estimated
expected frequencies under $H_o$ are: $\hat{E}[n_{11}|H_o] = \frac{n_{1.}n_{.1}}{n} = \frac{40 \times 30}{100} = 12$,
$\hat{E}[n_{12}|H_o] = \frac{n_{1.}n_{.2}}{n} = \frac{40 \times 25}{100} = 10$, $\hat{E}[n_{13}|H_o] = \frac{40 \times 30}{100} = 12$,
$\hat{E}[n_{14}|H_o] = \frac{40 \times 15}{100} = 6$. All the estimated expected frequencies are shown in
parentheses next to the observed frequencies in Table 15.4.5:


Table 15.4.5: A two-way contingency table

             B1        B2        B3        B4        Total
    A1       6(12)     14(10)    16(12)    4(6)      40(40)
    A2       17(12)    5(10)     8(12)     10(6)     40(40)
    A3       7(6)      6(5)      6(6)      1(3)      20(20)
    Total    30(30)    25(25)    30(30)    15(15)    100(100)
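
The estimated expected frequencies appearing in Table 15.4.5 follow from (15.4.2) and can be reproduced with a few lines of Python. This is a minimal sketch; the function name `expected_under_independence` is ours.

```python
# Observed frequencies of Table 15.4.1 (rows A1-A3, columns B1-B4).
observed = [
    [6, 14, 16, 4],
    [17, 5, 8, 10],
    [7, 6, 6, 1],
]


def expected_under_independence(table):
    """Estimated expected frequencies n_i. * n_.j / n under H_o, as in (15.4.2)."""
    n = sum(sum(row) for row in table)
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    return [[ri * cj / n for cj in col_totals] for ri in row_totals]


expected = expected_under_independence(observed)
for row in expected:
    print(row)
```

The rows printed are [12, 10, 12, 6], [12, 10, 12, 6] and [6, 5, 6, 3], in agreement with the values shown in parentheses in Table 15.4.5.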


15.4.2. Some general computations 


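Let $J_r$ and $J_s$ be respectively r × 1 and s × 1 vectors of unities and P be the true
probability matrix, that is,

$$J_r = \begin{bmatrix} 1 \\ \vdots \\ 1 \end{bmatrix}, \quad J_s = \begin{bmatrix} 1 \\ \vdots \\ 1 \end{bmatrix}, \quad P = \begin{bmatrix} p_{11} & p_{12} & \cdots & p_{1s} \\ \vdots & \vdots & \ddots & \vdots \\ p_{r1} & p_{r2} & \cdots & p_{rs} \end{bmatrix}. \tag{15.4.3}$$

Letting the marginal totals be denoted by R and C', we have

$$R = PJ_s = \begin{bmatrix} p_{1.} \\ \vdots \\ p_{r.} \end{bmatrix}, \quad C' = J_r'P = [p_{.1}, p_{.2}, \ldots, p_{.s}]. \tag{15.4.4}$$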


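Referring to the initial numerical example, we have the following:

$$\hat{R} = \begin{bmatrix} \hat{p}_{1.} \\ \hat{p}_{2.} \\ \hat{p}_{3.} \end{bmatrix} = \begin{bmatrix} n_{1.}/n \\ n_{2.}/n \\ n_{3.}/n \end{bmatrix} = \begin{bmatrix} 40/100 \\ 40/100 \\ 20/100 \end{bmatrix} = \begin{bmatrix} 0.4 \\ 0.4 \\ 0.2 \end{bmatrix},$$

$$\hat{C}' = [\hat{p}_{.1}, \hat{p}_{.2}, \ldots, \hat{p}_{.s}] = \Big[\frac{n_{.1}}{n}, \frac{n_{.2}}{n}, \ldots, \frac{n_{.s}}{n}\Big] = \Big[\frac{30}{100}, \frac{25}{100}, \frac{30}{100}, \frac{15}{100}\Big] = [0.30, 0.25, 0.30, 0.15].$$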


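Writing the bordered matrix P as

$$\begin{bmatrix} p_{11} & p_{12} & \cdots & p_{1s} & p_{1.} \\ p_{21} & p_{22} & \cdots & p_{2s} & p_{2.} \\ \vdots & \vdots & \ddots & \vdots & \vdots \\ p_{r1} & p_{r2} & \cdots & p_{rs} & p_{r.} \\ p_{.1} & p_{.2} & \cdots & p_{.s} & 1 \end{bmatrix} = \begin{bmatrix} P & R \\ C' & 1 \end{bmatrix}, \tag{15.4.5}$$

in the numerical example, these quantities are

$$\begin{bmatrix} \hat{P} & \hat{R} \\ \hat{C}' & 1 \end{bmatrix} = \begin{bmatrix} 0.06 & 0.14 & 0.16 & 0.04 & 0.40 \\ 0.17 & 0.05 & 0.08 & 0.10 & 0.40 \\ 0.07 & 0.06 & 0.06 & 0.01 & 0.20 \\ 0.30 & 0.25 & 0.30 & 0.15 & 1.00 \end{bmatrix}.$$

Let $D_r$ and $D_c$ be the following diagonal matrices corresponding respectively to the row
and column marginal probabilities:

$$D_r = \mathrm{diag}(p_{1.}, p_{2.}, \ldots, p_{r.}), \quad D_c = \mathrm{diag}(p_{.1}, p_{.2}, \ldots, p_{.s}). \tag{15.4.6}$$

In the numerical example, these quantities are

$$\hat{D}_r = \mathrm{diag}(0.4, 0.4, 0.2) \quad \text{and} \quad \hat{D}_c = \mathrm{diag}(0.30, 0.25, 0.30, 0.15).$$
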
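Now, consider $D_r^{-1}P$ and $PD_c^{-1}$:

$$D_r^{-1}P = \begin{bmatrix} \frac{p_{11}}{p_{1.}} & \frac{p_{12}}{p_{1.}} & \cdots & \frac{p_{1s}}{p_{1.}} \\ \frac{p_{21}}{p_{2.}} & \frac{p_{22}}{p_{2.}} & \cdots & \frac{p_{2s}}{p_{2.}} \\ \vdots & \vdots & \ddots & \vdots \\ \frac{p_{r1}}{p_{r.}} & \frac{p_{r2}}{p_{r.}} & \cdots & \frac{p_{rs}}{p_{r.}} \end{bmatrix} = \begin{bmatrix} R_1' \\ R_2' \\ \vdots \\ R_r' \end{bmatrix}, \quad R_i' = \Big[\frac{p_{i1}}{p_{i.}}, \ldots, \frac{p_{is}}{p_{i.}}\Big], \tag{15.4.7}$$

$$PD_c^{-1} = \begin{bmatrix} \frac{p_{11}}{p_{.1}} & \frac{p_{12}}{p_{.2}} & \cdots & \frac{p_{1s}}{p_{.s}} \\ \frac{p_{21}}{p_{.1}} & \frac{p_{22}}{p_{.2}} & \cdots & \frac{p_{2s}}{p_{.s}} \\ \vdots & \vdots & \ddots & \vdots \\ \frac{p_{r1}}{p_{.1}} & \frac{p_{r2}}{p_{.2}} & \cdots & \frac{p_{rs}}{p_{.s}} \end{bmatrix} = [C_1, C_2, \ldots, C_s], \quad C_j = \begin{bmatrix} \frac{p_{1j}}{p_{.j}} \\ \vdots \\ \frac{p_{rj}}{p_{.j}} \end{bmatrix}. \tag{15.4.8}$$

Referring to the numerical example, we have

$$\hat{D}_r^{-1}\hat{P} = \Big(\frac{n_{ij}}{n_{i.}}\Big) = \begin{bmatrix} 6/40 & 14/40 & 16/40 & 4/40 \\ 17/40 & 5/40 & 8/40 & 10/40 \\ 7/20 & 6/20 & 6/20 & 1/20 \end{bmatrix}, \quad \hat{D}_r^{-1}\hat{P}J_s = \begin{bmatrix} 1 \\ 1 \\ 1 \end{bmatrix},$$

$$\hat{P}\hat{D}_c^{-1} = \Big(\frac{n_{ij}}{n_{.j}}\Big) = \begin{bmatrix} 6/30 & 14/25 & 16/30 & 4/15 \\ 17/30 & 5/25 & 8/30 & 10/15 \\ 7/30 & 6/25 & 6/30 & 1/15 \end{bmatrix}, \quad J_r'\hat{P}\hat{D}_c^{-1} = [1, 1, 1, 1].$$

For computing the test statistics in vector/matrix notation, we need (15.4.7) and (15.4.8).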
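
The row and column profiles of the numerical example, that is, the rows of $\hat{D}_r^{-1}\hat{P}$ and the columns of $\hat{P}\hat{D}_c^{-1}$, can be obtained exactly with rational arithmetic. The sketch below uses Python's `fractions` module; the variable names are ours.

```python
from fractions import Fraction

# Observed frequencies of Table 15.4.1 (rows A1-A3, columns B1-B4).
observed = [
    [6, 14, 16, 4],
    [17, 5, 8, 10],
    [7, 6, 6, 1],
]

row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]

# Row profiles n_ij / n_i. : the rows of D_r^{-1} P in (15.4.7).
row_profiles = [[Fraction(x, t) for x in row] for row, t in zip(observed, row_totals)]

# Column profiles n_ij / n_.j : the columns of P D_c^{-1} in (15.4.8).
col_profiles = [[Fraction(x, t) for x, t in zip(row, col_totals)] for row in observed]

print(row_profiles[0])                         # first row profile R_1'
print([sum(row) for row in row_profiles])      # each row profile sums to 1
print([sum(col) for col in zip(*col_profiles)])  # each column profile sums to 1
```

The fractions are automatically reduced, so that 6/40 appears as 3/20, for instance.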
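
15.5. Various Representations of Pearson's $\chi^2$ Statistic

Now, let us consider Pearson's $\chi^2$ statistic for testing the hypothesis that there is no
association between the two characteristics of classification or the hypothesis $H_o : p_{ij} =
p_{i.}p_{.j}$. The $\chi^2$ statistic is the following:

$$\chi^2 = \sum_{ij}\frac{(\text{observed frequency} - \text{expected frequency})^2}{(\text{expected frequency})} = \sum_{ij}\frac{\big(n_{ij} - \frac{n_{i.}n_{.j}}{n}\big)^2}{\frac{n_{i.}n_{.j}}{n}} \tag{15.5.1}$$

$$= n\sum_{ij}\frac{\big(\frac{n_{ij}}{n} - \frac{n_{i.}}{n}\frac{n_{.j}}{n}\big)^2}{\frac{n_{i.}}{n}\frac{n_{.j}}{n}} = n\sum_{ij}\frac{(\hat{p}_{ij} - \hat{p}_{i.}\hat{p}_{.j})^2}{\hat{p}_{i.}\hat{p}_{.j}} \tag{15.5.2}$$

$$= n\sum_{i}\hat{p}_{i.}\sum_{j}\Big(\frac{\hat{p}_{ij}}{\hat{p}_{i.}} - \hat{p}_{.j}\Big)^2\Big/\hat{p}_{.j} \tag{15.5.3}$$

$$= n\sum_{j}\hat{p}_{.j}\sum_{i}\Big(\frac{\hat{p}_{ij}}{\hat{p}_{.j}} - \hat{p}_{i.}\Big)^2\Big/\hat{p}_{i.}. \tag{15.5.4}$$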
: 2. |р, 


In order to simplify the notation, we shall omit placing a hat on top of the estimates of
$R_i$, $C_j$, R, C, $D_r$ and $D_c$. We may then express the $\chi^2$ statistic as the following quadratic
forms:

$$\chi^2 = \sum_{i=1}^{r} np_{i.}(R_i - C)'D_c^{-1}(R_i - C) \tag{15.5.5}$$

$$= \sum_{j=1}^{s} np_{.j}(C_j - R)'D_r^{-1}(C_j - R). \tag{15.5.6}$$


The forms given in (15.5.5) and (15.5.6) are very convenient for extending the theory to 
multi-way classifications. 

It is well known that, under $H_o$, Pearson's $\chi^2$ statistic is asymptotically distributed as a
chi-square random variable having (r − 1)(s − 1) degrees of freedom as $n \to \infty$. One can
also express (15.5.1) as a generalized distance between the observed frequencies and the
expected frequencies, which is a quadratic form involving the inverse of the true covariance
matrix of the multinomial distribution of the $n_{ij}$'s. Then, on applying the multivariate
version of the central limit theorem, it can be established that, as $n \to \infty$, Pearson's $\chi^2$
statistic has a $\chi^2$ distribution with (r − 1)(s − 1) degrees of freedom. For the representation
of Pearson's $\chi^2$ goodness-of-fit statistic as a generalized distance and as a quadratic form,
and for the proof of its asymptotic distribution, the reader may refer to Mathai and Haubold
(2017). There exist other derivations of this result in the literature.

The quadratic forms specified in (15.5.5) and (15.5.6) can also be interpreted as comparing
the generalized distance between the vectors $R_i$ and C in (15.5.5) and between the
vectors $C_j$ and R in (15.5.6), respectively. These will also be equivalent to testing the
hypothesis $H_o : p_{ij} = p_{i.}p_{.j}$. As well, an interpretation can be provided in terms of profile
analysis: then, the test will correspond to testing the hypothesis that the weighted row
profiles are similar; analogously, using (15.5.6) corresponds to testing the hypothesis that the
column profiles in a two-way contingency table are similar. Now, examine the following
item:

$$P - RC' = \begin{bmatrix} p_{11} & p_{12} & \cdots & p_{1s} \\ p_{21} & p_{22} & \cdots & p_{2s} \\ \vdots & \vdots & \ddots & \vdots \\ p_{r1} & p_{r2} & \cdots & p_{rs} \end{bmatrix} - \begin{bmatrix} p_{1.} \\ p_{2.} \\ \vdots \\ p_{r.} \end{bmatrix}[p_{.1}, p_{.2}, \ldots, p_{.s}] = \begin{bmatrix} p_{11} - p_{1.}p_{.1} & p_{12} - p_{1.}p_{.2} & \cdots & p_{1s} - p_{1.}p_{.s} \\ p_{21} - p_{2.}p_{.1} & p_{22} - p_{2.}p_{.2} & \cdots & p_{2s} - p_{2.}p_{.s} \\ \vdots & \vdots & \ddots & \vdots \\ p_{r1} - p_{r.}p_{.1} & p_{r2} - p_{r.}p_{.2} & \cdots & p_{rs} - p_{r.}p_{.s} \end{bmatrix}.$$

Referring to our numerical example, these quantities are the following:

$$\hat{P} - \hat{R}\hat{C}' = \Big(\frac{n_{ij}}{n} - \frac{n_{i.}n_{.j}}{n^2}\Big) = \frac{1}{100}\Big(n_{ij} - \frac{n_{i.}n_{.j}}{n}\Big)$$

$$= \frac{1}{100}\begin{bmatrix} 6 - (40)(30)/100 & 14 - (40)(25)/100 & 16 - (40)(30)/100 & 4 - (40)(15)/100 \\ 17 - (40)(30)/100 & 5 - (40)(25)/100 & 8 - (40)(30)/100 & 10 - (40)(15)/100 \\ 7 - (20)(30)/100 & 6 - (20)(25)/100 & 6 - (20)(30)/100 & 1 - (20)(15)/100 \end{bmatrix}$$

$$= \frac{1}{100}\begin{bmatrix} -6 & 4 & 4 & -2 \\ 5 & -5 & -4 & 4 \\ 1 & 1 & 0 & -2 \end{bmatrix}.$$


15.5.1. Testing the hypothesis of no association in a two-way contingency table 


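The observed value of Pearson's $\chi^2$ statistic is

$$\chi^2 = \frac{(-6)^2}{12} + \frac{(4)^2}{10} + \frac{(4)^2}{12} + \frac{(-2)^2}{6} + \frac{(5)^2}{12} + \frac{(-5)^2}{10} + \frac{(-4)^2}{12} + \frac{(4)^2}{6} + \frac{(1)^2}{6} + \frac{(1)^2}{5} + \frac{(0)^2}{6} + \frac{(-2)^2}{3}$$

$$= 16.88.$$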


Given our data, (r − 1)(s − 1) = (2)(3) = 6, and the tabulated critical value is $\chi^2_{6, 0.05} =
12.59$ at the 5% significance level. Since 12.59 < 16.88, the hypothesis of no association
between the classification attributes is rejected as per the evidence provided by the data.
This $\chi^2$ approximation may be questionable since one of the expected cell frequencies is
less than 5. For a proper application of this approximation, the expected cell frequencies
ought to be at least 5.
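
The observed value 16.88 can be verified with the following short Python sketch, which evaluates (15.5.1) for the frequencies of Table 15.4.1. The function name `pearson_chi_square` is ours, and the critical value 12.59 is the tabulated one quoted above rather than a computed quantity.

```python
# Observed frequencies of Table 15.4.1 (rows A1-A3, columns B1-B4).
observed = [
    [6, 14, 16, 4],
    [17, 5, 8, 10],
    [7, 6, 6, 1],
]


def pearson_chi_square(table):
    """Pearson's statistic (15.5.1) and its degrees of freedom (r - 1)(s - 1)."""
    n = sum(sum(row) for row in table)
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    stat = 0.0
    for i, row in enumerate(table):
        for j, n_ij in enumerate(row):
            e_ij = row_totals[i] * col_totals[j] / n  # expected frequency under H_o
            stat += (n_ij - e_ij) ** 2 / e_ij
    dof = (len(table) - 1) * (len(table[0]) - 1)
    return stat, dof


stat, dof = pearson_chi_square(observed)
critical_value = 12.59  # tabulated chi-square value, 6 degrees of freedom, 5% level
print(round(stat, 2), dof, stat > critical_value)
```

The statistic exceeds the critical value, which agrees with the rejection of the hypothesis of no association.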
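
15.6. Plot of Row and Column Profiles

Now, $(P - RC')D_c^{-1}$ means that the columns of $(P - RC')$ are multiplied by
$\frac{1}{p_{.1}}, \ldots, \frac{1}{p_{.s}}$, respectively. Then, $(P - RC')D_c^{-1}(P - RC')'$ is a matrix of all square and
cross product terms involving $p_{ij} - p_{i.}p_{.j}$ for all i and j, where the s columns are weighted
by $\frac{1}{p_{.1}}, \ldots, \frac{1}{p_{.s}}$, and if pre-multiplied by $D_r^{-1}$, the rows are weighted by
$\frac{1}{p_{1.}}, \ldots, \frac{1}{p_{r.}}$, respectively. Looking at the diagonal elements, we note that Pearson's
$\chi^2$ statistic is nothing but

$$\chi^2 = n\,\mathrm{tr}[D_r^{-1}(P - RC')D_c^{-1}(P - RC')'] \tag{15.6.1}$$

$$= n\sum_{ij}\frac{(p_{ij} - p_{i.}p_{.j})^2}{p_{i.}p_{.j}} \tag{15.6.2}$$

$$= n(\lambda_1^2 + \cdots + \lambda_k^2) \tag{15.6.3}$$

where $\lambda_1^2, \ldots, \lambda_k^2$ are the nonzero eigenvalues of the matrix $D_r^{-1}(P - RC')D_c^{-1}(P - RC')'$
or of the matrix $D_r^{-\frac{1}{2}}(P - RC')D_c^{-1}(P - RC')'D_r^{-\frac{1}{2}}$, with k being the rank of $P - RC'$. For
the numerical example, the observed value of the matrix $Y = (y_{ij})$ with
$y_{ij} = \frac{p_{ij} - p_{i.}p_{.j}}{\sqrt{p_{i.}p_{.j}}}$ is obtained as follows, observing that

$$\hat{y}_{ij} = \Big[n_{ij} - \frac{n_{i.}n_{.j}}{n}\Big]\Big/\sqrt{n_{i.}n_{.j}}.$$

From the representation of $P - RC'$, we already have the matrix $(n_{ij} - n_{i.}n_{.j}/n)$; dividing
each of its entries by $\sqrt{n_{i.}n_{.j}/n}$ yields $\sqrt{n}\,Y$, that is,

$$\sqrt{n}\,Y = \Big(\frac{n_{ij} - n_{i.}n_{.j}/n}{\sqrt{n_{i.}n_{.j}/n}}\Big) = \begin{bmatrix} (6 - 12)/\sqrt{12} & (14 - 10)/\sqrt{10} & (16 - 12)/\sqrt{12} & (4 - 6)/\sqrt{6} \\ (17 - 12)/\sqrt{12} & (5 - 10)/\sqrt{10} & (8 - 12)/\sqrt{12} & (10 - 6)/\sqrt{6} \\ (7 - 6)/\sqrt{6} & (6 - 5)/\sqrt{5} & (6 - 6)/\sqrt{6} & (1 - 3)/\sqrt{3} \end{bmatrix}$$

$$= \begin{bmatrix} -6/\sqrt{12} & 4/\sqrt{10} & 4/\sqrt{12} & -2/\sqrt{6} \\ 5/\sqrt{12} & -5/\sqrt{10} & -4/\sqrt{12} & 4/\sqrt{6} \\ 1/\sqrt{6} & 1/\sqrt{5} & 0/\sqrt{6} & -2/\sqrt{3} \end{bmatrix}.$$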

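Then,

$$n\,YY' = \begin{bmatrix} \frac{33}{5} & -\frac{43}{6} & \frac{17\sqrt{2}}{30} \\ -\frac{43}{6} & \frac{103}{12} & -\frac{17\sqrt{2}}{12} \\ \frac{17\sqrt{2}}{30} & -\frac{17\sqrt{2}}{12} & \frac{17}{10} \end{bmatrix}. \tag{15.6.4}$$

The representation in (15.6.1) has the advantage that

$$\mathrm{tr}[D_r^{-\frac{1}{2}}(P - RC')D_c^{-1}(P - RC')'D_r^{-\frac{1}{2}}] = \mathrm{tr}[YY'], \quad Y = D_r^{-\frac{1}{2}}(P - RC')D_c^{-\frac{1}{2}} = (y_{ij}),$$

$$y_{ij} = \frac{p_{ij} - p_{i.}p_{.j}}{\sqrt{p_{i.}p_{.j}}}, \quad n\sum_{ij} y_{ij}^2 = \chi^2 = n\,\mathrm{tr}(YY'). \tag{15.6.5}$$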


Note that Y is r x s and the rank of Y is equal to the rank of P — RC’, which is k, referring 
to (15.6.3). Thus, there are k nonzero eigenvalues associated with the r x r matrix Y Y' as 
well as with the s x s matrix Y’Y, which are АД, TM М. Since tr(YY’) = AG dass А, 
we can represent Pearson's x? statistic as follows, substituting the estimates of pij, pi. 
and р.у, etc: 


x? 
2 (УУ) = А2 +... +22 
п 
k 
=) ji (Ri - CY D; (Ri - С) (15.6.6) 
i=l 
-Y 546; - YD; Ô; – В). (15.6.7) 
j=l 


The expressions given in (15.6.6) and (15.6.7), as well as the sum of the $\lambda_j^2$, are called the total inertia in a two-way contingency table. We can also define the squared distance between two rows as

$$d_{ij}^2 = (R_i - R_j)'D_c^{-1}(R_i - R_j) \tag{15.6.8}$$

and the squared distance between two columns as

$$d_{ij}^2 = (C_i - C_j)'D_r^{-1}(C_i - C_j). \tag{15.6.9}$$

When the distance as specified in (15.6.8) is very small, we may combine the i-th and j-th rows, if necessary. Sometimes, the cell frequencies are small and we may wish to combine the small frequencies with other cell frequencies so that the chi-square approximation to the distribution of Pearson's $\chi^2$ statistic is more accurate. Then, one can rely on (15.6.8) and (15.6.9) to determine whether it is advisable to combine rows or columns.
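The distances in (15.6.8) and (15.6.9) can be tabulated for all pairs at once. The sketch below does so for the numerical example; the frequency table `N` is again the one implied by the numerical example (an assumption), and the helper `squared_distances` is a hypothetical name introduced for illustration.

```python
import numpy as np

# Assumed cell frequencies of the numerical example.
N = np.array([[6, 14, 16, 4],
              [17, 5, 8, 10],
              [7, 6, 6, 1]], dtype=float)
P = N / N.sum()
R = P.sum(axis=1)   # row marginals, diagonal of D_r
C = P.sum(axis=0)   # column marginals, diagonal of D_c

def squared_distances(profiles, weights):
    """Pairwise (x - y)' D^{-1} (x - y) over the rows of `profiles`, D = diag(weights)."""
    diff = profiles[:, None, :] - profiles[None, :, :]
    return (diff ** 2 / weights).sum(axis=2)

row_d2 = squared_distances(P / R[:, None], C)   # (15.6.8): row profiles, D_c^{-1}
col_d2 = squared_distances((P / C).T, R)        # (15.6.9): column profiles, D_r^{-1}

# Locate the closest pair of distinct columns.
masked = col_d2 + np.diag([np.inf] * col_d2.shape[0])
i, j = np.unravel_index(np.argmin(masked), masked.shape)
print("closest columns (0-based):", i, j)
```

A small entry in `row_d2` or `col_d2` flags a pair of rows or columns that are candidates for being combined.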

For convenience, let $r \le s$. Let $U_1, \ldots, U_r$ be the $r \times 1$ normalized eigenvectors of $YY'$ and let the $r \times k$ matrix $U = [U_1, U_2, \ldots, U_k]$, $k \le r$. Let $V_1, \ldots, V_s$ be the normalized eigenvectors of $Y'Y$ and let the $s \times k$ matrix $V = [V_1, \ldots, V_k]$, $k \le s$. Now, consider the singular value decomposition

$$Y = D_r^{-1/2}(P - RC')D_c^{-1/2} = U\Lambda V' \tag{15.6.10}$$

where $U'U = I_k = V'V$ and $\Lambda = \mathrm{diag}(\lambda_1, \ldots, \lambda_k)$. Then, we can write

$$P - RC' = D_r^{1/2}U\Lambda V'D_c^{1/2} = W\Lambda Z' \tag{15.6.11}$$

where $W = D_r^{1/2}U$ and $Z = D_c^{1/2}V$. Let $W_j$, $j = 1, \ldots, k$, denote the columns of $W = [W_1, W_2, \ldots, W_k]$ and let $Z_j$, $j = 1, \ldots, k$, denote the columns of $Z = [Z_1, Z_2, \ldots, Z_k]$. Then, we can write

$$P - RC' = \sum_{j=1}^{k} \lambda_j W_j Z_j' \tag{15.6.12}$$

where $W'D_r^{-1}W = U'U = I_k = V'V = Z'D_c^{-1}Z$. Note that $P - RC'$ is the deviation matrix under the hypothesis $H_o: p_{ij} = p_{i.}p_{.j}$, where

$$P - RC' = (p_{ij} - p_{i.}p_{.j}) \quad\text{and}\quad Y = (y_{ij}) = D_r^{-1/2}(P - RC')D_c^{-1/2} = \left(\frac{p_{ij} - p_{i.}p_{.j}}{\sqrt{p_{i.}p_{.j}}}\right).$$


Thus, the procedure is as follows: If $r \le s$, then compute the $r$ eigenvalues of the $r \times r$ matrix $YY'$. If $Y$ is of rank $r$, $YY' > O$ (positive definite); otherwise $YY'$ is positive semi-definite. Let the nonzero eigenvalues of $YY'$ be $\lambda_1^2, \ldots, \lambda_k^2$, assuming that $k$ is the number of nonzero eigenvalues of $YY'$. These will also be the nonzero eigenvalues of $Y'Y$. Compute the normalized eigenvectors from $YY'$ and denote those corresponding to the nonzero eigenvalues by $U = [U_1, \ldots, U_k]$ where $U_j$ is the j-th column of $U$. Letting the normalized eigenvectors obtained from $Y'Y$, which correspond to the same nonzero eigenvalues, be denoted by $V = [V_1, \ldots, V_k]$, we have

$$Y = U\Lambda V', \quad \Lambda = \mathrm{diag}(\lambda_1, \ldots, \lambda_k), \quad YY' = U\Lambda^2U' \quad\text{and}\quad Y'Y = V\Lambda^2V'. \tag{15.6.13}$$


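Example 15.6.1. Construct a singular value decomposition of the following matrix $Q$:

$$Q = \begin{bmatrix} -1 & 1 & -1 & 0\\ 1 & 1 & 0 & 2 \end{bmatrix}.$$

Solution 15.6.1. Let us compute $QQ'$ as well as $Q'Q$ and the eigenvalues of $QQ'$. Since

$$QQ' = \begin{bmatrix} -1 & 1 & -1 & 0\\ 1 & 1 & 0 & 2 \end{bmatrix}\begin{bmatrix} -1 & 1\\ 1 & 1\\ -1 & 0\\ 0 & 2 \end{bmatrix} = \begin{bmatrix} 3 & 0\\ 0 & 6 \end{bmatrix},$$

the eigenvalues of $QQ'$ are $\lambda_1^2 = 3$ and $\lambda_2^2 = 6$. Let us determine the normalized eigenvectors of $QQ'$. Consider the equation $[QQ' - \lambda^2 I]X = O$ for $\lambda^2 = 3$ and $6$, and let $X' = [x_1, x_2]$ and $O' = [0, 0]$. Then, for $\lambda^2 = 3$, we see that $x_2 = 0$ and for $\lambda^2 = 6$, we note that $x_1 = 0$. Thus, the normalized solutions are

$$U_1 = \begin{bmatrix} 1\\ 0 \end{bmatrix} \ \text{and}\ U_2 = \begin{bmatrix} 0\\ 1 \end{bmatrix} \ \Rightarrow\ U = [U_1, U_2] = \begin{bmatrix} 1 & 0\\ 0 & 1 \end{bmatrix}.$$

Note that $-U_1$ or $-U_2$ or $-U_1, -U_2$ will also satisfy all the conditions, and we could take any of these forms for convenience. Now, consider the equation $(Q'Q - \lambda^2 I)X = O$, where $X' = [x_1, x_2, x_3, x_4]$ and $O' = [0, 0, 0, 0]$, for $\lambda^2 = 3, 6$. For $\lambda^2 = 3$, the coefficient matrix is

$$Q'Q - 3I = \begin{bmatrix} -1 & 0 & 1 & 2\\ 0 & -1 & -1 & 2\\ 1 & -1 & -2 & 0\\ 2 & 2 & 0 & 1 \end{bmatrix} \to \begin{bmatrix} -1 & 0 & 1 & 0\\ 0 & -1 & -1 & 0\\ 0 & 0 & 0 & 1\\ 0 & 0 & 0 & 0 \end{bmatrix}$$

by elementary transformations. Observe that $x_4 = 0$ so that $-x_1 + x_3 = 0$ and $-x_2 - x_3 = 0$. Thus, an eigenvector corresponding to $\lambda^2 = 3$ and its normalized form are

$$\begin{bmatrix} 1\\ -1\\ 1\\ 0 \end{bmatrix} \ \Rightarrow\ V_1 = \frac{1}{\sqrt{3}}\begin{bmatrix} 1\\ -1\\ 1\\ 0 \end{bmatrix}.$$

Now, take $\lambda^2 = 6$ and consider the equation $(Q'Q - 6I)X = O$; the coefficient matrix and a reduced form obtained through elementary transformations are the following:

$$Q'Q - 6I = \begin{bmatrix} -4 & 0 & 1 & 2\\ 0 & -4 & -1 & 2\\ 1 & -1 & -5 & 0\\ 2 & 2 & 0 & -2 \end{bmatrix} \to \begin{bmatrix} 1 & -1 & 0 & 0\\ 0 & -4 & 0 & 2\\ 0 & 0 & 1 & 0\\ 0 & 0 & 0 & 0 \end{bmatrix},$$

which shows that $x_3 = 0$, so that $x_1 - x_2 = 0$ and $-4x_2 + 2x_4 = 0$. Hence, an eigenvector and its normalized form are

$$\begin{bmatrix} 1\\ 1\\ 0\\ 2 \end{bmatrix} \ \Rightarrow\ V_2 = \frac{1}{\sqrt{6}}\begin{bmatrix} 1\\ 1\\ 0\\ 2 \end{bmatrix}.$$

Thus, $V = [V_1, V_2]$. As mentioned earlier, we could have $-V_1$ or $-V_2$ or $-V_1, -V_2$ as the normalized eigenvectors. As per our notation,

$$\Lambda = \mathrm{diag}(\sqrt{3}, \sqrt{6}) \quad\text{and}\quad Q = U\Lambda V'.$$

Let us verify this last equality. Since

$$[U_1, U_2]\begin{bmatrix} \sqrt{3} & 0\\ 0 & \sqrt{6} \end{bmatrix}\begin{bmatrix} V_1'\\ V_2' \end{bmatrix} = \begin{bmatrix} 1 & 0\\ 0 & 1 \end{bmatrix}\begin{bmatrix} \sqrt{3} & 0\\ 0 & \sqrt{6} \end{bmatrix}\begin{bmatrix} \frac{1}{\sqrt{3}} & -\frac{1}{\sqrt{3}} & \frac{1}{\sqrt{3}} & 0\\ \frac{1}{\sqrt{6}} & \frac{1}{\sqrt{6}} & 0 & \frac{2}{\sqrt{6}} \end{bmatrix} = \begin{bmatrix} 1 & -1 & 1 & 0\\ 1 & 1 & 0 & 2 \end{bmatrix},$$

we should take $-V_1$ to obtain $Q$. Then,

$$[U_1, U_2]\begin{bmatrix} \sqrt{3} & 0\\ 0 & \sqrt{6} \end{bmatrix}\begin{bmatrix} -V_1'\\ V_2' \end{bmatrix} = Q,$$

which verifies the result and completes the computations.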
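The decomposition obtained in Solution 15.6.1 can be checked numerically. The sketch below rebuilds $Q$ from $U$, $\Lambda$ and the sign-adjusted $V$, and compares the squared singular values returned by NumPy with the eigenvalues 3 and 6; the variable names are ours.

```python
import numpy as np

Q = np.array([[-1.0, 1.0, -1.0, 0.0],
              [1.0, 1.0, 0.0, 2.0]])

U = np.eye(2)                                      # U = [U_1, U_2]
Lam = np.diag([np.sqrt(3.0), np.sqrt(6.0)])        # Lambda = diag(sqrt 3, sqrt 6)
V1 = np.array([1.0, -1.0, 1.0, 0.0]) / np.sqrt(3.0)
V2 = np.array([1.0, 1.0, 0.0, 2.0]) / np.sqrt(6.0)
V = np.column_stack([-V1, V2])                     # -V_1 is needed to recover Q

reconstructed = U @ Lam @ V.T

# Singular values computed by NumPy, returned in decreasing order.
singular_values = np.linalg.svd(Q, compute_uv=False)

print(np.allclose(reconstructed, Q))
print(singular_values ** 2)
```

Note that NumPy orders singular values from largest to smallest, so its ordering is the reverse of the one used in the example; the decomposition is the same up to this reordering and the sign choices discussed above.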


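Now, we shall continue with our row and column profile plots. From (15.6.4), we have

$$nYY' = \begin{bmatrix}
\frac{33}{5} & -\frac{43}{6} & \frac{17\sqrt{2}}{30}\\
-\frac{43}{6} & \frac{103}{12} & -\frac{17\sqrt{2}}{12}\\
\frac{17\sqrt{2}}{30} & -\frac{17\sqrt{2}}{12} & \frac{17}{10}
\end{bmatrix}
\quad\text{and}\quad
nY'Y = \begin{bmatrix}
\frac{63}{12} & -\frac{47\sqrt{30}}{60} & -\frac{11}{3} & \frac{7\sqrt{2}}{3}\\
-\frac{47\sqrt{30}}{60} & \frac{43}{10} & \frac{3\sqrt{30}}{5} & -\frac{16\sqrt{15}}{15}\\
-\frac{11}{3} & \frac{3\sqrt{30}}{5} & \frac{8}{3} & -2\sqrt{2}\\
\frac{7\sqrt{2}}{3} & -\frac{16\sqrt{15}}{15} & -2\sqrt{2} & \frac{14}{3}
\end{bmatrix}.$$

The eigenvalues of $nYY'$ are $\lambda_1^2 = 15.1369$, $\lambda_2^2 = 1.7471$ and $\lambda_3^2 = 0$, and the normalized eigenvectors from $nYY'$ corresponding to $\lambda_1$, $\lambda_2$ and $\lambda_3$ are $U_1$, $U_2$, $U_3$, so that $U = [U_1, U_2, U_3]$ where

$$U = \begin{bmatrix} 4.28 & -0.49 & 1.41\\ -4.99 & -0.22 & 1.41\\ 1 & 1 & 1 \end{bmatrix} \quad\text{and}\quad \Lambda = \mathrm{diag}(\sqrt{15.1369}, \sqrt{1.7471}, 0).$$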
For the same eigenvalues $\lambda_1^2$, $\lambda_2^2$ and $\lambda_3^2$, the normalized eigenvectors determined from $nY'Y$, which correspond to the nonzero eigenvalues, are $V_1$ and $V_2$, with

$$V = [V_1, V_2] = \begin{bmatrix} 1.10 & -0.84\\ -1.06 & -0.16\\ -0.83 & 0.28\\ 1 & 1 \end{bmatrix}.$$

Since $\lambda_3 = 0$, $k = 2$, and we can take the $r \times k$, that is, $3 \times 2$ matrix $G = (g_{ij}) = D_r^{-1/2}U\Lambda$ to represent the row deviation profiles and the $s \times k = 4 \times 2$ matrix $H = (h_{ij}) = D_c^{-1/2}V\Lambda$ to represent the column deviation profiles. For our numerical example, it follows from (15.4.6) that

$$D_r = \mathrm{diag}(0.4, 0.4, 0.2) \ \Rightarrow\ D_r^{-1/2} = \mathrm{diag}\Big(\frac{1}{\sqrt{0.4}}, \frac{1}{\sqrt{0.4}}, \frac{1}{\sqrt{0.2}}\Big),$$

$$D_c = \mathrm{diag}(0.30, 0.25, 0.30, 0.15) \ \Rightarrow\ D_c^{-1/2} = \mathrm{diag}\Big(\frac{1}{0.55}, \frac{1}{0.50}, \frac{1}{0.55}, \frac{1}{0.39}\Big),$$

$$\Lambda = \mathrm{diag}(\sqrt{15.1369}, \sqrt{1.7471}, 0) = \mathrm{diag}(3.89, 1.32, 0).$$


We only take the first two columns of $U$ and $V$ since $\lambda_3 = 0$; besides, only the first two vectors are required for plotting. Let $U_{(2)}$ and $V_{(2)}$ represent the first two columns of $U$ and $V$, respectively. Then, forming $D_r^{-1/2}U_{(2)}\Lambda$ is equivalent to multiplying the first and second columns of $U_{(2)}$ by 3.89 and 1.32, respectively, and multiplying the rows by the corresponding diagonal elements of $D_r^{-1/2}$. Then, we have

$$U_{(2)} = \begin{bmatrix} 4.28 & -0.49\\ -4.99 & -0.22\\ 1 & 1 \end{bmatrix}, \quad D_r^{-1/2}U_{(2)}\Lambda = \begin{bmatrix} 26.42 & -1.03\\ -30.81 & -0.46\\ 6.17 & 2.09 \end{bmatrix} = G_2$$

where $G_2$ is the matrix consisting of the first two columns of $G$. Hence, the points required for plotting the row profile are: $(26.42, -1.03)$, $(-30.81, -0.46)$, $(6.17, 2.09)$. These points being far apart, no two rows should be combined. Now, consider the column profiles: the effect of $D_c^{-1/2}V_{(2)}\Lambda$ is to multiply the columns of $V_{(2)}$ by 3.89 and 1.32, respectively, and to multiply the rows by $\frac{1}{0.55}$, $\frac{1}{0.50}$, $\frac{1}{0.55}$, $\frac{1}{0.39}$, respectively. Thus,

$$V_{(2)} = \begin{bmatrix} 1.10 & -0.84\\ -1.06 & -0.16\\ -0.83 & 0.28\\ 1 & 1 \end{bmatrix} \ \Rightarrow\ H_2 = \begin{bmatrix} 7.78 & -2.02\\ -8.25 & -0.42\\ -5.87 & 0.67\\ 9.97 & 3.36 \end{bmatrix}$$
where $H_2$ is the matrix consisting of the first two columns of $H$. The row profile and the column profile points are plotted in Fig. 15.6.1 where r next to a point indicates a row point and c designates a column point. That is, i r indicates the i-th row point and j c, the j-th column point. It can be seen from this plot that the row points are far apart while the second and third column points are somewhat close; accordingly, if necessary, the second and third columns could be combined.


Figure 15.6.1 Row profile and column profile points


15.7. Correspondence Analysis in a Multi-way Contingency Table 


When the data is classified under a number of variables, each variable having a num- 
ber of categories, the resulting frequency table is referred to as a multi-way classification. 
Correspondence analysis for a multi-way classification involves converting data in a multi- 
way classification setting into a two-way classification framework and then, employing the 
techniques developed in Sects. 15.5 and 15.6. The first step in this regard consists of creat- 
ing an indicator matrix C. In order to illustrate the steps, we will first present an example. 
Suppose that 10 persons selected at random from a community, are classified according 
to three variables. Variable 1 is gender. Under this variable, we shall consider the cate- 
gories male and female. Variable 2 is weight. Under this variable, we are considering three 
categories: underweight, normal and overweight. The third variable is education which 
is assumed to have four levels: level 1, level 2, level 3 and level 4. Thus, there are three 
variables and 9 categories. The actual data are provided in Table 15.7.1. 


Table 15.7.1: Ten persons classified under three variables 


Variables 
Person # Gender Weight Educational level 
1 Female Overweight Level 2 
2 Female Normal Level 4 
3 Male Underweight Level 1 
4 Female Normal Level 3 
5 Male Overweight Level 1 
6 Male Normal Level 2 
7 Female Overweight Level 3 
8 Female Underweight Level 4 
9 Male Normal Level 3 
10 Female Overweight Level 1 


Table 15.7.2: Entries of the indicator matrix of the data included in Table 15.7.1 


Variables

Person #   Gender    Weight     Educational level
           M  F      U  N  O    L1 L2 L3 L4
 1         0  1      0  0  1    0  1  0  0
 2         0  1      0  1  0    0  0  0  1
 3         1  0      1  0  0    1  0  0  0
 4         0  1      0  1  0    0  0  1  0
 5         1  0      0  0  1    1  0  0  0
 6         1  0      0  1  0    0  1  0  0
 7         0  1      0  0  1    0  0  1  0
 8         0  1      1  0  0    0  0  0  1
 9         1  0      0  1  0    0  0  1  0
10         0  1      0  0  1    1  0  0  0




Next, we construct the indicator matrix $C$ (distinct from $C$ as defined in (15.4.4)) of the data displayed in Table 15.7.1. If an item is present, we write 1 in the corresponding location in Table 15.7.2, and if it is absent, we write 0, thus populating this table where M = Male, F = Female, U = Underweight, N = Normal, O = Overweight, L1 = Level 1, L2 = Level 2, L3 = Level 3 and L4 = Level 4. The resulting indicator matrix $C$ is

$$C = \begin{bmatrix}
0&1&0&0&1&0&1&0&0\\
0&1&0&1&0&0&0&0&1\\
1&0&1&0&0&1&0&0&0\\
0&1&0&1&0&0&0&1&0\\
1&0&0&0&1&1&0&0&0\\
1&0&0&1&0&0&1&0&0\\
0&1&0&0&1&0&0&1&0\\
0&1&1&0&0&0&0&0&1\\
1&0&0&1&0&0&0&1&0\\
0&1&0&0&1&1&0&0&0
\end{bmatrix}.$$

Note that since a person will belong to a single category of every variable, the row sum of every row will always be equal to the number of variables, which is 3 in the example. The sum of all the column entries under each variable is the number of items classified (10 in the example). We now convert the data into a two-way classification, which is achieved by converting $C$ into a Burt matrix $B$, where $B = C'C$. In our example,

$$B = C'C = \begin{bmatrix}
4&0&1&2&1&2&1&1&0\\
0&6&1&2&3&1&1&2&2\\
1&1&2&0&0&1&0&0&1\\
2&2&0&4&0&0&1&2&1\\
1&3&0&0&4&2&1&1&0\\
2&1&1&0&2&3&0&0&0\\
1&1&0&1&1&0&2&0&0\\
1&2&0&2&1&0&0&3&0\\
0&2&1&1&0&0&0&0&2
\end{bmatrix}.$$
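The construction of the indicator matrix and of the Burt matrix can be sketched as follows, starting from the classification of the ten persons in Table 15.7.1. The category codes and variable names are ours, introduced only for illustration.

```python
import numpy as np

# Table 15.7.1 coded as (gender, weight, educational level) for persons 1..10.
data = [("F", "O", "L2"), ("F", "N", "L4"), ("M", "U", "L1"), ("F", "N", "L3"),
        ("M", "O", "L1"), ("M", "N", "L2"), ("F", "O", "L3"), ("F", "U", "L4"),
        ("M", "N", "L3"), ("F", "O", "L1")]

# Categories of each variable, in the column order used for the indicator matrix.
levels = [["M", "F"], ["U", "N", "O"], ["L1", "L2", "L3", "L4"]]

# One row per person, one 0/1 column per category of every variable.
C = np.array([[1 if person[v] == cat else 0
               for v, cats in enumerate(levels) for cat in cats]
              for person in data])

B = C.T @ C   # Burt matrix

print(C.sum(axis=1))   # each row sums to the number of variables
print(np.diag(B))      # category totals
print(B[0:2, 2:5])     # gender versus weight block
```

The off-diagonal blocks of `B`, such as `B[0:2, 2:5]`, are the pairwise two-way contingency tables, while the diagonal of `B` holds the category totals.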


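Observe that the diagonal blocks in $C'C$ correspond to the variables gender, weight and educational level, that is, gender versus gender, weight versus weight, and educational level versus educational level. These blocks are the following:

$$\begin{bmatrix} 4 & 0\\ 0 & 6 \end{bmatrix}, \quad \begin{bmatrix} 2 & 0 & 0\\ 0 & 4 & 0\\ 0 & 0 & 4 \end{bmatrix}, \quad \begin{bmatrix} 3 & 0 & 0 & 0\\ 0 & 2 & 0 & 0\\ 0 & 0 & 3 & 0\\ 0 & 0 & 0 & 2 \end{bmatrix}.$$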
Various two-way contingency tables, namely gender versus weight, gender versus educational level, and weight versus educational level, are combined into one two-way table displaying category versus category. The observed Pearson's $\chi^2$ statistic from $C'C$ is seen to be 79.85. In this case, the number of degrees of freedom is $8 \times 8 = 64$ and at the 5% level, the tabulated $\chi^2_{64, 0.05} \approx 84 > 79.85$; hence, the hypothesis of no association in $C'C$ is not rejected. Note that this $\chi^2$ approximation is unreliable since the expected frequencies are small. The most relevant parts in the Burt matrix $C'C$ are the non-diagonal blocks of frequencies. The two non-diagonal blocks of the first two rows represent the two-way contingency tables for gender versus weight and gender versus educational level. Similarly, the non-diagonal block in the third to fifth rows represents the two-way contingency table for weight versus educational level. These are the following, denoted by $A_1$, $A_2$, $A_3$ respectively, where $A_1$ is the two-way contingency table of gender versus weight, $A_2$ is the contingency table of gender versus educational level and $A_3$ is the table of weight versus educational level:

$$A_1 = \begin{bmatrix} 1 & 2 & 1\\ 1 & 2 & 3 \end{bmatrix}, \quad A_2 = \begin{bmatrix} 2 & 1 & 1 & 0\\ 1 & 1 & 2 & 2 \end{bmatrix}, \quad A_3 = \begin{bmatrix} 1 & 0 & 0 & 1\\ 0 & 1 & 2 & 1\\ 2 & 1 & 1 & 0 \end{bmatrix}.$$

The corresponding matrices of expected frequencies, under the hypothesis of no association between the characteristics of classification, denoted by $E(A_i)$, $i = 1, 2, 3$, are

$$E(A_1) = \begin{bmatrix} 0.8 & 1.6 & 1.6\\ 1.2 & 2.4 & 2.4 \end{bmatrix}, \quad E(A_2) = \begin{bmatrix} 1.2 & 0.8 & 1.2 & 0.8\\ 1.8 & 1.2 & 1.8 & 1.2 \end{bmatrix},$$

$$E(A_3) = \begin{bmatrix} 0.6 & 0.4 & 0.6 & 0.4\\ 1.2 & 0.8 & 1.2 & 0.8\\ 1.2 & 0.8 & 1.2 & 0.8 \end{bmatrix}.$$

The observed values of Pearson's $\chi^2$ statistic under the hypothesis of no association in the contingency table, and the corresponding tabulated $\chi^2$ critical values at the 5% significance level, are the following: $A_1$: $\chi^2 = 0.63$, $\chi^2_{2, 0.05} = 5.99 > 0.63$; $A_2$: $\chi^2 = 2.36$, $\chi^2_{3, 0.05} = 7.81 > 2.36$; $A_3$: $\chi^2 = 5.42$, $\chi^2_{6, 0.05} = 12.59 > 5.42$; hence the hypothesis would not be rejected in any of the contingency tables if Pearson's statistic were applicable. Actually, the $\chi^2$ approximation is not appropriate in any of these cases since the expected frequencies are quite small. Hence, decisions cannot be made on the basis of Pearson's statistic in these instances.
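The expected frequencies and the observed values of Pearson's statistic for the three two-way tables can be reproduced with a few lines of code. This is a minimal sketch; `pearson_chi2` is a hypothetical helper written for this illustration, not a library routine.

```python
import numpy as np

def pearson_chi2(table):
    """Return (statistic, degrees of freedom, expected frequencies) for a two-way table."""
    table = np.asarray(table, dtype=float)
    n = table.sum()
    # Expected frequency under no association: (row total)(column total)/n.
    expected = np.outer(table.sum(axis=1), table.sum(axis=0)) / n
    stat = ((table - expected) ** 2 / expected).sum()
    df = (table.shape[0] - 1) * (table.shape[1] - 1)
    return stat, df, expected

A1 = [[1, 2, 1], [1, 2, 3]]                      # gender versus weight
A2 = [[2, 1, 1, 0], [1, 1, 2, 2]]                # gender versus educational level
A3 = [[1, 0, 0, 1], [0, 1, 2, 1], [2, 1, 1, 0]]  # weight versus educational level

results = {name: pearson_chi2(t) for name, t in [("A1", A1), ("A2", A2), ("A3", A3)]}
for name, (stat, df, _) in results.items():
    print(name, "chi2 =", round(stat, 4), "df =", df)
```

The degrees of freedom returned are 2, 3 and 6, which are the ones used for the tabulated critical values quoted in the text.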


Observe that the first column of the matrix $C$ corresponds to the count on "Male", the second to the count on "Female", the third to "Underweight", the fourth to "Normal", the fifth to "Overweight", the sixth to "Level 1", the seventh to "Level 2", the eighth to "Level 3" and the ninth to "Level 4". Thus, the columns represent the various variables and their categories. So, if we were to plot each column as a point in two-dimensional space, then by looking at the points we could determine which points are close to each other. For example, if the "Overweight" column point is close to the "Male" column point, then there is a possibility of association between "Overweight" and "Male". Thus, our aim will be to plot each column of $C$ or each column of $C'C$ as a point in two dimensions. For this purpose, we may make use of the plotting technique described in Sects. 15.5 and 15.6. Consider a singular value decomposition $C = U\Lambda V'$, $U'U = I_k$, $V'V = I_k$. If $C$ is $r \times s$, $s \le r$, then $U$ is $r \times k$ and $V$ is $s \times k$ where $k$ is the number of nonzero eigenvalues of $CC'$ as well as those of $C'C$, and $\Lambda = \mathrm{diag}(\lambda_1, \ldots, \lambda_k)$ where $\lambda_j^2$, $j = 1, \ldots, k$, are the nonzero eigenvalues of $CC'$ and $C'C$. In the numerical example, $r = 10$ and $s = 9$. Consider the eigenvalues of $C'C$ since in this case, the order is smaller than the order of $CC'$. Let the nonzero eigenvalues of $C'C$ be $\lambda_1^2 \ge \cdots \ge \lambda_k^2$. From $C'C$, compute the normalized eigenvectors corresponding to these nonzero eigenvalues. This $s \times k$ matrix of normalized eigenvectors is $V$ in the singular value decomposition. By using the same nonzero eigenvalues, compute the normalized eigenvectors from $CC'$. This $r \times k$ matrix is $U$ in the singular value decomposition. Since the columns of $C$ and $C'C$ represent the various variables and their subdivisions, only the columns are useful for our geometrical representation, that is, only $V$ will be relevant for plotting the points. Consider $H = V\Lambda$ and let $\lambda_1^2 \ge \lambda_2^2 \ge \cdots \ge \lambda_k^2$. Observe that $C = U\Lambda V' \Rightarrow C' = V\Lambda U' = HU'$. The rows of $C'$ represent the various variables and their categories. Let $h_1, \ldots, h_s$ be the rows of $H$. Then, we have

$$h_1U' = \text{Men-row}$$

$$h_2U' = \text{Women-row}$$

$$\vdots$$

$$h_sU' = \text{Level 4-row}.$$

This shows that the rows $h_1, \ldots, h_s$ represent the various variables and their categories. Since the first two eigenvalues are the largest ones and $V_1$, $V_2$ are the corresponding eigenvectors, we can take it for granted that most of the information about the various variables and their categories is contained in the first two elements in $h_1, \ldots, h_s$ or in the first two columns weighted by $\lambda_1$ and $\lambda_2$. Accordingly, take the first two columns from $H$ and denote this submatrix by $H_{(2)}$ where

$$H_{(2)} = \begin{bmatrix} h_{11} & h_{12}\\ h_{21} & h_{22}\\ \vdots & \vdots\\ h_{s1} & h_{s2} \end{bmatrix}.$$

Plot the points $(h_{11}, h_{12}), (h_{21}, h_{22}), \ldots, (h_{s1}, h_{s2})$. These $s$ points correspond to the $s$ columns in the $r \times s$ matrix $C$ or the $s$ rows in $C'$.


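Referring to our numerical example, the eigenvalues are

$$\lambda_1^2 = 11.66,\ \lambda_2^2 = 5.57,\ \lambda_3^2 = 5.28,\ \lambda_4^2 = 3.47,\ \lambda_5^2 = 2.34,\ \lambda_6^2 = 1.14,\ \lambda_7^2 = 0.54,\ \lambda_8^2 = \lambda_9^2 = 0,$$

so that $k = 7$ and the positive square roots of the nonzero eigenvalues are

$$\lambda_1 = 3.41,\ \lambda_2 = 2.36,\ \lambda_3 = 2.30,\ \lambda_4 = 1.86,\ \lambda_5 = 1.53,\ \lambda_6 = 1.07,\ \lambda_7 = 0.73.$$

Thus, the matrix $\Lambda$ is

$$\Lambda = \mathrm{diag}(3.41, 2.36, 2.30, 1.86, 1.53, 1.07, 0.73),$$

and the

total inertia $= 11.66 + 5.57 + 5.28 + 3.47 + 2.34 + 1.14 + 0.54 = 30 = \mathrm{tr}(C'C)$.

Noting that $\lambda_1^2/\sum_{j}\lambda_j^2 = 0.39$ and $(\lambda_1^2 + \lambda_2^2 + \lambda_3^2)/\sum_{j}\lambda_j^2 = 0.75$, we can assert that 75% of the inertia is accounted for by the first three eigenvalues of $C'C$.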
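The eigenvalues, the total inertia and the proportions of inertia quoted above can be recomputed from the Burt matrix. The sketch below assumes the $9 \times 9$ matrix $C'C$ of the numerical example and uses NumPy's symmetric eigenvalue routine; the variable names are ours.

```python
import numpy as np

# Burt matrix C'C of the numerical example (categories M, F, U, N, O, L1, L2, L3, L4).
B = np.array([[4, 0, 1, 2, 1, 2, 1, 1, 0],
              [0, 6, 1, 2, 3, 1, 1, 2, 2],
              [1, 1, 2, 0, 0, 1, 0, 0, 1],
              [2, 2, 0, 4, 0, 0, 1, 2, 1],
              [1, 3, 0, 0, 4, 2, 1, 1, 0],
              [2, 1, 1, 0, 2, 3, 0, 0, 0],
              [1, 1, 0, 1, 1, 0, 2, 0, 0],
              [1, 2, 0, 2, 1, 0, 0, 3, 0],
              [0, 2, 1, 1, 0, 0, 0, 0, 2]], dtype=float)

# eigvalsh returns the eigenvalues in ascending order; reverse them and
# clip tiny negative values caused by rounding error.
eigvals = np.clip(np.linalg.eigvalsh(B)[::-1], 0.0, None)

k = int((eigvals > 1e-8).sum())          # number of nonzero eigenvalues
total_inertia = eigvals.sum()            # equals tr(C'C)
cumulative = np.cumsum(eigvals) / total_inertia
singular_values = np.sqrt(eigvals[:k])   # lambda_1, ..., lambda_k

print(k, round(total_inertia, 6))
print(np.round(eigvals, 2))
print(np.round(cumulative[:3], 2))
```

The tolerance `1e-8` used to decide which eigenvalues count as nonzero is an arbitrary choice for this illustration.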


The normalized eigenvectors of $C'C$, which correspond to the nonzero eigenvalues and are denoted by $V = [V_1, \ldots, V_7]$, are the following:

$$V_1 = \begin{bmatrix} 0.293826\\ 0.615045\\ 0.138401\\ 0.36352\\ 0.406951\\ 0.248836\\ 0.173839\\ 0.306906\\ 0.179291 \end{bmatrix},\
V_2 = \begin{bmatrix} -0.711194\\ 0.512432\\ -0.0995834\\ -0.106977\\ 0.00779839\\ -0.386116\\ -0.0833586\\ 0.041766\\ 0.228947 \end{bmatrix},\
V_3 = \begin{bmatrix} 0.123362\\ -0.0941084\\ -0.0879546\\ 0.638327\\ -0.521119\\ -0.428293\\ 0.0446188\\ 0.302598\\ 0.110329 \end{bmatrix},\
V_4 = \begin{bmatrix} 0.0732966\\ 0.107206\\ 0.601988\\ -0.03252\\ -0.388966\\ 0.167134\\ -0.164402\\ -0.357001\\ 0.534772 \end{bmatrix},$$

$$V_5 = \begin{bmatrix} 0.0406003\\ 0.0238456\\ -0.124608\\ 0.110633\\ 0.0784214\\ -0.206691\\ 0.754879\\ \text{[illegible]}\\ 0.1004 \end{bmatrix},\
V_6 = \begin{bmatrix} -0.115587\\ -0.0128351\\ -0.572877\\ 0.408319\\ 0.0361355\\ 0.400243\\ -0.367304\\ -0.382451\\ 0.22109 \end{bmatrix},\
V_7 = \begin{bmatrix} 0.320998\\ -0.262335\\ -0.162227\\ -0.195339\\ 0.416228\\ -0.427087\\ -0.191702\\ 0.0724589\\ 0.604993 \end{bmatrix}.$$
Then, the first two eigenvectors weighted by $\lambda_1$ and $\lambda_2$ and the points to be plotted are

$$\lambda_1V_1 = 3.41472\,V_1 = (1.00333,\ 2.10021,\ 0.4726,\ 1.24132,\ 1.38962,\ 0.849704,\ 0.593611,\ 1.048,\ 0.612228)',$$

$$\lambda_2V_2 = 2.36098\,V_2 = (-1.67911,\ 1.20984,\ -0.235114,\ -0.25257,\ 0.0184118,\ -0.911613,\ -0.196808,\ 0.0986087,\ 0.540539)'.$$

Points to be plotted:

(1.00333, −1.67911)      Men
(2.10021, 1.20984)       Women
(0.4726, −0.235114)      Underweight
(1.24132, −0.25257)      Normal
(1.38962, 0.0184118)     Overweight
(0.849704, −0.911613)    Level 1
(0.593611, −0.196808)    Level 2
(1.048, 0.0986087)       Level 3
(0.612228, 0.540539)     Level 4

The plot of these points is displayed in Fig. 15.7.1. 
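The plotting coordinates can also be obtained directly from a singular value decomposition of the indicator matrix. The sketch below assumes the $10 \times 9$ indicator matrix $C$ of the numerical example and relies on `numpy.linalg.svd`; since singular vectors are only determined up to sign, each column is oriented so that the "Women" coordinate is positive, which is our own convention chosen to match the values quoted in the text.

```python
import numpy as np

labels = ["Men", "Women", "Underweight", "Normal", "Overweight",
          "Level 1", "Level 2", "Level 3", "Level 4"]

# Indicator matrix C of the numerical example (rows: persons 1..10).
C = np.array([[0, 1, 0, 0, 1, 0, 1, 0, 0],
              [0, 1, 0, 1, 0, 0, 0, 0, 1],
              [1, 0, 1, 0, 0, 1, 0, 0, 0],
              [0, 1, 0, 1, 0, 0, 0, 1, 0],
              [1, 0, 0, 0, 1, 1, 0, 0, 0],
              [1, 0, 0, 1, 0, 0, 1, 0, 0],
              [0, 1, 0, 0, 1, 0, 0, 1, 0],
              [0, 1, 1, 0, 0, 0, 0, 0, 1],
              [1, 0, 0, 1, 0, 0, 0, 1, 0],
              [0, 1, 0, 0, 1, 1, 0, 0, 0]], dtype=float)

# C = U diag(s) Vt with singular values s in decreasing order.
U, s, Vt = np.linalg.svd(C, full_matrices=False)

# First two columns of H = V Lambda.
H2 = Vt[:2].T * s[:2]

# Fix the arbitrary signs: make the "Women" coordinate positive in each column.
for j in range(2):
    if H2[1, j] < 0:
        H2[:, j] *= -1

for name, (x, y) in zip(labels, H2):
    print(f"{name:12s} ({x: .4f}, {y: .4f})")
```

Plotting the rows of `H2`, for instance with a scatter plot labelled by `labels`, gives a display of the kind shown in Fig. 15.7.1.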


It is seen from the points plotted in Fig. 15.7.1 that the categories underweight and educational level 2 are somewhat close to each other, which is indicative of a possible association, whereas the categories underweight and women are far apart.

Figure 15.7.1 Multiple contingency plot
Exercises 15 (continued) 


15.6. In the following two-way contingency table, where the entries in the cells are frequencies, (1) calculate Pearson's $\chi^2$ statistic and give the representations in (15.5.1)–(15.5.6); (2) plot the row profiles; (3) plot the column profiles:

        B1   B2   B3   B4
A1      10   15   20   15
A2      15   10   10    5


15.7. Repeat Exercise 15.6 for the following two-way contingency table:

        B1   B2   B3   B4
A1      10    5   15    5
A2       5   10   10   20
A3      10    5   10    5
A4      15   10    5   10


15.8. For the data in (1) Exercise 15.6, (2) Exercise 15.7, and by using the notations defined in Sects. 15.5 and 15.6, compute the following items: (i) Estimates of $A = D_r^{-1/2}(P - RC')D_c^{-1}(P - RC')'D_r^{-1/2}$; (ii) Eigenvalues of $A$ and $\mathrm{tr}(A)$; (iii) Total inertia and proportions of inertia accounted for by the eigenvalues; (iv) The matrix of row-profiles; (v) The matrix of column-profiles, and make comments.


15.9. Referring to Exercises 15.6 and 15.7, plot the row profiles and column profiles and make comments.
15.10. In a used car lot, there are high price, average price and low price cars; the cars come in the following colors: red, white, blue and silver; and the paint finish is either mat or shiny. Fourteen customers bought vehicles from this car lot. Their preferences are given next. (1) Carry out a multiple correspondence analysis, plot the column profiles and make comments; (2) Create individual two-way contingency tables, analyze these tables and make comments. The following is the data where the first column indicates the customer's serial number:


 1   Low price       white color    mat finish
 2   Low price       red color      shiny finish
 3   Average price   silver color   shiny finish
 4   High price      red color      shiny finish
 5   High price      blue color     shiny finish
 6   Average price   white color    mat finish
 7   Average price   blue color     mat finish
 8   High price      blue color     shiny finish
 9   High price      red color      mat finish
10   Average price   silver color   mat finish
11   Low price       white color    shiny finish
12   Average price   white color    mat finish
13   Average price   silver color   shiny finish
14   Low price       white color    shiny finish

Open Access This chapter is licensed under the terms of the Creative Commons Attribution 4.0 
International License (http://creativecommons.org/licenses/by/4.0/), which permits use, sharing, 
adaptation, distribution and reproduction in any medium or format, as long as you give appropriate 
credit to the original author(s) and the source, provide a link to the Creative Commons license and 
indicate if changes were made. 

The images or other third party material in this chapter are included in the chapter’s Creative 
Commons license, unless indicated otherwise in a credit line to the material. If material is not 
included in the chapter’s Creative Commons license and your intended use is not permitted by 
statutory regulation or exceeds the permitted use, you will need to obtain permission directly from 
the copyright holder. 


Tables of Percentage Points 


Tables 1, 2, 3, 4, 5, 6 and 7 contain probabilities and percentage points that are useful for 
testing a variety of statistical hypotheses encountered in multivariate analysis. 


Table 1: Standard normal probabilities. Table entry: probability $= \int_0^x \frac{e^{-t^2/2}}{\sqrt{2\pi}}\,dt$


x     0.00    0.01    0.02    0.03    0.04    0.05    0.06    0.07    0.08    0.09

0.0   0.0000  0.0040  0.0080  0.0120  0.0160  0.0199  0.0239  0.0279  0.0319  0.0359
0.1   0.0398  0.0438  0.0478  0.0517  0.0557  0.0596  0.0636  0.0675  0.0714  0.0753
0.2   0.0793  0.0832  0.0871  0.0910  0.0948  0.0987  0.1026  0.1064  0.1103  0.1141
0.3   0.1179  0.1217  0.1255  0.1293  0.1331  0.1368  0.1406  0.1443  0.1480  0.1517
0.4   0.1554  0.1591  0.1627  0.1664  0.1700  0.1736  0.1772  0.1808  0.1844  0.1879
0.5   0.1915  0.1950  0.1985  0.2019  0.2054  0.2088  0.2123  0.2157  0.2190  0.2224
0.6   0.2257  0.2291  0.2324  0.2357  0.2389  0.2422  0.2454  0.2486  0.2518  0.2549
0.7   0.2580  0.2612  0.2642  0.2673  0.2704  0.2734  0.2764  0.2794  0.2823  0.2852
0.8   0.2882  0.2910  0.2939  0.2967  0.2996  0.3023  0.3051  0.3079  0.3106  0.3133
0.9   0.3159  0.3186  0.3212  0.3238  0.3264  0.3290  0.3315  0.3340  0.3365  0.3389
1.0   0.3414  0.3438  0.3461  0.3485  0.3508  0.3531  0.3554  0.3577  0.3599  0.3621
1.1   0.3643  0.3665  0.3686  0.3708  0.3729  0.3749  0.3770  0.3790  0.3810  0.3830
1.2   0.3849  0.3869  0.3888  0.3906  0.3925  0.3943  0.3962  0.3980  0.3997  0.4015
1.3   0.4032  0.4049  0.4066  0.4082  0.4099  0.4115  0.4131  0.4146  0.4162  0.4177
1.4   0.4192  0.4207  0.4222  0.4236  0.4251  0.4265  0.4278  0.4292  0.4306  0.4319


(continued) 


© The Author(s) 2022 887 
A. М. Mathai et al., Multivariate Statistical Analysis in the Real and Complex Domains, 
https://doi.org/10.1007/978-3-030-95864-0 




Table 1: (continued): Standard normal probabilities 


x     0.00    0.01    0.02    0.03    0.04    0.05    0.06    0.07    0.08    0.09

1.5   0.4332  0.4345  0.4357  0.4370  0.4382  0.4394  0.4406  0.4418  0.4429  0.4441
1.6   0.4452  0.4463  0.4474  0.4484  0.4495  0.4505  0.4515  0.4525  0.4535  0.4545
1.7   0.4554  0.4564  0.4573  0.4582  0.4591  0.4599  0.4608  0.4616  0.4625  0.4633
1.8   0.4641  0.4648  0.4656  0.4664  0.4671  0.4678  0.4686  0.4693  0.4699  0.4706
1.9   0.4713  0.4719  0.4726  0.4732  0.4738  0.4744  0.4750  0.4756  0.4762  0.4767
2.0   0.4773  0.4778  0.4783  0.4788  0.4793  0.4798  0.4803  0.4808  0.4812  0.4817
2.1   0.4821  0.4826  0.4830  0.4834  0.4838  0.4842  0.4846  0.4850  0.4854  0.4857
2.3   0.4893  0.4896  0.4898  0.4901  0.4904  0.4906  0.4909  0.4911  0.4914  0.4916
2.4   0.4918  0.4920  0.4922  0.4925  0.4927  0.4929  0.4931  0.4933  0.4934  0.4936
2.5   0.4938  0.4940  0.4941  0.4943  0.4945  0.4946  0.4948  0.4949  0.4951  0.4952
2.6   0.4953  0.4955  0.4956  0.4957  0.4959  0.4960  0.4961  0.4962  0.4963  0.4964
2.7   0.4965  0.4966  0.4967  0.4968  0.4969  0.4970  0.4971  0.4972  0.4973  0.4974
2.8   0.4974  0.4975  0.4976  0.4977  0.4977  0.4978  0.4979  0.4979  0.4980  0.4981
2.9   0.4981  0.4982  0.4982  0.4983  0.4984  0.4984  0.4985  0.4985  0.4986  0.4986
3.0   0.4986  0.4987  0.4987  0.4988  0.4988  0.4988  0.4989  0.4989  0.4990  0.4990
Table 2: Student-t, right tail. Table entry: $t_{\nu,\alpha}$ where $\int_{t_{\nu,\alpha}}^{\infty} f(t_\nu)\,dt_\nu = \alpha$ and $f(t_\nu)$ is the density of a Student-t distribution having $\nu$ degrees of freedom


ν     α = 0.10   α = 0.05   α = 0.025   α = 0.01   α = 0.005
1     3.078      6.314      12.706      31.821     63.657
2     1.886      2.920      4.303       6.965      9.925
3     1.638      2.353      3.182       4.541      5.841
4     1.533      2.132      2.776       3.747      4.604
5     1.476      2.015      2.571       3.365      4.032
6     1.440      1.943      2.447       3.143      3.707
7     1.415      1.895      2.365       2.998      3.499
8     1.397      1.860      2.306       2.896      3.355
9     1.383      1.833      2.262       2.821      3.250
10    1.372      1.812      2.228       2.764      3.169
11    1.363      1.796      2.201       2.718      3.106
12    1.356      1.782      2.179       2.681      3.055
13    1.350      1.771      2.160       2.650      3.012
14    1.345      1.761      2.145       2.624      2.977
15    1.341      1.753      2.131       2.602      2.947
16    1.337      1.746      2.120       2.583      2.921
17    1.333      1.740      2.110       2.567      2.898
18    1.330      1.734      2.101       2.552      2.878
19    1.328      1.729      2.093       2.539      2.861
20    1.325      1.725      2.086       2.528      2.845
21    1.323      1.721      2.080       2.518      2.831
22    1.321      1.717      2.074       2.508      2.819
23    1.319      1.714      2.069       2.500      2.807
24    1.318      1.711      2.064       2.492      2.797
25    1.316      1.708      2.060       2.485      2.787
26    1.315      1.706      2.056       2.479      2.779
27    1.314      1.703      2.052       2.473      2.771
28    1.313      1.701      2.048       2.467      2.763
29    1.311      1.699      2.045       2.462      2.756
∞     1.282      1.645      1.960       2.326      2.576
Table 3: Chisquare, right tail. Table entry: $\chi^2_{\nu,\alpha}$ where $\int_{\chi^2_{\nu,\alpha}}^{\infty} f(\chi^2_\nu)\,d\chi^2_\nu = \alpha$ with $f(\chi^2_\nu)$ being the density of a chisquare random variable having $\nu$ degrees of freedom


ν   α = 0.995   0.99   0.975   0.95   0.10   0.05   0.025   0.01   0.005   0.001


1 0.0000 0.0002 0.0010 0.0039 0.2.71 3.84 5.02 6.63 7.88 10.83 
2 0.0100 00201 0.0506 0.1030 0.4.61 5.99 7.38 9.21 10.60 13.81 
3 10.0717 0.1148 0.2160 0.3520 6.25 7.81 9.35 11.34 12.84 16.27 
4 0.2070 0.2970 0.4844 0.7110 7.78 9.49 11.14 13.26 14.86 18.47 
5 047 0.5543 0.831 1.15 9.24 11.07 12.83 15.09 | 16.75 | 20.52 
6 0.676 0.872. 124 164 (10.64 12.59 14.45 16.81 18.55 2246 
7 |0.989 1.24 1.69 2.17 12.02 | 14.07 16.01 18.48 20.286 24.32 
8 134 1.65 218 2.73  |13.36 |15.51 17.53 |20.09 21.95 26.12 
9 173 209 270 3.33 14.68 116.92 19.02 21.67 23.59 27.88 
10 2.16 2.56 3.25 3.94 15.99 |18.31 20.48 23.21 25.19 29.59 
11 2.60 3.05 3.82 4.57  |17.28 | 19.68 21.92 |24.73 |26.76 |31.26 
12 |3.07 3.57 440 5.23 18.55 121.03 23.34 26.22 28.30 32.91 
13.3.57 4.11 5.01 5.89 19.81 | 22.36 2474 27.69 |29.82 34.53 
14 4.07 4.66 5.63 6.57  |21.06 23.68 26.12 29.14 |31.32 36.12 
15 4.60 523 (626 7.26 |2231 |25.00 |27.49 (30.58 32.80 37.70 
16 5.14 5.81  |6.91 7.96 23.54 26.30 |28.85 32.00 34.27 39.25 
17 5.70 6.41 7.56 8.67 24777 |27.59 |30.19 33.41 |35.72 | 30.79 
18 6.26 7.01 8.23 9.39 25.99 |28.87 31.53 |34.81 3716 | 42.31 
19 6.84 7.63 8.91 10.12 27.20 | 30.14 32.85 36.19 |38.58 | 43.82 
20 | 7.43 8.206 9.59 10.85 28.41 31.41 34.17 37.57 40.00 45.31 
21 8.03 8.50 10.28 11.59 29.62 32.67 35.48 38.93 41.40 46.80 
22 8.64 9.54 10.98 12.34 30.81 33.92 36.78 140.29 42.80 48.27 
23 9.26 10.20 (11.69 13.09 32.01 35.17 38.08 141.64 44.18 49.73 
24 9.89 10.86 1240 13.85 33.20 36.42 39.36 42.98 45.56 51.18 
Table 3: (continued): Chisquare, right tail 


v (а =0.995 099 0.975 (0.95 010 0.05 10.025 0.01 0.005 0.001 
25 10.52 11.52 13.12 114.61 34.38 37.65 40.65 44.31 46.93 52.62 
26 11.16 12.20 13.84 15.38 35.56 38.89 41.92 45.64 48.29 54.05 
27 111.81 12.88 14.57 16.15 36.74 40.11 43.19 46.96 49.64 55.48 
28 12.46 13.56 | 15.31 16.93 |37.92 41.34 |44.46 |48.28 50.99 |56.89 
29 |13.12 14.26 | 16.05 17.71 |39.09 | 42.56 |45.72 |49.59 52.34 | 58.30 
30 1379 14.95 16.79 18.49 40.26 43.77 46.98 50.89 53.67 59.70 
40 20.71 22.16 24.43 26.51 51.81 55.76 (59.34 63.69 66.77 73.40 
50 |27.99 29.71 32.36 34.76 63.17 67.50 |7142 76.15 79.49 86.66 
60 35.53 37.48 40.48 43.19 74.40 79.08 83.30 88.38 91.95 99.61 
70 43.28 45.44 48.76 51.74 85.53 90.53 95.02 100.4 104.2 112.3 
80 51.17 53.54 57.15 60.39 96.58 101.9 106.6 112.3 116.3 124.8 
90 59.20 61.75 65.75 69.13 107.6 113.1 118.1 124.1 128.3 137.2 
100 67.33 70.06 74.22 77.93 118.5 124.3 |129.6 135.8 140.2 149.4 
Table 4: F-distribution, right tail 5% points. Table entry: $b = F_{\nu_1,\nu_2,0.05}$ where $\int_b^{\infty} f(F_{\nu_1,\nu_2})\,dF_{\nu_1,\nu_2} = 0.05$ with $f(F_{\nu_1,\nu_2})$ being the density of an F-variable having $\nu_1$ and $\nu_2$ degrees of freedom


оо ND) tA BB) I н 


10 

11 2.40 
12 2.30 
13 2.21 
14 2.13 
15 2.07 
16 2.01 
17 1.96 
18 1.92 
19 1.88 
20 1.84 
21 1.81 


N 
N 


23 
24 1.73 
25 1.71 


(continued) 
Table 4: (continued): F-distribution, right tail 5% points 


0211 4 5 6 7 8 10 12 24 оо 

26 2.98 2.74 2.59 2.47 239 2.32 2.22) 2.15 1.95 1.69 
27 2.900 2.73) 2.57 2.46 2.37 2.31 2.20 2.13 1.93 1.67 
28 2.900 2.71 2.56 2.45 2.36 229 2.19 2.12) 191 1.65 
29 2.93 270 2.55 2.43 235 2.28 2.18 2.10 1.90 1.64 
30 2.92) 2.69 2.53 2.42 2.33 2.27 2.16 2.09 1.89 1.62 
32 2.90 1.67 2.51 2.40 2.31 224 2.14 2.07 1.86 1.59 
34 2.88 2.65 2.49) 2.38 2.29 2.23 2.12) 2.05 1.84 1.57 
36 2.87 2.63 2.48 2.36 2.28 221 2.11) 2.03) 1.82) 1.55 
38 2.85 2.62 2.46 2.35 2.26 2.19 2.09 2.02) 1.81 1.53 
40 2.84 2.61 2.45 2.34 2.25 2.18 2.08 2.00 1.79 1.51 
60 2.76 2.53) 2.37 2.25 217 210 1.99 192 170 1.89 
100 2.63 2.45 2.29) 2.18 2.09 2.02 1.91) 1.83 161 1.25 
оо 2.60 2.37 221 2.10 201 1.94 1.83 1.75 1.52 1.00 
Table 5: F-distribution, right tail 1% points. Table entry: $b = F_{\nu_1,\nu_2,0.01}$ where $\int_b^{\infty} f(F_{\nu_1,\nu_2})\,dF_{\nu_1,\nu_2} = 0.01$ with $f(F_{\nu_1,\nu_2})$ being the density of an F-variable having $\nu_1$ and $\nu_2$ degrees of freedom


Ww: 2 3 4 5 6 7 8 10 12 |24 о 
v2 

1 4052 4999.5 5403 5625 5764 5859 5928 5981 6056 6106 6235 6366 
2 985 990 992 992 993 099.3: 9594 994 994 994 99.5 99.5 
з |341 308 295 28.7 262 279 277 27.5 27.2 27.1 |26.6 26.1 
4 [21.2 180 1167 |160 |15.5 15.2 15.0 14.8 145 144 13.9 | 13.5 
5 16.26 13.27 | 12.06 11.39 10.97 10.67 | 10.46 10.29 10.05 9.89 9.47 9.02 
6 13.74 10.92 9.78 915 875 847 826 810 7.87 772 |7.31 6.88 
7 12.25 |9.55 845 7.85 746 719 6.99 6.84 6.62 6.47 |6.07 |5.65 
8 11.26 8.65 7.59 7.01 6.63 6.37 6.18 6.03 5.81 5.67 5.28 4.86 
9 10.56 8.02 699 6.42 6.06 5.80 5.61 5.47 5.26 5.11 473 431 
10 10.04 7.56 6.55 5.99 5.64 5.39 5.20 5.06 4.85 471 4.33 3.91 
11 0.65 721 6.22 5.67 5.32 5.07 4.89 4.74 4.54 4.40 4.02 3.60 
12 9.33 6.93 5.95 5.41 5.06 482 4.64 4.50 4.30 4.16 3.78 3.36 
13 (9.07 6.70 5.74 5.21 4.86 4.62 4.44 4.30 4.10 3.96 3.59 3.17 
14 8.86 6.51 5.56 5.04 4.70 4.46 4.28 4.14 3.94 3.80 3.43 3.00 
15 8.68 6.36 5.42 4.89 4.56 432 4.14 4.00 3.80 3.67 3.29 2.87 
16 |8.53 623 529 477 4.44 420 |4.03 3.89 |3.69 3.55 |3.18 |2.75 
17 |840 611 5.18 14.67 4.34 410 |3.93 3.79 |3.59 3.46 |3.08 2.65 
18 820 601 5.09 14.58 4.25 401 |3.84 3.71 |3.51 |3.37 |3.00 |2.57 
19 818 593 501 4.50 4.17 |3.94 3.77 |3.63 |3.43 13.30 2.92 2.49 
20 810 |5.85 494 443 410 3.87 13.70 3.56 |3.37 |3.23 2.86 2.42 
21 |8.02 578 487 437 404 3.81 3.64 3.51 331 3.17 2.80 2.36 
22 DEOS. 572 482 431 |3999 376 3.59 345 326 312 2.75 |2.31 
23 788 566 (14.76 426 394 371 3.54 341 |3.21 3.07 2.70 2.26 
24 |7.82 561 14.72 422 |3.90 3.67 3.50 3.36 317 |3.03 2.66 2.21 
25 |7.77 |5.57 |4.68 418 3.86 3.63 346 332 313 2.99 2.62 2.17 
Table 5: (continued): F-distribution, right tail 1% points 
v: |1 2 3 4 5 6 7 8 10 12 24 оо 
26 |7.72 |5.53 4.64 14.14 3.82 3.59 (3.42 3.29 3.09 12.96 2.58 213 
27 7.68 5.49 4.60 411 3.78 3.56 3.39 |3.26 3.06 12.93 12.55 210 
28 7.64 |5.45 4.57 407 |3.75 3.53 3.36 |3.23 3.03 2.90 2.52 2.06 
29 7.60 5.42 14.54 14.04 3.73 3.50 13.33 13.20 3.00 2.87 2.49 2.03 
30 17.56 5.39 4.51 4.02 3.70 3.47 3.30 3.17 2.98 2.84 2.47 2.01 
32 17.50 5.34 446 3.97 3.65 3.43 3.26 3.13 12.93 2.80 2.42 1.96 
34 7.45 5.29 442 3.93 3.61 3.39 3.22 3.09 2.90 2.76 412.38 1.91 
36 17.40 5.25 4.38 3.89 3.58 3.35 3.18 3.05 2.86 2.72 2.35 1.87 
38 7.35 15.21 434 3.86 3.54 3.32 3.15 3.02 2.83 2.69 2.32 1.84 
40 7.31 5.18 431 3.83 3.51 3.229 |3.12 2.99 2.80 2,66 (2.29 1.80 
60 7.08 4.98 4.13 3.65 3.34 3.12 2.95 2.82 2.63 2.50 2.12 | 1.60 
120 6.85 14.79 3.95 3.48 13.17 2.96 2.79 2.66 247 2.334 1.95 1.38 
оо 6.63 4.61 3.78 3.32 3.02 2.80 2.64 2.51 2.32 2.18 1.79 1.00 
Table 6: Testing independence. Let $H_o: \Sigma$ is diagonal in a $N_p(\mu, \Sigma)$ population and $u = \lambda^{2/n}$
where $n$ is the sample size and $\lambda$ refers to the $\lambda$-criterion. The table entries are the 5th and 1st upper
percentage points of $w = -[m - (2p + 5)/6]\ln u$ where $m = n - 1$. $H_o$ is rejected for large values
of $w$. The last line wherein $m = \infty$ displays the asymptotic chisquare values as $w \to \chi^2_{p(p-1)/2}$.


5% points

  m    p=3    p=4    p=5    p=6    p=7    p=8    p=9   p=10
  3  8.020
  4  7.834  15.22
  5  7.814  13.47  24.01
  6  7.811  13.03  20.44  34.30
  7  7.811  12.85  19.45  28.75  46.05
  8  7.811  12.76  19.02  27.11  38.41  59.25
  9  7.812  12.71  18.80  26.37  36.03  49.42  73.79
 10  7.812  12.68  18.67  25.96  34.91  46.22  61.76  89.92
 11  7.813  12.66  18.58  25.71  34.28  44.67  57.68  75.45
 12  7.813  12.65  18.52  25.55  33.89  43.78  55.65  70.43
 13  7.813  12.64  18.48  25.44  33.63  43.21  54.46  67.87
 14  7.813  12.63  18.45  25.36  33.44  42.82  53.69  66.34
 15  7.814  12.62  18.43  25.30  33.31  42.55  53.15  65.33
 16  7.814  12.62  18.41  25.25  33.20  42.34  52.77  64.63
 17  7.814  12.62  18.40  25.21  33.12  42.19  52.48  64.12
 18  7.814  12.61  18.38  25.19  33.06  42.06  52.26  63.73
 19  7.814  12.61  18.37  25.16  33.01  41.97  52.08  63.43
 20  7.814  12.61  18.37  25.14  32.97  41.89  51.94  63.19
  ∞  7.815  12.59  18.31  25.00  32.67  41.34  51.00  61.66
Table 6: Testing Independence


1% points

  m    p=3    p=4    p=5    p=6    p=7    p=8    p=9   p=10
  3  11.79
  4  11.41  21.18
  5  11.36  18.27  32.16
  6  11.34  17.54  26.50  44.65
  7  11.34  17.24  24.95  36.09  58.61
  8  11.34  17.10  24.29  33.63  47.05  74.01
  9  11.34  17.01  23.95  32.54  43.59  59.36  90.87
 10  11.34  16.96  23.75  31.95  42.00  54.83  73.03 109.53
 11  11.34  16.93  23.62  31.60  41.13  52.70  67.37  88.05
 12  11.34  16.90  23.53  31.36  40.59  51.49  64.64  81.20
 13  11.34  16.89  23.47  31.20  40.23  50.73  63.06  77.83
 14  11.34  16.87  23.42  31.09  39.97  50.22  62.05  75.84
 15  11.34  16.86  23.39  31.00  39.79  49.85  61.36  74.56
 16  11.34  16.86  23.36  30.94  39.65  49.59  60.86  73.66
 17  11.34  16.85  23.34  30.88  39.54  49.38  60.49  73.01
 18  11.34  16.85  23.32  30.84  39.46  49.22  60.21  72.52
 19  11.34  16.84  23.31  30.81  39.39  49.09  59.99  72.15
 20  11.34  16.84  23.30  30.78  39.33  48.99  59.81  71.85
  ∞  11.34  16.81  23.21  30.58  38.93  48.28  58.57  69.92


Note: For $p = 2$, $u = 1 - r^2$ where $r$ is the sample correlation coefficient. Moreover,
$(m-1)\,r^2/(1-r^2) \sim F_{1,\,m-1}$ where $m = n - 1$ and $n$ is the sample size; also refer to Sect. 5.6 and
Eq. (6.6.1). Thus, the case $p = 2$ is omitted from the tables as one can then readily obtain
the requisite critical values from F or Student-t tables in light of these properties.
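
As an illustration of how Table 6 might be used, the following Python sketch evaluates the statistic $w$ for a given $u$ and compares it with a tabulated 5% point; it also covers the $p = 2$ case through the F statistic built from $r$. It assumes the multiplier $m - (2p+5)/6$ (the form under which the $m = \infty$ row agrees with the chisquare limit). The function names and the numerical inputs ($u = 0.30$, $r = 0.75$) are hypothetical; the critical values are copied from Tables 5 and 6.

```python
import math

# 5% points copied from the p = 3 column of Table 6 (a few rows only), keyed by m.
CRIT_5PCT_P3 = {5: 7.814, 10: 7.812, 20: 7.814}

# 1% points of F with (1, nu) degrees of freedom, copied from the first
# column of Table 5, keyed by nu.
F_1PCT_NUM1 = {10: 10.04, 20: 8.10, 30: 7.56}


def independence_w(u, m, p):
    """Return w = -[m - (2p + 5)/6] ln u, where m = n - 1 and 0 < u <= 1."""
    if not 0.0 < u <= 1.0:
        raise ValueError("u must lie in (0, 1]")
    return -(m - (2 * p + 5) / 6.0) * math.log(u)


def bivariate_f(r, m):
    """For p = 2: (m - 1) r^2 / (1 - r^2), referred to F with (1, m - 1) d.f."""
    if not -1.0 < r < 1.0:
        raise ValueError("r must lie strictly between -1 and 1")
    return (m - 1) * r * r / (1.0 - r * r)


# Hypothetical example with p = 3 and n = 11, so that m = 10.
w = independence_w(u=0.30, m=10, p=3)
reject_p3 = w > CRIT_5PCT_P3[10]      # H_o is rejected for large w

# Hypothetical example with p = 2 and n = 12, so that m = 11 and m - 1 = 10.
f_stat = bivariate_f(r=0.75, m=11)
reject_p2 = f_stat > F_1PCT_NUM1[10]  # 1% test based on Table 5

print(round(w, 3), reject_p3, round(f_stat, 3), reject_p2)
```

Only a handful of table entries are hard-coded here; in practice one would look up the entry for the actual $(m, p)$ pair.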
Table 7: Testing the equality of the diagonal elements, given that $\Sigma$ in a $N_p(\mu, \Sigma)$ population
is diagonal. Let $v = \lambda^{2/n}$ where $\lambda$ is the likelihood ratio test statistic and $n$ is the sample size.
Letting $f(v)$ denote the density of $v$, the table entries $w$ are the critical values of $v$ such that
$F(w) = \int_0^w f(v)\,dv = 0.01, 0.02, 0.025, 0.05$, for various values of $p$ and $m = n - 1$


p=2 F(w) = 0.01 F(w) = 0.02 F(w) = 0.025 F(w) = 0.05 
m w w w w 

2 0.01990 0.03960 0.04937 0.09750 
3 0.08083 0.12702 0.14675 0.22852 
4 0.15874 0.22173 0.24664 0.34163 
5 0.23520 0.30632 0.33318 0.43074 
6 0.30387 0.37792 0.40505 0.50053 
7 0.36370 0.43784 0.46442 0.55593 
8 0.41540 0.48812 0.51378 0.60071 
9 0.46009 0.53064 0.55524 0.63751 
10 0.49889 0.56694 0.59043 0.66824 
11 0.53279 0.59822 0.62062 0.69425 
12 0.56258 0.62540 0.64677 0.71654 
13 0.58893 0.64921 0.66961 0.73583 
14 0.61238 0.67024 0.68972 0.75268 
15 0.63336 0.68893 0.70756 0.76753 
16 0.65224 0.70564 0.72349 0.78072 
17 0.66930 0.72067 0.73778 0.79249 
18 0.68479 0.73425 0.75069 0.80307 
19 0.69891 0.74659 0.76239 0.81263 
20 0.71185 0.75784 0.77305 0.82131 
21 0.72372 0.76815 0.78280 0.82923 
22 0.73467 0.77761 0.79176 0.83647 
23 0.74479 0.78634 0.80000 0.84313 
24 0.75417 0.79442 0.80763 0.84927 
25 0.76290 0.80190 0.81469 0.85494 
26 0.77103 0.80887 0.82126 0.86021 
27 0.77862 0.81536 0.82738 0.86511 
28 0.78753 0.82143 0.83310 0.86967 
29 0.79240 0.82712 0.83845 0.87394 
30 0.79867 0.83245 0.84347 0.87794 
Table 7 (continued): Testing the Equality of the Diagonal Elements, Given that $\Sigma$ in a $N_p(\mu, \Sigma)$
Population is Diagonal


p=3 F(w) = 0.01 F(w) = 0.02 F(w) = 0.025 F(w) = 0.05 
m w w w w 

3 0.03171 0.05272 0.06214 0.10378 
4 0.07974 0.11611 0.13109 0.19128 
5 0.13649 0.18373 0.20222 0.27252 
6 0.19382 0.24775 0.26816 0.34301 
7 0.24796 0.30559 0.32688 0.40302 
8 0.29756 0.35690 0.37844 0.45403 
9 0.34240 0.40217 0.42357 0.49760 
10 0.38272 0.44212 0.46314 0.53509 
11 0.41895 0.47745 0.49798 0.56759 
12 0.45153 0.50885 0.52881 0.59599 
13 0.48092 0.53687 0.55624 0.62098 
14 0.50571 0.56199 0.58075 0.64314 
15 0.53164 0.58462 0.60278 0.66289 
16 0.55361 0.60510 0.62267 0.68061 
17 0.57369 0.62370 0.64071 0.69658 
18 0.59210 0.64066 0.65713 0.71105 
19 0.60903 0.65619 0.67214 0.72422 
20 0.62464 0.67046 0.68592 0.73625 
21 0.63908 0.68361 0.69860 0.74729 
22 0.65247 0.69577 0.71031 0.75744 
23 0.66492 0.70703 0.72115 0.76682 
24 0.67653 0.71750 0.73121 0.77550 
25 0.68737 0.72726 0.74059 0.78357 
26 0.69752 0.73637 0.74933 0.79108 
27 0.70704 0.74490 0.75751 0.79808 
28 0.71598 0.75290 0.76518 0.80464 
29 0.72440 0.76041 0.77238 0.81078 
30 0.73234 0.76749 0.77916 0.81655 


Table 7 (continued): Testing the Equality of the Diagonal Elements, Given that $\Sigma$ in a $N_p(\mu, \Sigma)$
Population is Diagonal


p=4 F(w) = 0.01 F(w) = 0.02 F(w) = 0.025 F(w) = 0.05 
m w w w w 

4 0.04504 0.06768 0.07726 0.11713 
5 0.08692 0.11996 0.13319 0.18503 
6 0.13350 0.17421 0.18994 0.24915 
7 0.18050 0.22641 0.24369 0.30699 
8 0.22568 0.27488 0.29305 0.35825 
9 0.26804 0.31915 0.33776 0.40346 
10 0.30724 0.35929 0.37801 0.44332 
11 0.34328 0.39559 0.41423 0.47858 
12 0.37632 0.42843 0.44684 0.50991 
13 0.40659 0.45818 0.47628 0.53786 
14 0.43435 0.48519 0.50294 0.56292 
15 0.45983 0.50980 0.52715 0.58549 
16 0.48328 0.53227 0.54921 0.60591 
17 0.50490 0.55286 0.56938 0.62447 
18 0.52487 0.57178 0.58768 0.64139 
19 0.54336 0.58921 0.60490 0.65688 
20 0.56052 0.60532 0.62061 0.67111 
21 0.57649 0.62025 0.63514 0.68423 
22 0.59137 0.63411 0.64863 0.69635 
23 0.60527 0.64702 0.66117 0.70759 
24 0.61828 0.65906 0.67286 0.71804 
25 0.63048 0.67032 0.68378 0.72777 
26 0.64194 0.68088 0.69401 0.73686 
27 0.65273 0.69079 0.70360 0.74537 
28 0.66289 0.70011 0.71262 0.75334 
29 0.67249 0.70889 0.72111 0.76084 
30 0.68156 0.71715 0.72912 0.76790 
Table 7 (continued): Testing the Equality of the Diagonal Elements, Given that $\Sigma$ in a $N_p(\mu, \Sigma)$
Population is Diagonal


p=5 F(w) = 0.01 F(w) = 0.02 F(w) = 0.025 F(w) = 0.05
m w w w w 

5 0.05774 0.08138 0.09103 0.12970 
6 0.09523 0.12645 0.13871 0.18581 
7 0.13538 0.17235 0.18648 0.23915 
8 0.17568 0.21677 0.23215 0.28827 
9 0.21473 0.25863 0.27481 0.33284 
10 0.25181 0.29753 0.31416 0.37304 
11 0.28661 0.33340 0.35026 0.40926 
12 0.31908 0.36639 0.38328 0.44191 
13 0.34925 0.39668 0.41350 0.47142 
14 0.37725 0.42451 0.44117 0.49816 
15 0.40323 0.45010 0.46654 0.52246 
16 0.42735 0.47369 0.48986 0.54462 
17 0.44976 0.49546 0.51134 0.56489 
18 0.47061 0.51560 0.53117 0.58350 
19 0.49004 0.53426 0.54952 0.60062 
20 0.50818 0.55160 0.56655 0.61643 
21 0.52513 0.56774 0.58237 0.63107 
22 0.54101 0.58280 0.59712 0.64465 
23 0.55590 0.59688 0.61089 0.65728 
24 0.56989 0.61006 0.62377 0.66907 
25 0.58305 0.62243 0.63584 0.68008 
26 0.59546 0.63406 0.64718 0.69039 
27 0.60717 0.64501 0.65784 0.70007 
28 0.61824 0.65533 0.66790 0.70917 
29 0.62872 0.66508 0.67738 0.71773 
30 0.63864 0.67430 0.68635 0.72582 
Table 7 (continued): Testing the Equality of the Diagonal Elements, Given that $\Sigma$ in a $N_p(\mu, \Sigma)$
Population is Diagonal


p=6 F(w) = 0.01 F(w) = 0.02 F(w) = 0.025 F(w) = 0.05
m w w w w

6 0.06938 0.09356 0.10318 0.14077 
7 0.10338 0.13335 0.14494 0.18882 
8 0.13892 0.17338 0.18643 0.23468 
9 0.17443 0.21223 0.22631 0.27744 
10 0.20897 0.24917 0.26394 0.31684 
11 0.24203 0.28388 0.29909 0.35294 
12 0.27336 0.31628 0.33174 0.38595 
13 0.30287 0.34641 0.36198 0.41614 
14 0.33057 0.37440 0.38997 0.44376 
15 0.35653 0.40038 0.41587 0.46908 
16 0.38082 0.42451 0.43987 0.49235 
17 0.40357 0.44694 0.46213 0.51377 
18 0.42487 0.46782 0.48280 0.53355 
19 0.44484 0.48729 0.50204 0.55184 
20 0.46357 0.50546 0.51997 0.56881 
21 0.48117 0.52245 0.53672 0.58458 
22 0.49772 0.53837 0.55238 0.59927 
23 0.51330 0.55330 0.56706 0.61298 
24 0.52799 0.56734 0.58084 0.62581 
25 0.54186 0.58054 0.59379 0.63784 
26 0.55497 0.59299 0.60599 0.64913 
27 0.56738 0.60474 0.61750 0.65975 
28 0.57914 0.61585 0.62837 0.66976 
29 0.59030 0.62637 0.63865 0.67920 
30 0.60089 0.63634 0.64899 0.68813 


Table 7 (continued): Testing the Equality of the Diagonal Elements, Given that $\Sigma$ in a $N_p(\mu, \Sigma)$
Population is Diagonal


p=7 F(w) = 0.01 F(w) = 0.025 F(w) = 0.05
m w w w

7 0.07993 0.11387 0.15046 
8 0.11104 0.15112 0.19258 
9 0.14306 0.18792 0.23290 
10 0.17492 0.22340 0.27081 
11 0.20598 0.25712 0.30613 
12 0.23587 0.28890 0.33886 
13 0.26439 0.31868 0.36912 
14 0.29144 0.34653 0.39708 
15 0.31703 0.37253 0.42293 
16 0.34117 0.39680 0.44685 
17 0.36394 0.41946 0.46901 
18 0.38539 0.44063 0.48957 
19 0.40562 0.46043 0.50870 
20 0.42468 0.47898 0.52651 
21 0.44267 0.49637 0.54313 
22 0.45966 0.51270 0.55867 
23 0.47572 0.52805 0.57323 
24 0.49091 0.54251 0.58688 
25 0.50529 0.55615 0.59972 
26 0.51892 0.56902 0.61180 
27 0.53186 0.58120 0.62319 
28 0.54415 0.59272 0.63394 
29 0.55584 0.60365 0.64412 
30 0.56697 0.61402 0.65375 


Table 7 (continued): Testing the Equality of the Diagonal Elements, Given that $\Sigma$ in a $N_p(\mu, \Sigma)$
Population is Diagonal


p=8 F(w) = 0.01 F(w) = 0.02 F(w) = 0.025 F(w) = 0.05
m w w w w

8 0.08947 0.11391 0.12334 0.15897 
9 0.11816 0.14632 0.15699 0.19653 
10 0.14735 0.17850 0.19012 0.23256 
11 0.17632 0.20980 0.22215 0.26665 
12 0.20460 0.23985 0.25274 0.29867 
13 0.23192 0.26849 0.28175 0.32859 
14 0.25810 0.29563 0.30913 0.35650 
15 0.28309 0.32126 0.33492 0.38250 
16 0.30686 0.34544 0.35916 0.40672 
17 0.32942 0.36820 0.38194 0.42931 
18 0.35081 0.38965 0.40335 0.45038 
19 0.37108 0.40985 0.42348 0.47006 
20 0.39028 0.42878 0.44241 0.48848 
p=9 F(w) = 0.01 F(w) = 0.02 F(w) = 0.025 F(w) = 0.05 
m w w w w 

9 0.09813 0.12249 0.13178 0.16652 
10 0.12474 0.15218 0.16250 0.20044 
11 0.15160 0.18155 0.19267 0.23304 
12 0.17821 0.21015 0.22189 0.26404 
13 0.20421 0.23770 0.24990 0.29332 
14 0.22939 0.26406 0.27660 0.32088 
15 0.25362 0.28917 0.30195 0.34676 
16 0.27685 0.31302 0.32596 0.37104 
17 0.29904 0.33563 0.34865 0.39380 
18 0.32020 0.35704 0.37010 0.41515 
19 0.34036 0.37731 0.39036 0.43619 
20 0.35955 0.39650 0.40950 0.45401 




Table 7 (continued): Testing the Equality of the Diagonal Elements, Given that $\Sigma$ in a $N_p(\mu, \Sigma)$
Population is Diagonal


p=10 F(w) = 0.01 F(w) = 0.02 F(w) = 0.025 F(w) = 0.05
m w w w w
10 0.10602 0.13021 0.13936 0.17326 
11 0.13083 0.15762 0.16763 0.20420 
12 0.15574 0.18467 0.19536 0.23399 
13 0.18036 0.21101 0.22224 0.26243 
14 0.20445 0.23646 0.24810 0.28943 
15 0.22782 0.26090 0.27285 0.31498 
16 0.25039 0.28428 0.29645 0.33910 
17 0.27209 0.30658 0.31890 0.36184 
18 0.29291 0.32781 0.34022 0.38329 
19 0.31284 0.34800 0.36047 0.40351 
20 0.33189 0.36721 0.37968 0.42259 


Note: For large sample size $n$, we may utilize the approximation $-2\ln\lambda \sim \chi^2_{p-1}$ for the
null distribution of $\lambda$, where $\lambda = v^{n/2}$ is the lambda criterion, $n$ is the sample size and
$\chi^2_{p-1}$ denotes a central chisquare random variable having $p - 1$ degrees of freedom. The
null hypothesis will be rejected for large values of $-2\ln\lambda$ since it is rejected for small
values of the test statistics $\lambda$ or $v$.
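
The two decision rules described for Table 7 can be sketched in Python as follows. The exact rule rejects when the observed $v$ falls at or below the tabulated $w$; the large-sample rule uses $-2\ln\lambda = -n\ln v$, which follows from $\lambda = v^{n/2}$. The function names and the observed value $v = 0.70$ are hypothetical; the chisquare 5% points are standard values and the entries of `TABLE7_5PCT` are copied from the $F(w) = 0.05$ columns of Table 7.

```python
import math

# Upper 5% points of the chisquare distribution, keyed by degrees of freedom.
CHI2_5PCT = {1: 3.841, 2: 5.991, 3: 7.815, 4: 9.488, 5: 11.070}

# Lower 5% points w of v copied from Table 7, keyed by (p, m) with m = n - 1.
TABLE7_5PCT = {(2, 20): 0.82131, (3, 20): 0.73625, (4, 20): 0.67111}


def exact_test(v, p, m):
    """Reject (return True) when v is at or below the tabulated 5% point."""
    return v <= TABLE7_5PCT[(p, m)]


def large_sample_test(v, p, n):
    """Return (-2 ln lambda, reject?) using the chisquare(p - 1) approximation."""
    if not 0.0 < v <= 1.0:
        raise ValueError("v must lie in (0, 1]")
    stat = -n * math.log(v)  # equals -2 ln(lambda) since lambda = v**(n/2)
    return stat, stat > CHI2_5PCT[p - 1]


# Hypothetical example: p = 3, n = 21 (so m = 20), observed v = 0.70.
reject_exact = exact_test(0.70, p=3, m=20)
stat, reject_large = large_sample_test(0.70, p=3, n=21)
print(reject_exact, round(stat, 3), reject_large)
```

For moderate sample sizes the two rules need not agree, which is why the exact points are tabulated; the approximation is intended for large $n$.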


Some Additional Reading Materials 


Books 


Adachi, Kohei (2016): Matrix-based Introduction to Multivariate Data Analysis.
Singapore, Springer. https://doi.org/10.1007/978-981-10-2341-5


Agresti, Alan (2013): Categorical Data Analysis (3rd ed.). Hoboken, Wiley-Interscience. 


Anderson, Theodore W. (2003): An Introduction to Multivariate Statistical Analysis, 3rd 
edition. New York, Wiley-Interscience. 


Atkinson, Anthony C., Riani, Marco, and Cerioli, Andrea (2004): Exploring Multivariate 
Data with the Forward Search, New York, Springer. 


Bartholomew, David J. (2008): Analysis of Multivariate Social Science Data (2nd ed.). 
Boca Raton, CRC Press. 


Bishop, Yvonne M., Fienberg, Stephen E., and Holland, Paul W. (2007): Discrete
Multivariate Analysis: Theory and Practice (1st ed.). New York, Springer.
https://doi.org/10.1007/978-0-387-72806-3


Brown, Bruce L. (2012): Multivariate Analysis for the Biobehavioral and Social Sciences: 
a Graphical Approach. Hoboken, Wiley. 


Chatfield, Chris, and Collins, A. J. (2018): Applied Multivariate Analysis : Using Bayesian 
And Frequentist Methods Of Inference. Boca Raton, Chapman & Hall/CRC. 


Cox, David R., and Wermuth, Nanny (1996): Multivariate Dependencies: Models,
Analysis and Interpretation. London, Chapman and Hall.


Cox, Trevor F. (2005): An Introduction to Multivariate Data Analysis. London, Hodder 
Arnold. 


Dugard, P., Todman, J., and Staines, H. (2010): Approaching Multivariate Analysis, a
Practical Introduction, 2nd edn. London, Routledge.


© The Author(s) 2022 907 
A. M. Mathai et al., Multivariate Statistical Analysis in the Real and Complex Domains, 
https://doi.org/10.1007/978-3-030-95864-0 




Everitt, Brian (2011): Cluster Analysis (5th ed.). Hoboken, Wiley.
https://doi.org/10.1002/9780470977811


Everitt, Brian, and Dunn, Graham (2001): Applied Multivariate Data Analysis (2nd ed.). 
London, Arnold. 


Everitt, Brian, and Hothorn, Torsten (2011): An Introduction to Applied Multivariate
Analysis with R. New York, Springer. https://doi.org/10.1007/978-1-4419-9650-3

Flury, Bernhard (1997): A First Course in Multivariate Statistics. New York, Springer. 

Fujikoshi, Yasunori, Ulyanov, Vladimir V., and Shimizu, Ryoichi. (2010): Multivariate 
Statistics: High-dimensional and Large-sample Approximations. Hoboken, Wiley. 

Green, Paul E. (2014): Mathematical Tools for Applied Multivariate Analysis. Academic 
Press. 

Gupta, Arjun K., and Nagar, Daya K. (1999): Matrix Variate Distributions. Boca Raton, 
Chapman & Hall/ CRC. 

Hair, Joseph F. (2006): Multivariate Data Analysis (6th ed.). Upper Saddle River,
Pearson/Prentice Hall.

Härdle, Wolfgang K., and Simar, Léopold (2015): Applied Multivariate Statistical
Analysis, 4th edition. Berlin, Springer.

Hougaard, Philip (2000): Analysis of Multivariate Survival Data. New York, Springer. 


Huberty, Carl J., and Olejnik, Stephen (2006): Applied MANOVA and Discriminant
Analysis (2nd ed.). Hoboken, Wiley-Interscience. https://doi.org/10.1002/047178947X


Izenman, Alan J. (2008): Modern Multivariate Statistical Techniques: Regression,
Classification, and Manifold Learning (1st ed.). New York, Springer.
https://doi.org/10.1007/978-0-387-78189-1

Johnson, Mark E. (1987): Multivariate Statistical Simulation. New York, Wiley. 


Johnson, Richard A., and Wichern, Dean W. (2007): Applied Multivariate Statistical
Analysis, 6th edition. Prentice Hall, New Jersey.


Kollo, Tõnu, and von Rosen, Dietrich (2005): Advanced Multivariate Statistics with
Matrices. Dordrecht, Springer.


Kotz, Samuel, Balakrishnan, N., and Johnson, Norman L. (2000): Continuous
Multivariate Distributions. Vol. 1, Models and Applications (2nd ed.). New York, Wiley.
https://doi.org/10.1002/0471722065




Krzanowski, W. J. (2000): Principles of Multivariate Analysis: a User’s Perspective (Rev. 
ed.). Oxford, Oxford University Press. 


Manly, Bryan F. J., and Navarro Alberto, Jorge A. (2016): Multivariate Statistical Methods: 
A Primer, Fourth Edition. Boca Raton, Chapman & Hall/CRC Press. 


Meyers, Lawrence, Gamst, Glenn, and Guarino, A. J. (2016): Applied Multivariate
Research: Design and Interpretation, 3rd edition. Sage Publications.


Press, S. James (2005): Applied Multivariate Analysis : Using Bayesian And Frequentist 
Methods of Inference, 3rd edition. Dover Publications. 


Olive, David J. (2017): Robust Multivariate Analysis (1st ed.). Cham, Springer
International Publishing. https://doi.org/10.1007/978-3-319-68253-2


Raykov, Tenko, and Marcoulides, George A. (2008): An Introduction to Applied
Multivariate Analysis. Abingdon, Taylor and Francis.


Rencher, Alvin C., and Christensen, William F. (2012): Methods of Multivariate Analysis. 
New York, Wiley. 


Schafer, Joseph L. (1997): Analysis of Incomplete Multivariate Data (1st ed.). London, 
Chapman and Hall. 


Scott, David W. (2014): Multivariate Density Estimation: Theory, Practice, and
Visualization (Second edition). Hoboken, Wiley.


Serdobolskii, Vadim (2000): Multivariate Statistical Analysis: a High-dimensional
Approach. Dordrecht, Kluwer Academic Publishers.


Spencer, Neil H. (2013): Essentials of Multivariate Data Analysis. Boca Raton, Chapman
& Hall/CRC Press.


Srivastava, M. S., and Khatri, C. G. (1979): An Introduction to Multivariate Statistics. New
York, North-Holland.


Tabachnick, Barbara G., Fidell, Linda S., and Ullman, Jodie B. (2019): Using Multivariate 
Statistics (Seventh edition). Boston, Pearson. 


Timm, Neil H. (2007): Applied Multivariate Analysis. New York, Springer-Verlag. 


Wadsack, Peter, and Kres, Heinz. (2012): Statistical Tables for Multivariate Analysis: A 
Handbook with References to Applications. Dordrecht, Springer. 


Wei, William W. S. (2019): Multivariate Time Series Analysis and Applications. Hoboken, 
Wiley. 




Some Recent Research Papers 


Habti Abeida and Jean-Pierre Delmas (2020): Efficiency of subspace-based estimators for 
elliptical symmetric distributions. Signal Processing, 174, 107644 (9 pages) 


Olivier Besson (2021): On the distributions of some statistics related to adaptive filters 
trained with t-distributed samples. arXiv:2101.10609v2 [math.ST] 3 March 2021 


Taras Bodnar and Yarema Okhrin (2008): Properties of the singular, inverse and
generalized inverse partitioned Wishart distributions. Journal of Multivariate Analysis, 99,
2389-2405.


John T. Chen and Arjun K. Gupta (2005): Matrix variate skew normal distributions.
Statistics, 39(3), 247-253. https://doi.org/10.1080/02331880500108593


Marco Chiani (2014): Distribution of the largest eigenvalue for real Wishart and Gaussian
random matrices and a simple approximation for the Tracy-Widom distribution. Journal
of Multivariate Analysis, 129, 69-81.


A. W. Davis (1972): On the marginal distributions of the latent roots of the multivariate 
beta matrix, The Annals of Mathematical Statistics, 43(5), 1664—1670. 


Michel Denuit and Yang Lu (2021): Wishart-gamma random effects models with
applications to nonlife insurance. The Journal of Risk and Insurance, 88(2), 443-481.
https://doi.org/10.1111/jori.12327


José A. Díaz-García and Ramón Gutiérrez Jáimez (2007): Singular matrix variate beta
distribution. Journal of Multivariate Analysis, 99, 637-648.


José A. Díaz-García and Ramón Gutiérrez Jáimez (2009): Doubly singular matrix variate
beta type I and II and singular inverted matrix-variate t distributions. arXiv:0904.2147v1
[math.ST] 14 April 2009.


Jörn Diedrichsen, Serge B. Provost and Hossein Zareamoghaddam (2021): On the
distribution of cross-validated Mahalanobis distances. arXiv:1607.01371


Gilles R. Ducharme, Pierre Lafaye de Micheaux and Bastien Marchina (2016): The
complex multinormal distribution, quadratic forms in complex random vectors and an
omnibus goodness-of-fit test for the complex normal distribution. Annals of the Institute of
Statistical Mathematics, 68, 77-104.


Alan Edelman (1991): The distribution and moments of the smallest eigenvalue of a
random matrix of Wishart type. Linear Algebra and its Applications, 159, 55-80.




Junhan Fang and Grace Y. Yi (2022): Regularized matrix-variate logistic regression with 
response subject to misclassification. Journal of Statistical Planning and Inference, 217, 
106-121. https://doi.org/10.1016/j.jspi.2021.07.001 


Daniel R. Fuhrmann (1999): Complex random variables and stochastic processes.
Section 60 of the book Digital Signal Processing Handbook, CRC Press LLC.


Nina Singhal Hinrichs and Vijay S. Pande (2007): Calculation of the distribution of
eigenvalues and eigenvectors in Markovian state models for molecular dynamics. The Journal
of Chemical Physics, 126, 244101 (11 pages).


Velimir Illic, Jan Korbel, Shamik Gupta and A. M. Scarfone (2021): An overview of
generalized entropic forms. arXiv:2102.10071v1 [cond-mat.stat-mech], 19 Feb 2021.


Anis Iranmanesh, M. Arashi, D.K. Nagar and S.M.M. Tabatabaey (2013): On inverted 
matrix variate gamma distribution. Communications in Statistics - Theory and Methods, 
42, 28-31. 


Oliver James and Heung-No Lee (2021): Concise probability distributions of eigenvalues
of real-valued Wishart matrices. arXiv.org/ftp/arxiv/papers/1402/1402.6757.pdf


Romuald A. Janik and Maciej A. Nowak (2003): Wishart and anti-Wishart random
matrices. Journal of Physics A: Mathematical and General, 36(12), 3629.


Ian M. Johnstone (2001): On the distribution of the largest eigenvalue in principal
components analysis. The Annals of Statistics, 29(2), 295-327.


Tatsuya Kubokawa and Muni S. Srivastava (2008): Estimation of the precision matrix of 
singular Wishart distribution and its application in high-dimensional data. Journal of 
Multivariate Analysis, 99, 1906—1928. 


Nojun Kwak (2008): Principal component analysis based on L1-norm maximization. IEEE
Transactions on Pattern Analysis and Machine Intelligence, 30(9), 1672-1680.


Arak M. Mathai and Hans J. Haubold (2020): Mathematical aspects of Krätzel integral
and Krätzel transform. Mathematics, 8, 526. https://doi.org/10.3390/math8040526


Shinichi Mogami, Daichi Kitamura, Yoshiki Mitsui, Narihiro Takamune, Hiroshi 
Saruwatari and Nobutaka Ono (2017): Independent low-rank matrix analysis based on 
complex Student-t distribution for blind audio source separation. arXiv:1708.04795v1 
[cs.SD] 16 Aug 2017 




Daya K. Nagar, Alejandro Roldán-Correa and Arjun K. Gupta (2013): Extended matrix
variate gamma and beta functions. Journal of Multivariate Analysis, 122, 53-69.


Feiping Nie and Heng Huang (2016): Non-greedy L21-norm maximization for principal
component analysis. arXiv:1603.08293v1 [cs.LG] 28 March 2016.


Vladimir Ostashev, D. Keith Wilson, Matthew J. Kamrath and Chris L. Pettit
(2020): Modeling of signals scattered by turbulence on a sensor array using the
matrix gamma distribution. The Journal of the Acoustical Society of America, 148, 2476.
https://doi-org.proxy1.lib.uwo.ca/10.1121/1.5146860


Victor Pérez-Abreu and Robert Stelzer (2014): Infinitely divisible multivariate and matrix 
Gamma distributions. Journal of Multivariate Analysis, 130, 155—175. 


Serge B. Provost and John N. Haddad (2019): A recursive approach for determining matrix 
inverses as applied to causal time series processes, Metron, 77, 53-62. 


Tharm Ratnarajah (2005): Complex singular Wishart matrices and multiple-antenna 
systems, IEEE/SP 13th Workshop on Statistical Signal Processing, 1032—1037. doi: 
10.1109/SSP.2005.1628747 


Tharm Ratnarajah and R. Vaillancourt (2005): Complex singular Wishart matrices and 
applications. Computers and Mathematics with Applications, 50, 399-411. 


Maria Ribeiro, Teresa Henriques, Luisa Castro, André Souto, Luis Antunes, Cristina
Costa-Santos and Andreia Teixeira (2021): The entropy universe. Entropy, 23, 222 (35
pages), https://doi.org/10.3390/e23020222


Yo Sheena, A. K. Gupta and Y. Fujikoshi (2004): Estimation of the eigenvalues of
noncentrality parameter in matrix variate noncentral beta distribution. Annals of the Institute
of Statistical Mathematics, 56(1), 101-125.


Jiarong Shi, Xiuyun Zheng and Wei Yang (2017): Survey on probabilistic models of
low-rank matrix factorization. Entropy, 424. https://doi.org/10.3390/e19080424.


Martin Singull and Timo Koski (2012): On the distribution of matrix quadratic forms.
Communications in Statistics - Theory and Methods, 41(18), 3403-3415.
https://doi.org/10.1080/03610926.2011.563009.


Muni S. Srivastava (2003): Singular Wishart and multivariate beta distributions. The
Annals of Statistics, 31(5), 1537-1560.




Tim Wirtz, G. Akemann, Thomas Guhr, M. Kieburg and R. Wegner (2015): The smallest
eigenvalue distribution in the real Wishart-Laguerre ensemble with even topology.
Journal of Physics A: Mathematical and Theoretical, 48.
https://doi.org/10.1088/1751-8113/48/24/245202.


Soonyu Yu, Jaena Ryu and Kyoohong Park (2014): A derivation of anti-Wishart distribu- 
tion. Journal of Multivariate Analysis, 131, 121—125. 


Alberto Zanella, Marco Chiani and Moe Z. Win (2009): On the marginal distribution of 
the eigenvalues of Wishart matrices. IEEE Transactions on Communications, 57, 1050— 
1060. 


Lorenzo Zaninetti (2020): New probability distributions in astrophysics: III. The
truncated Maxwell-Boltzmann distributions. International Journal of Astronomy and
Astrophysics, 10, 191-202.


Lorenzo Zaninetti (2021): New probability distributions in astrophysics: V. The truncated 
Weibull distribution. International Journal of Astronomy and Astrophysics, 11, 133- 
149. 


Xingfu Zhang, Hyukjun Gweon and Serge B. Provost (2020): Threshold moving
approaches for addressing the class imbalance problem and their application to multi-label
classification, pp. 72-77. ICAIP2020, https://doi.org/10.1145/3441250.3441274


Author Index 


A 

Abeida, A., 910 

Adachi, K., 907 

Agresti, A., 907 

Akemann, G., 913 

Allman, E.S., 686 

Anderson, T.W., 457, 566, 685, 907 
Antunes, L., 912 

Arashi, M., 911

Atkinson, A.C., 907 


B 

Balakrishnan, N., 510, 908 
Barnes, C.A., 495 
Bartholomew, D.J., 907 
Besson, O., 910 

Bishop, Y.M., 907
Bodnar, T., 910 

Brown, B.L., 907 


C 

Castro, L., 912 

Cerioli, A., 907 
Chatfield, C., 907 
Chattopadhyay, A.K., 627 
Chen, J.T., 910 

Chen, Y., 686 




Chiani, M., 627, 910, 913 
Christensen, W.F., 909 
Clayton, D.D., 495 
Clemm, D.S., 627 
Collins, A.J., 907 
Costa-Santos, C., 912 
Cox, D.R., 907 

Cox, T.F., 907 

Crichfield, C., 495 


D 

Davis, A.W., 435, 627, 910 
de Micheaux, P.L., 910 
Delmas, J.-P., 910 

Denuit, M., 910 
Diaz-Garcia, J.A., 910 
Diedrichsen, J., 910 
Ducharme, G., 910 


Dugard, P., 907 
Dunn, G., 908 
E 


Edelman, A., 627, 910 
Everitt, B., 908 


F 
Fang, J., 911 
Fidell, L.S., 909 




Field, J.B., 435 
Fienberg, S.E., 907 
Flury, B., 908 

Fowler, W.A., 495 
Fuhrmann, D.R., 911 
Fujikoshi, Y., 908, 912 


G 

Gamst, G., 909 

Green, P.E., 908

Guarino, A.J., 909 

Guhr, T., 913

Gupta, A.K., 908, 910, 912 
Gupta, S., 911 

Gweon, H., 913 


H 

Habti, A., 913 

Haddad, J.N., 912 

Hair, J.F., 908 

Härdle, W.K., 908

Haubold, H.J., 1, 105, 108, 110, 113, 
125, 139, 357, 359, 360, 367, 369, 377, 
395, 469, 478, 495, 502, 510, 595, 599, 
611, 647, 652, 657, 759, 763, 771, 833, 
870, 911 

Henriques, T., 912 

Hinrichs, N.S., 911 

Holland, P.W., 907 

Hothorn, T., 908 

Hougaard, P., 908 

Huang, H., 620, 912 

Huberty, C.J., 908 


I 
Ihara, M., 686
Illic, V., 911 


Iranmanesh, A., 911 
Izenman, A.J., 908 


J 

Jáimez, R.G., 910 
James, A.T., 627 

James, O., 627, 911 
Janik, R.A., 911 
Johnson, M.E., 908 
Johnson, N.L., 908 
Johnson, R.A., 908 
Johnstone, I.M., 627, 911
Jorge, A., 909 


K 

Kamrath, M.J., 912 
Kano, Y., 686 

Katiyar, R.S., 449, 457 
Khatri, C.G., 627, 909 
Kieburg, M., 913 
Kitamura, D., 911 
Kollo, T., 908 

Kondo, T., 537, 538 
Korbel, J., 911 

Korin, B.P., 435 
Koski, T., 912 
Kotz, S., 908

Kres, H., 909 
Krishnaiah, P.R., 627 
Krzanowski, W.J., 909 
Kubokawa, T., 911 
Kwak, N., 620, 911 


L 

Lai, C.D., 510 
Lee, H.-N., 911 
Lee, W.-N., 627 
Li, X., 686 

Lu, Y., 910 


M 
Manly, B.F.J., 909 




Marchina, B., 910 

Marcoulides, G.A., 909 

Marshall, A.W., 510 

Mathai, A.M., 1, 44, 77, 81, 105, 108, 
110, 113, 125, 139, 357, 359, 360, 367, 
369, 377, 395, 469, 478, 495, 502, 510, 
595, 599, 611, 647, 652, 657, 759, 763, 
771, 833, 870, 911 

Matias, C., 686 

Meyers, L., 909 

Mitsui, Y., 911 

Mogami, S., 911 


N 

Nagao, H., 435 

Nagar, D.K., 908, 911, 912 
Nagarsenker, B.N., 435 
Navarro, A., 909 

Nie, F., 912 

Nowak, M.A., 911 


O 

Okhrin, Y., 910 
Olejnik, S., 908 
Olive, D.J., 909 
Olivier, B., 910 
Olkin, I., 510 
Ono, N., 911 
Ostashev, V., 912 


P

Pais, A., 495 

Pande, V.S., 911 

Park, K., 913 
Pérez-Abreu, V., 912 
Pettit, C.L., 912 

Pillai, K.C.S., 435, 627 
Potthoff, R.F., 842 
Press, S.J., 909 




Princy, T., 495 
Provost, S.B., 81, 170, 510, 533, 910, 
912, 913 


R 

Rathie, P.N., 201, 436, 443, 449, 457 
Ratnarajah, T., 912 
Raykov, T., 909 
Rencher, A.C., 909 
Rhodes, J.A., 709 
Riani, M., 907 

Ribeiro, M., 912 
Roldán-Correa, A., 912 
Roy, S.N., 842 

Rubin, H., 685 

Ryu, J., 913 


S 

Saruwatari, H., 908, 911, 912 
Saxena, R.K., 110, 113, 443, 452, 457, 
478, 771, 833 

Scarfone, A.M., 911 

Schafer, J.L., 909 

Schramm, D.N., 495 

Scott, D.W., 909 

Serdobolskii, V., 909 

Sheena, Y., 912 

Shi, J., 620, 912 

Shimizu, R., 908 

Simar, L., 908 

Singull, M., 912 

Souto, A., 912 

Spencer, N.H., 909 

Srivastava, M.S., 909, 911, 912 
Staines, H., 907 

Stelzer, R., 912 

Sugiura, N., 435 




T 

Tabachnick, B.G., 909 
Tabatabaey, S.M.M., 911 
Takamune, N., 911 
Teixeira, A., 912 

Timm, N.H., 909 
Todman, J., 907 


U 
Ullman, J.B., 909 
Ulyanov, V.V., 908 


V 
Vaillancourt, R., 912 
von Rosen, D., 908 


W 

Wadsack, P., 909 
Waikar, V.B., 627 
Wegge, L.L., 686 




Wegner, R., 913 

Wei, W.W.S., 909 
Wermuth, N., 907 
Wichern, D.W., 908 
Wilson, D.K., 912, 913 


Win, M.Z., 913 
Wirtz, T., 913 
Y 

Yang, W., 912 
Yi, G.Y., 911 
Yu, S., 913 

Z 


Zanella, A., 913 

Zaninetti, L., 913 
Zareamoghaddam, H., 910 
Zhang, S., 686 

Zhang, X., 913 

Zheng, X., 912 


Subject Index 


A 


Analysis of variance (ANOVA), 763 


Asymptotic distribution, 436 
Asymptotic normality, 203 


B 

Bayes’ rule, 714 
Bessel function, 334 
Beta variable, 72 


C 

Canonical correlation, 641 
Canonical correlation matrix, 660 
Cauchy density, 76 

Central limit theorem, 119 
Characteristic function, 212-214 
Chebyshev's inequality, 115 
Chisquare, 62, 65 
Chisquaredness, 78, 172 
Classification, 711—757 
Cofactor, 21 

Confidence interval, 125 
Consistency, 198 

Correlation coefficient, 355 
Cramér-Rao inequality, 201 
Critical region, 396 


© The Author(s) 2022 


D 

Design of experiment model, 681 
Determinant, 12 

Determinant, absolute value of, 43 
Differential, 40 

Dirichlet density, 378, 381 
Discriminant function, 727 
Duplication formula, 527 


E 

Eigenvalue, eigenvector, 32 
Elliptically contoured distribution, 206 
Estimation, point, 120 


F 

Factor analysis, 679 

Factor analysis model, 682 

Factor loadings, 687 

Factor space, 683 

Fractional integral, first kind, 371 
Fractional integral, second kind, 368 
F-variable, 60 


G 

Gamma function, multiplication formula of, 438

Gaussian, multivariate, 129 

Generalized variance, 347 




A. M. Mathai et al., Multivariate Statistical Analysis in the Real and Complex Domains,


https://doi.org/10.1007/978-3-030-95864-0 




Geometrical probability, 355 
G-function, 112, 329 
Growth curve, 824 


H 
Hazard function, 72 


Hermitian forms, independence of, 175 


H-function, 112 
Hotelling’s T², 415


I 

Identification problem, 685 
Interval estimation, 124 
Invariance, 645 

Inverse Wishart density, 349 
Iterative procedure, 656-663


J 
Jacobian, 43 


K 
Krätzel function, 334


L 

L1, L2 norms, 618-620
Lambda criterion, 411 
Likelihood function, 396 
Linear hypothesis, 463-473


M 
Matrix, 2
    in the complex domain, 35
    diagonal, identity, 4
    elementary, 24
    Hermitian, skew Hermitian, 35
    idempotent, nilpotent, 34
    inverse of, 23
    multiplication, 5
    null, square, rectangular, 4
    partitioned, 29
    scalar multiplication of, 4
    singular values of, 38
    symmetric, skew symmetric, 34
    transpose of, 8
    triangular, 4
Matrix-variate gamma, complex, 291 
Matrix-variate gamma, real, 224 
Matrix-variate, rectangular, 493 
Maximum likelihood estimators (MLE), 121
Maxwell-Boltzmann distribution, 499 
M-convolution, 377 
Method of maximum likelihood, 120 
Method of moments, 120 
Misclassification, 712 
Mittag-Leffler function, 334 
Moment generating function (mgf), 57 
Multiple correlation coefficient, 364 
Multivariate analysis of variance (MANOVA), 481, 759


N 
Null hypothesis, 396 


O 
Oblique factors, 686 
Orthogonal factors, 707 


P

Partial correlation, 364 
Pathway models, 370 
Patterned matrices, 487 
Polar coordinates, 500 
Poles, 322 

Power of a test, 397 
Power transformation, 71 
Principal components, 597 
Products and ratios of matrices, 366 
Profiles, coincident, 814 




Profiles, level, 814 

Profiles, parallel, 814 
Pseudo Dirichlet model, 383 
p-value, 408 


Q 


Quadratic form, 78 


Quadratic forms, independence of, 78, 169


R 

Rayleigh distribution, 494
Rao-Blackwell theorem, 200 
Regression, 360 

Regression model, 680 
Relative efficiency, 199 
Reliability function, 72 
Repeated measures design, 824 
Residues, 322 


S 
Scalar quantity, 4 
Singular gamma, 580 


Singular gamma, complex, 588 
Stirling’s formula, 504 
Student-t, 73 

Sufficiency, 198 


T 
Triangularization, 340 
Type-1, type-2 error, 396 


U 
Unbiasedness, 196 


V 
Vector, row, column, 6 


Vectors, orthogonal, orthonormal, 10 


W 

Weak law of large numbers, 118 
Wedge product, 40 

Weyl integral, 375 

Wishart density, 335 


Z 
Zonal polynomial, 478 




